pyarrow.NullArray

class pyarrow.NullArray

Bases: pyarrow.lib.Array

Concrete class for Arrow arrays of null data type.

__init__()

Initialize self. See help(type(self)) for accurate signature.
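
NullArray instances are normally obtained from pyarrow.array rather than constructed directly; for example, an all-None input sequence is inferred as the null type (a minimal sketch):

>>> import pyarrow as pa
>>> arr = pa.array([None, None, None])   # inferred as the null data type
>>> arr.null_count
3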

Methods

buffers(self) Return a list of Buffer objects pointing to this array’s physical storage.
cast(self, target_type, bool safe=True) Cast array values to another data type.
dictionary_encode(self) Compute dictionary-encoded representation of array
equals(self, Array other)
format(self, int indent=0, int window=10)
from_buffers(DataType type, length, buffers) Construct an Array from a sequence of buffers.
from_pandas(obj[, mask, type]) Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls.
isnull(self)
slice(self[, offset, length]) Compute zero-copy slice of this array
to_numpy(self) Experimental: return a NumPy view of this array.
to_pandas(self[, categories]) Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pylist(self) Convert to a list of native Python objects.
unique(self) Compute distinct elements in array
validate(self) Perform any validation checks implemented by arrow::ValidateArray.

Attributes

null_count
offset A relative position into another array’s data, to enable zero-copy slicing.
type
buffers(self)

Return a list of Buffer objects pointing to this array’s physical storage.

To correctly interpret these buffers, you also need to apply the offset, multiplied by the byte width of the stored data type.
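
For illustration, a minimal sketch using a small int32 array (the null type itself carries no value buffers, so a primitive type is used here):

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3], type=pa.int32())
>>> bufs = arr.buffers()   # for int32: [validity bitmap (None when absent), value buffer]
>>> len(bufs)
2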

cast(self, target_type, bool safe=True)

Cast array values to another data type.

Example

>>> from datetime import datetime
>>> import pyarrow as pa
>>> arr = pa.array([datetime(2010, 1, 1), datetime(2015, 1, 1)])
>>> arr.type
TimestampType(timestamp[us])

You can use pyarrow.DataType objects to specify the target type:

>>> arr.cast(pa.timestamp('ms'))
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast(pa.timestamp('ms')).type
TimestampType(timestamp[ms])

Alternatively, you can use the string aliases for these types:

>>> arr.cast('timestamp[ms]')
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast('timestamp[ms]').type
TimestampType(timestamp[ms])
Parameters:
  • target_type (DataType) – Type to cast to
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns:

casted (Array)

dictionary_encode(self)

Compute dictionary-encoded representation of array
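
For illustration, a minimal sketch with a small string array (the sample values are hypothetical; the result is a dictionary array whose indices point into the distinct values):

>>> import pyarrow as pa
>>> arr = pa.array(["a", "b", "a"])
>>> encoded = arr.dictionary_encode()
>>> # encoded holds indices [0, 1, 0] over the dictionary ["a", "b"]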

equals(self, Array other)
format(self, int indent=0, int window=10)
static from_buffers(DataType type, length, buffers, null_count=-1, offset=0)

Construct an Array from a sequence of buffers. The concrete type returned depends on the datatype.

Parameters:
  • type (DataType) – The value type of the array
  • length (int) – The number of values in the array
  • buffers (List[Buffer]) – The buffers backing this array
  • null_count (int, default -1) – The number of null entries in the array; the default of -1 means the null count has not been computed
  • offset (int, default 0) – The array’s logical offset (in values, not in bytes) from the start of each buffer
Returns:

array (Array)
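
For illustration, a minimal sketch that rebuilds a small int32 array from raw little-endian bytes (pa.py_buffer is used here to wrap the bytes; None stands in for the absent validity bitmap):

>>> import pyarrow as pa
>>> values = pa.py_buffer(b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00')
>>> arr = pa.Array.from_buffers(pa.int32(), 3, [None, values])
>>> arr.to_pylist()
[1, 2, 3]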

static from_pandas(obj, mask=None, type=None, bool safe=True, MemoryPool memory_pool=None)

Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls. See pyarrow.array for more general conversion from arrays or sequences to Arrow arrays.

Parameters:
  • sequence (ndarray, Index, Series) –
  • mask (array (boolean), optional) – Indicate which values are null (True) or not null (False)
  • type (pyarrow.DataType) – Explicit type to attempt to coerce to, otherwise will be inferred from the data
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
  • memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool

Notes

Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.

Returns:

array (pyarrow.Array, or pyarrow.ChunkedArray if the object data overflows the binary buffer)
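
For illustration, a minimal sketch with a float Series, where pandas treats None/NaN as null:

>>> import pandas as pd
>>> import pyarrow as pa
>>> s = pd.Series([1.0, None, 3.0])
>>> arr = pa.Array.from_pandas(s)   # the None becomes an Arrow null
>>> arr.null_count
1
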
isnull(self)
null_count
offset

A relative position into another array’s data, to enable zero-copy slicing. This value defaults to zero but must be applied in all operations on the physical storage buffers.
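
For example, a zero-copy slice records where it starts in the parent’s buffers via this attribute (a minimal sketch):

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3, 4])
>>> arr.slice(2).offset   # the slice shares arr's buffers, starting 2 values in
2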

slice(self, offset=0, length=None)

Compute zero-copy slice of this array

Parameters:
  • offset (int, default 0) – Offset from start of array to slice
  • length (int, default None) – Length of slice (default is until end of Array starting from offset)
Returns:

sliced (Array)
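
For illustration, a minimal sketch (no data is copied; the result shares the parent array’s buffers):

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3, 4, 5])
>>> arr.slice(1, 3).to_pylist()
[2, 3, 4]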

to_numpy(self)

Experimental: return a NumPy view of this array. Only primitive arrays with the same memory layout as NumPy (i.e. integers, floating point), without any nulls, are supported.

Returns: array (numpy.ndarray)
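
For illustration, a minimal sketch with a primitive array containing no nulls, the assumed precondition for a zero-copy view:

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3], type=pa.int64())
>>> arr.to_numpy()   # NumPy view backed by the Arrow value buffer
array([1, 2, 3])
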
to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)

Convert to a pandas-compatible NumPy array or DataFrame, as appropriate

Parameters:
  • strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
  • categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
  • zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
  • integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
  • date_as_object (boolean, default True) – Cast dates to objects
  • use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
  • deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects when they are created, to save on memory use. Conversion will be slower
  • ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns:

NumPy array or DataFrame depending on type of object
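
For illustration, a minimal sketch (nulls in floating-point data come back as NaN; the exact pandas-compatible container depends on the object being converted):

>>> import pyarrow as pa
>>> arr = pa.array([1.5, None, 3.0])
>>> result = arr.to_pandas()   # the null becomes NaN in the result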

to_pylist(self)

Convert to a list of native Python objects.

Returns: lst (list)
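
For a NullArray every element comes back as None (a minimal sketch):

>>> import pyarrow as pa
>>> pa.array([None, None]).to_pylist()
[None, None]
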
type
unique(self)

Compute distinct elements in array
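
For illustration, a minimal sketch (the distinct values are returned as a new array):

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 2, 3, 1])
>>> arr.unique().to_pylist()
[1, 2, 3]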

validate(self)

Perform any validation checks implemented by arrow::ValidateArray. Raises an exception with an error message if the array does not validate.

Raises: ArrowInvalid
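
For illustration, a minimal sketch; on a well-formed array the call returns without raising:

>>> import pyarrow as pa
>>> pa.array([None, None]).validate()   # no exception: the array is valid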