pyarrow.ListArray

class pyarrow.ListArray

Bases: pyarrow.lib.Array

Concrete class for Arrow arrays of a list data type.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

buffers(self) Return a list of Buffer objects pointing to this array’s physical storage.
cast(self, target_type, bool safe=True) Cast array values to another data type.
dictionary_encode(self) Compute dictionary-encoded representation of array
equals(self, Array other)
flatten(self) Unnest this ListArray by one level
format(self, int indent=0, int window=10)
from_arrays(offsets, values, …) Construct ListArray from arrays of int32 offsets and values
from_buffers(DataType type, length, buffers) Construct an Array from a sequence of buffers.
from_pandas(obj[, mask, type]) Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls.
isnull(self)
slice(self[, offset, length]) Compute zero-copy slice of this array
to_numpy(self) Experimental: return a NumPy view of this array.
to_pandas(self[, categories]) Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pylist(self) Convert to a list of native Python objects.
unique(self) Compute distinct elements in array
validate(self) Perform any validation checks implemented by arrow::ValidateArray.

Attributes

null_count
offset A relative position into another array’s data, to enable zero-copy slicing.
type
buffers(self)

Return a list of Buffer objects pointing to this array’s physical storage.

To interpret these buffers correctly, you must also apply the array's offset, multiplied by the size in bytes of the stored data type.

cast(self, target_type, bool safe=True)

Cast array values to another data type.

Example

>>> from datetime import datetime
>>> import pyarrow as pa
>>> arr = pa.array([datetime(2010, 1, 1), datetime(2015, 1, 1)])
>>> arr.type
TimestampType(timestamp[us])

You can use pyarrow.DataType objects to specify the target type:

>>> arr.cast(pa.timestamp('ms'))
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast(pa.timestamp('ms')).type
TimestampType(timestamp[ms])

Alternatively, you can use the string aliases for these types:

>>> arr.cast('timestamp[ms]')
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast('timestamp[ms]').type
TimestampType(timestamp[ms])
Parameters:
  • target_type (DataType) – Type to cast to
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns:

casted (Array)

dictionary_encode(self)

Compute dictionary-encoded representation of array

equals(self, Array other)
flatten(self)

Unnest this ListArray by one level

Returns: result (Array)
format(self, int indent=0, int window=10)
static from_arrays(offsets, values, MemoryPool pool=None)

Construct ListArray from arrays of int32 offsets and values

Parameters:
  • offsets (Array (int32 type)) –
  • values (Array (any type)) –
Returns:

list_array (ListArray)

static from_buffers(DataType type, length, buffers, null_count=-1, offset=0)

Construct an Array from a sequence of buffers. The concrete type returned depends on the datatype.

Parameters:
  • type (DataType) – The value type of the array
  • length (int) – The number of values in the array
  • buffers (List[Buffer]) – The buffers backing this array
  • null_count (int, default -1) –
  • offset (int, default 0) – The array’s logical offset (in values, not in bytes) from the start of each buffer
Returns:

array (Array)

static from_pandas(obj, mask=None, type=None, bool safe=True, MemoryPool memory_pool=None)

Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls. See pyarrow.array for more general conversion from arrays or sequences to Arrow arrays.

Parameters:
  • obj (ndarray, pandas.Series) –
  • mask (array (boolean), optional) – Indicate which values are null (True) or not null (False)
  • type (pyarrow.DataType) – Explicit type to attempt to coerce to, otherwise will be inferred from the data
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
  • memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool

Notes

Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.

Returns:
  • array (pyarrow.Array or pyarrow.ChunkedArray (if object data overflows binary buffer))
isnull(self)
null_count
offset

A relative position into another array’s data, to enable zero-copy slicing. This value defaults to zero but must be applied on all operations with the physical storage buffers.

slice(self, offset=0, length=None)

Compute zero-copy slice of this array

Parameters:
  • offset (int, default 0) – Offset from start of array to slice
  • length (int, default None) – Length of slice (default is until end of Array starting from offset)
Returns:

sliced (Array)

to_numpy(self)

Experimental: return a NumPy view of this array. Only primitive arrays with the same memory layout as NumPy (i.e. integers, floating point), without any nulls, are supported.

Returns: array (numpy.ndarray)
to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)

Convert to a pandas-compatible NumPy array or DataFrame, as appropriate

Parameters:
  • strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
  • categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
  • zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
  • integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
  • date_as_object (boolean, default True) – Cast dates to objects
  • use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
  • deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects when created, to save on memory use. Conversion will be slower
  • ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns:

NumPy array or DataFrame depending on type of object

to_pylist(self)

Convert to a list of native Python objects.

Returns: lst (list)
type
unique(self)

Compute distinct elements in array

validate(self)

Perform any validation checks implemented by arrow::ValidateArray. Raises an exception with an error message if the array does not validate.

Raises: ArrowInvalid