pyarrow.ChunkedArray

class pyarrow.ChunkedArray

Bases: pyarrow.lib._PandasConvertible

An array backed by one or more memory chunks.

Warning

Do not call this class’s constructor directly.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

chunk(self, i) Select a chunk by its index
dictionary_encode(self) Compute dictionary-encoded representation of array
equals(self, ChunkedArray other) Return whether the contents of two chunked arrays are equal
format(self, int indent=0, int window=10)
iterchunks(self)
length(self)
slice(self[, offset, length]) Compute zero-copy slice of this ChunkedArray
to_pandas(self[, categories]) Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pylist(self) Convert to a list of native Python objects.
unique(self) Compute distinct elements in array

Attributes

chunks
null_count Number of null entries
num_chunks Number of underlying chunks
type
chunk(self, i)

Select a chunk by its index

Parameters:i (int) – Index of the chunk to select
Returns:pyarrow.Array
chunks
dictionary_encode(self)

Compute dictionary-encoded representation of array

Returns:pyarrow.ChunkedArray – Same chunking as the input, all chunks share a common dictionary.
equals(self, ChunkedArray other)

Return whether the contents of two chunked arrays are equal

Parameters:other (pyarrow.ChunkedArray) – Chunked array to compare against
Returns:are_equal (boolean)
format(self, int indent=0, int window=10)
iterchunks(self)
length(self)
null_count

Number of null entries

Returns:int
num_chunks

Number of underlying chunks

Returns:int
slice(self, offset=0, length=None)

Compute zero-copy slice of this ChunkedArray

Parameters:
  • offset (int, default 0) – Offset from start of array to slice
  • length (int, default None) – Length of slice (default is until the end of the array, starting from offset)
Returns:sliced (ChunkedArray)

to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)

Convert to a pandas-compatible NumPy array or DataFrame, as appropriate

Parameters:
  • strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
  • categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
  • zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
  • integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
  • date_as_object (boolean, default True) – Cast dates to objects
  • use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
  • deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects when they are created, to save on memory use. Conversion will be slower
  • ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns:NumPy array or DataFrame depending on type of object

to_pylist(self)

Convert to a list of native Python objects.

type
unique(self)

Compute distinct elements in array

Returns:pyarrow.Array