pyarrow.ChunkedArray

class pyarrow.ChunkedArray

Bases: pyarrow.lib._PandasConvertible

An array backed by one or more memory chunks, each a pyarrow.Array of the same type.

Warning

Do not call this class’s constructor directly; use pyarrow.chunked_array() instead.

__init__(): Initialize self. See help(type(self)) for accurate signature.

Methods

`chunk`(self, i)	Select a chunk by its index
`dictionary_encode`(self)	Compute dictionary-encoded representation of array
`equals`(self, ChunkedArray other)	Return whether the contents of two chunked arrays are equal
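`format`(self, int indent=0, int window=10)	Return a pretty-printed string representation
`iterchunks`(self)	Iterate over the chunks of this array
`length`(self)	Return the total number of elements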
`slice`(self[, offset, length])	Compute zero-copy slice of this ChunkedArray
`to_pandas`(self[, categories])	Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
`to_pylist`(self)	Convert to a list of native Python objects.
`unique`(self)	Compute distinct elements in array

Attributes

`chunks`	List of the underlying chunks
`null_count`	Number of null entries
`num_chunks`	Number of underlying chunks
`type`	Data type of the array's elements

chunk(self, i)

Select a chunk by its index

Parameters:	i (int) – Index of the chunk to select
Returns:	pyarrow.Array

chunks

List of the underlying chunks

Returns:	list of pyarrow.Array

dictionary_encode(self)

Compute the dictionary-encoded representation of the array

Returns:	pyarrow.ChunkedArray – Same chunking as the input, all chunks share a common dictionary.

equals(self, ChunkedArray other)

Return whether the contents of two chunked arrays are equal

Parameters:	other (pyarrow.ChunkedArray) – Chunked array to compare against
Returns:	are_equal (bool)

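format(self, int indent=0, int window=10)

Return a pretty-printed string representation of the array. indent sets the indentation level; window sets how many values are shown before the output is elided.

Returns:	str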

iterchunks(self)

Iterate over the chunks of this array, yielding each one as a pyarrow.Array

length(self)

Return the total number of elements across all chunks

Returns:	int

null_count

Number of null entries

Returns:	int

num_chunks

Number of underlying chunks

Returns:	int

slice(self, offset=0, length=None)

Compute zero-copy slice of this ChunkedArray

Parameters:	offset (int, default 0) – Offset from start of array to slice
	length (int, default None) – Length of slice (default is until end of array starting from offset)
Returns:	sliced (ChunkedArray)

to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)

Convert to a pandas-compatible NumPy array or DataFrame, as appropriate

Parameters:

categories (list, default None) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
integer_object_nulls (boolean, default False) – Convert integers that contain nulls to Python objects instead of floats
date_as_object (boolean, default True) – Convert dates to Python datetime.date objects
use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
deduplicate_objects (boolean, default True) – When creating Python objects, do not create multiple copies of identical values, to save memory. Conversion will be slower
ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present

Returns:

NumPy array or DataFrame depending on type of object

to_pylist(self): Convert to a list of native Python objects.

type

Data type of the array's elements

Returns:	pyarrow.DataType

unique(self)

Compute the distinct elements in the array

Returns:	pyarrow.Array