pyarrow.ChunkedArray
-
class pyarrow.ChunkedArray
Bases: pyarrow.lib._PandasConvertible
Array backed via one or more memory chunks.
Warning
Do not call this class’s constructor directly.
-
__init__()
Initialize self. See help(type(self)) for accurate signature.
Methods
chunk(self, i): Select a chunk by its index
dictionary_encode(self): Compute dictionary-encoded representation of array
equals(self, ChunkedArray other): Return whether the contents of two chunked arrays are equal
format(self, int indent=0, int window=10)
iterchunks(self)
length(self)
slice(self[, offset, length]): Compute zero-copy slice of this ChunkedArray
to_pandas(self[, categories]): Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pylist(self): Convert to a list of native Python objects
unique(self): Compute distinct elements in array
Attributes
chunks
null_count: Number of null entries
num_chunks: Number of underlying chunks
type
-
chunk(self, i)
Select a chunk by its index.
Parameters: i (int)
Returns: pyarrow.Array
-
chunks
-
dictionary_encode(self)
Compute dictionary-encoded representation of array.
Returns: pyarrow.ChunkedArray – Same chunking as the input; all chunks share a common dictionary.
-
equals(self, ChunkedArray other)
Return whether the contents of two chunked arrays are equal.
Parameters: other (pyarrow.ChunkedArray)
Returns: are_equal (boolean)
-
format(self, int indent=0, int window=10)
-
iterchunks(self)
-
length(self)
-
null_count
Number of null entries.
Returns: int
-
num_chunks
Number of underlying chunks.
Returns: int
-
slice(self, offset=0, length=None)
Compute zero-copy slice of this ChunkedArray.
Parameters:
- offset (int, default 0) – Offset from start of array to slice
- length (int, default None) – Length of slice (default is until end of array starting from offset)
Returns: sliced (ChunkedArray)
-
to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)
Convert to a pandas-compatible NumPy array or DataFrame, as appropriate.
Parameters:
- strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
- categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
- zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
- integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
- date_as_object (boolean, default True) – Cast dates to objects
- use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
- deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects when they are created, to save on memory use. Conversion will be slower
- ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns: NumPy array or DataFrame depending on type of object
-
to_pylist(self)
Convert to a list of native Python objects.
-
type
-
unique(self)
Compute distinct elements in array.
Returns: pyarrow.Array
-