pyarrow.Column
-
class pyarrow.Column
Bases: pyarrow.lib._PandasConvertible
Named vector of elements of equal type.
Warning
Do not call this class’s constructor directly.
-
__init__()
Initialize self. See help(type(self)) for accurate signature.
Methods
cast(self, target_type, bool safe=True) – Cast column values to another data type
dictionary_encode(self) – Compute dictionary-encoded representation of array
equals(self, Column other) – Check if contents of two columns are equal
flatten(self, MemoryPool memory_pool=None) – Flatten this Column.
from_array(*args)
length(self)
to_pandas(self[, categories]) – Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pylist(self) – Convert to a list of native Python objects.
unique(self) – Compute distinct elements in array
Attributes
data – The underlying data
field
name – Label of the column
null_count – Number of null entries
shape – Dimensions of this column
type – Type information for this column
-
cast(self, target_type, bool safe=True)
Cast column values to another data type
Parameters:
- target_type (DataType) – Type to cast to
- safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns: casted (Column)
-
data
The underlying data
Returns: pyarrow.ChunkedArray
-
dictionary_encode(self)
Compute dictionary-encoded representation of array
Returns: pyarrow.Column – Same chunking as the input, all chunks share a common dictionary.
-
equals(self, Column other)
Check if contents of two columns are equal
Parameters: other (pyarrow.Column)
Returns: are_equal (boolean)
-
field
-
flatten(self, MemoryPool memory_pool=None)
Flatten this Column. If it has a struct type, the column is flattened into one column per struct field.
Parameters: memory_pool (MemoryPool, default None) – For memory allocations, if required, otherwise use default pool
Returns: result (List[Column])
-
static from_array(*args)
-
length(self)
-
name
Label of the column
Returns: str
-
null_count
Number of null entries
Returns: int
-
shape
Dimensions of this column
Returns: (int,)
-
to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)
Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
Parameters:
- strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
- categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
- zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
- integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
- date_as_object (boolean, default True) – Cast dates to objects
- use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
- deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects when created, to save on memory use. Conversion will be slower
- ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns: NumPy array or DataFrame depending on type of object
-
to_pylist(self)
Convert to a list of native Python objects.
-
type
Type information for this column
Returns: pyarrow.DataType
-
unique(self)
Compute distinct elements in array
Returns: pyarrow.Array
-