pyarrow.Column
-
class pyarrow.Column
Bases: pyarrow.lib._PandasConvertible
Named vector of elements of equal type.
Warning
Do not call this class’s constructor directly.
-
__init__()
Initialize self. See help(type(self)) for accurate signature.
Methods
cast(self, target_type, bool safe=True) – Cast column values to another data type
dictionary_encode(self) – Compute a dictionary-encoded representation of the column
equals(self, Column other) – Check if the contents of two columns are equal
flatten(self, MemoryPool memory_pool=None) – Flatten this Column
from_array(*args)
length(self)
to_pandas(self[, categories]) – Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pylist(self) – Convert to a list of native Python objects
unique(self) – Compute the distinct elements in the column
Attributes
data – The underlying data
field
name – Label of the column
null_count – Number of null entries
shape – Dimensions of this column
type – Type information for this column
-
cast(self, target_type, bool safe=True)
Cast column values to another data type.
Parameters: - target_type (DataType) – Type to cast to
- safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns: casted (Column)
-
data
The underlying data.
Returns: pyarrow.ChunkedArray
-
dictionary_encode(self)
Compute a dictionary-encoded representation of the column.
Returns: pyarrow.Column – Same chunking as the input; all chunks share a common dictionary.
-
equals(self, Column other)
Check if the contents of two columns are equal.
Parameters: other (pyarrow.Column) – Column to compare against
Returns: are_equal (boolean)
-
field
-
flatten(self, MemoryPool memory_pool=None)
Flatten this Column. If it has a struct type, the column is flattened into one column per struct field.
Parameters: memory_pool (MemoryPool, default None) – For memory allocations, if required; otherwise the default pool is used
Returns: result (List[Column])
-
static from_array(*args)
-
length(self)
-
name
Label of the column.
Returns: str
-
null_count
Number of null entries.
Returns: int
-
shape
Dimensions of this column.
Returns: (int,)
-
to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)
Convert to a pandas-compatible NumPy array or DataFrame, as appropriate.
Parameters: - strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
- categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
- zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
- integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
- date_as_object (boolean, default True) – Cast dates to objects
- use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
- deduplicate_objects (boolean, default True) – Do not create multiple copies of the same Python object during conversion, to save on memory use. Conversion will be slower
- ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns: NumPy array or DataFrame depending on type of object
-
to_pylist(self)
Convert to a list of native Python objects.
-
type
Type information for this column.
Returns: pyarrow.DataType
-
unique(self)
Compute the distinct elements in the column.
Returns: pyarrow.Array
-