pyarrow.Column

class pyarrow.Column

Bases: pyarrow.lib._PandasConvertible

Named vector of elements of equal type.

Warning

Do not call this class’s constructor directly.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

cast(self, target_type, bool safe=True) – Cast column values to another data type
dictionary_encode(self) – Compute dictionary-encoded representation of array
equals(self, Column other) – Check if contents of two columns are equal
flatten(self, MemoryPool memory_pool=None) – Flatten this Column.
from_array(*args)
length(self)
to_pandas(self[, categories]) – Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pylist(self) – Convert to a list of native Python objects.
unique(self) – Compute distinct elements in array

Attributes

data – The underlying data
field
name – Label of the column
null_count – Number of null entries
shape – Dimensions of this column
type – Type information for this column
cast(self, target_type, bool safe=True)

Cast column values to another data type

Parameters:
  • target_type (DataType) – Type to cast to
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns:

casted (Column)

data

The underlying data

Returns: pyarrow.ChunkedArray
dictionary_encode(self)

Compute dictionary-encoded representation of array

Returns: pyarrow.Column – Same chunking as the input; all chunks share a common dictionary.
equals(self, Column other)

Check if contents of two columns are equal

Parameters: other (pyarrow.Column) – Column to compare against
Returns: are_equal (boolean)
field
flatten(self, MemoryPool memory_pool=None)

Flatten this Column. If it has a struct type, the column is flattened into one column per struct field.

Parameters: memory_pool (MemoryPool, default None) – Memory pool to use for allocations, if required; otherwise the default pool is used
Returns: result (List[Column])
static from_array(*args)
length(self)
name

Label of the column

Returns: str
null_count

Number of null entries

Returns: int
shape

Dimensions of this column

Returns: (int,)
to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)

Convert to a pandas-compatible NumPy array or DataFrame, as appropriate

Parameters:
  • strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
  • categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
  • zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
  • integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
  • date_as_object (boolean, default True) – Cast dates to objects
  • use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
  • deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects when they are created, to save on memory use. Conversion will be slower
  • ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns:

NumPy array or DataFrame depending on type of object

to_pylist(self)

Convert to a list of native Python objects.

type

Type information for this column

Returns: pyarrow.DataType
unique(self)

Compute distinct elements in array

Returns: pyarrow.Array