pyarrow.UnionArray

class pyarrow.UnionArray
Bases: pyarrow.lib.Array

Concrete class for Arrow arrays of a Union data type.

__init__()
Initialize self. See help(type(self)) for accurate signature.

Methods

- buffers(self) – Return a list of Buffer objects pointing to this array’s physical storage.
- cast(self, target_type, bool safe=True) – Cast array values to another data type.
- dictionary_encode(self) – Compute dictionary-encoded representation of array.
- equals(self, Array other)
- format(self, int indent=0, int window=10)
- from_buffers(DataType type, length, buffers) – Construct an Array from a sequence of buffers.
- from_dense(Array types, Array value_offsets, …) – Construct dense UnionArray from arrays of int8 types, int32 offsets and children arrays.
- from_pandas(obj[, mask, type]) – Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls.
- from_sparse(Array types, list children) – Construct sparse UnionArray from arrays of int8 types and children arrays.
- isnull(self)
- slice(self[, offset, length]) – Compute zero-copy slice of this array.
- to_numpy(self) – Experimental: return a NumPy view of this array.
- to_pandas(self[, categories]) – Convert to a pandas-compatible NumPy array or DataFrame, as appropriate.
- to_pylist(self) – Convert to a list of native Python objects.
- unique(self) – Compute distinct elements in array.
- validate(self) – Perform any validation checks implemented by arrow::ValidateArray.

Attributes

- null_count
- offset – A relative position into another array’s data, to enable zero-copy slicing.
- type

buffers(self)
Return a list of Buffer objects pointing to this array’s physical storage.
To correctly interpret these buffers, you need to also apply the offset multiplied by the size of the stored data type.
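
For illustration, a minimal sketch (with made-up data) of what buffers() returns for a simple primitive array: a validity-bitmap buffer plus a values buffer (the bitmap slot is None when the array has no nulls).

>>> import pyarrow as pa
>>> arr = pa.array([1, None, 3])
>>> bufs = arr.buffers()  # [validity bitmap, values] for a primitive array
>>> len(bufs)
2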

cast(self, target_type, bool safe=True)
Cast array values to another data type.

Example

>>> from datetime import datetime
>>> import pyarrow as pa
>>> arr = pa.array([datetime(2010, 1, 1), datetime(2015, 1, 1)])
>>> arr.type
TimestampType(timestamp[us])

You can use pyarrow.DataType objects to specify the target type:

>>> arr.cast(pa.timestamp('ms'))
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast(pa.timestamp('ms')).type
TimestampType(timestamp[ms])

Alternatively, it is also supported to use the string aliases for these types:

>>> arr.cast('timestamp[ms]')
<pyarrow.lib.TimestampArray object at 0x10420eb88>
[
  1262304000000,
  1420070400000
]
>>> arr.cast('timestamp[ms]').type
TimestampType(timestamp[ms])
Parameters: - target_type (DataType) – Type to cast to
- safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns: casted (Array)

dictionary_encode(self)
Compute dictionary-encoded representation of array.
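
A small, hypothetical illustration of what dictionary encoding produces: a DictionaryArray holding integer indices into a dictionary of the distinct values (the data below is made up).

>>> import pyarrow as pa
>>> arr = pa.array(["a", "b", "a", "c"])
>>> encoded = arr.dictionary_encode()
>>> encoded.indices.to_pylist()   # index of each value in the dictionary
[0, 1, 0, 2]
>>> encoded.dictionary.to_pylist()
['a', 'b', 'c']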

equals(self, Array other)

format(self, int indent=0, int window=10)

static from_buffers(DataType type, length, buffers, null_count=-1, offset=0)
Construct an Array from a sequence of buffers. The concrete type returned depends on the datatype.

Returns: array (Array)
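
As a hedged sketch, one way to exercise from_buffers is to round-trip the buffers of an existing primitive array (hypothetical data; for an int32 array without nulls, buffers() yields a None validity slot followed by the values buffer):

>>> import pyarrow as pa
>>> values = pa.array([1, 2, 3], type=pa.int32())
>>> rebuilt = pa.Array.from_buffers(pa.int32(), 3, values.buffers())
>>> rebuilt.to_pylist()
[1, 2, 3]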

static from_dense(Array types, Array value_offsets, list children)
Construct dense UnionArray from arrays of int8 types, int32 offsets and children arrays.

Returns: union_array (UnionArray)
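
For illustration, a hypothetical dense union mixing an int64 child and a string child (exact reprs may vary between pyarrow versions): types[i] selects the child array and value_offsets[i] indexes into that child.

>>> import pyarrow as pa
>>> types = pa.array([0, 1, 0, 1], type=pa.int8())          # child id for each slot
>>> value_offsets = pa.array([0, 0, 1, 1], type=pa.int32())  # index into the chosen child
>>> children = [pa.array([10, 20]), pa.array(["x", "y"])]
>>> union = pa.UnionArray.from_dense(types, value_offsets, children)
>>> len(union)
4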

static from_pandas(obj, mask=None, type=None, bool safe=True, MemoryPool memory_pool=None)
Convert pandas.Series to an Arrow Array, using pandas’s semantics about what values indicate nulls. See pyarrow.array for more general conversion from arrays or sequences to Arrow arrays.
Parameters: - sequence (ndarray, Index, Series) –
- mask (array (boolean), optional) – Indicate which values are null (True) or not null (False)
- type (pyarrow.DataType) – Explicit type to attempt to coerce to, otherwise will be inferred from the data
- safe (boolean, default True) – Check for overflows or other unsafe conversions
- memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool
Notes
Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.
Returns: array (pyarrow.Array, or pyarrow.ChunkedArray if object data overflows the binary buffer)
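
A minimal sketch of the pandas null semantics mentioned above (made-up data; NaN in a pandas Series is treated as a null in the resulting Arrow array):

>>> import pandas as pd
>>> import pyarrow as pa
>>> s = pd.Series([1.0, None, 3.0])
>>> arr = pa.Array.from_pandas(s)
>>> arr.null_count
1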

static from_sparse(Array types, list children)
Construct sparse UnionArray from arrays of int8 types and children arrays.
Parameters: - types (Array (int8 type)) –
- children (list) –
Returns: union_array (UnionArray)
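
For illustration with hypothetical data: in a sparse union every child has the full length of the array, and types[i] selects which child supplies slot i.

>>> import pyarrow as pa
>>> types = pa.array([0, 1, 1, 0], type=pa.int8())
>>> children = [pa.array([1, 2, 3, 4]), pa.array(["a", "b", "c", "d"])]
>>> union = pa.UnionArray.from_sparse(types, children)
>>> len(union)
4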

isnull(self)

null_count

offset
A relative position into another array’s data, to enable zero-copy slicing. This value defaults to zero but must be applied on all operations with the physical storage buffers.
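
A short sketch (made-up data) of how offset appears after a zero-copy slice: the sliced array reuses the parent’s buffers and records where its view starts.

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3, 4])
>>> arr.slice(2).offset
2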

slice(self, offset=0, length=None)
Compute zero-copy slice of this array.
Parameters: - offset (int, default 0) – Offset from start of array to slice
- length (int, default None) – Length of slice (default is until end of Array starting from offset)
Returns: sliced (Array)
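
A minimal sketch of slicing (hypothetical data):

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3, 4, 5])
>>> arr.slice(1, 3).to_pylist()
[2, 3, 4]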

to_numpy(self)
Experimental: return a NumPy view of this array. Only primitive arrays with the same memory layout as NumPy (i.e. integers, floating point), without any nulls, are supported.
Returns: array (numpy.ndarray)
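
As a hedged sketch (only null-free primitive arrays qualify, as noted above; the data is made up):

>>> import pyarrow as pa
>>> arr = pa.array([1.0, 2.0, 3.0])
>>> arr.to_numpy()
array([1., 2., 3.])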

to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)
Convert to a pandas-compatible NumPy array or DataFrame, as appropriate.
Parameters: - strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
- categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
- zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
- integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
- date_as_object (boolean, default True) – Cast dates to objects
- use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
- deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects during conversion, to save on memory use. Conversion will be slower
- ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns: NumPy array or DataFrame depending on type of object
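
For illustration, a hedged sketch on a plain Array (made-up data; the exact return type, NumPy array versus pandas object, depends on the data, per the description above):

>>> import pyarrow as pa
>>> result = pa.array([1, 2, None]).to_pandas()  # the null becomes NaN, so integers come back as floats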

to_pylist(self)
Convert to a list of native Python objects.
Returns: lst (list)

type

unique(self)
Compute distinct elements in array.
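
A minimal sketch with made-up data (distinct values are returned in order of first occurrence):

>>> import pyarrow as pa
>>> pa.array(["a", "b", "a"]).unique().to_pylist()
['a', 'b']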

validate(self)
Perform any validation checks implemented by arrow::ValidateArray. Raises an exception with an error message if the array does not validate.
Raises: ArrowInvalid
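
A minimal sketch (hypothetical data; validate returns nothing on success and raises ArrowInvalid on a malformed array):

>>> import pyarrow as pa
>>> arr = pa.array([1, 2, 3])
>>> arr.validate()  # no exception means the array passed validation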