pyarrow.array
pyarrow.array(obj, type=None, mask=None, size=None, bool from_pandas=False, bool safe=True, MemoryPool memory_pool=None)
Create pyarrow.Array instance from a Python object
Parameters:
- obj (sequence, iterable, ndarray or Series) – If both type and size are specified, may be a single-use iterable. If not strongly typed, the Arrow type will be inferred for the resulting array
- type (pyarrow.DataType) – Explicit type to attempt to coerce to, otherwise will be inferred from the data
- mask (array (boolean), optional) – Indicate which values are null (True) or not null (False).
- size (int64, optional) – Size of the elements. If the input is larger than size, conversion stops at this length. For iterators, if size is larger than the input iterator, it is treated as a “max size”, but this involves an initial allocation of size followed by a resize to the actual size (so if you know the exact size, specifying it correctly gives better performance).
- from_pandas (boolean, default False) – Use pandas’s semantics for inferring nulls from values in ndarray-like data. If a mask is passed, it takes precedence, but a value that is unmasked (not-null) yet still null according to pandas semantics is treated as null
- safe (boolean, default True) – Check for overflows or other unsafe conversions
- memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool
Notes
Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.
Examples
>>> import pandas as pd
>>> import pyarrow as pa
>>> pa.array(pd.Series([1, 2]))
<pyarrow.array.Int64Array object at 0x7f674e4c0e10>
[
  1,
  2
]
>>> import numpy as np
>>> pa.array(pd.Series([1, 2]), mask=np.array([0, 1],
... dtype=bool))
<pyarrow.array.Int64Array object at 0x7f9019e11208>
[
  1,
  null
]
Returns:
- array (pyarrow.Array or pyarrow.ChunkedArray (if object data overflowed binary storage))