pyarrow.array

pyarrow.array(obj, type=None, mask=None, size=None, bool from_pandas=False, bool safe=True, MemoryPool memory_pool=None)

Create a pyarrow.Array instance from a Python object.

Parameters:
  • obj (sequence, iterable, ndarray or Series) – If both type and size are specified, obj may be a single-use iterable. If obj is not strongly typed, the Arrow type of the resulting array is inferred.
  • type (pyarrow.DataType) – Explicit type to attempt to coerce to; otherwise the type is inferred from the data.
  • mask (array (boolean), optional) – Indicate which values are null (True) or not null (False).
  • size (int64, optional) – Number of elements to convert. If the input is longer than size, conversion stops at this length. For iterators, if size is larger than the input iterator, it is treated as a “max size”, but this involves an initial allocation of size followed by a resize to the actual size (so specifying the exact size, if you know it, gives better performance).
  • from_pandas (boolean, default False) – Use pandas’s semantics for inferring nulls from values in ndarray-like data. If a mask is passed, it takes precedence; however, a value that is unmasked (not null) but is still null according to pandas semantics is treated as null.
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
  • memory_pool (pyarrow.MemoryPool, optional) – If not passed, memory is allocated from the currently set default memory pool.

Notes

Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> pa.array(pd.Series([1, 2]))
<pyarrow.array.Int64Array object at 0x7f674e4c0e10>
[
  1,
  2
]
>>> import numpy as np
>>> pa.array(pd.Series([1, 2]), mask=np.array([0, 1],
... dtype=bool))
<pyarrow.array.Int64Array object at 0x7f9019e11208>
[
  1,
  null
]
Returns:
  • array (pyarrow.Array or pyarrow.ChunkedArray) – A pyarrow.ChunkedArray is returned if object data overflowed binary storage.