Memory and IO Interfaces
This section will introduce you to the major concepts in PyArrow’s memory management and IO systems:
- Buffers
- Memory pools
- File-like and stream-like objects
Referencing and Allocating Memory
pyarrow.Buffer
The Buffer object wraps the C++ arrow::Buffer type, which is the primary tool
for memory management in Apache Arrow in C++. It permits higher-level array
classes to safely interact with memory which they may or may not own.
arrow::Buffer can be zero-copy sliced to permit Buffers to cheaply reference
other Buffers, while preserving memory lifetime and clean parent-child
relationships.
There are many implementations of arrow::Buffer, but they all provide a
standard interface: a data pointer and length. This is similar to Python’s
built-in buffer protocol and memoryview objects.
A Buffer can be created from any Python object implementing the buffer
protocol by calling the py_buffer() function. Let’s consider a bytes object:
In [1]: import pyarrow as pa
In [2]: data = b'abcdefghijklmnopqrstuvwxyz'
In [3]: buf = pa.py_buffer(data)
In [4]: buf
Out[4]: <pyarrow.lib.Buffer at 0x7fde442bfc70>
In [5]: buf.size