pyarrow.RecordBatchFileReader

class pyarrow.RecordBatchFileReader(source, footer_offset=None)

Bases: pyarrow.lib._RecordBatchFileReader, pyarrow.ipc._ReadPandasOption

Class for reading Arrow record batch data from the Arrow binary file format.

Parameters:
  • source (bytes/buffer-like, pyarrow.NativeFile, or file-like Python object) – Either an in-memory buffer or a readable file object.
  • footer_offset (int, default None) – If the Arrow file is embedded in a larger file, the byte offset of the very end of the Arrow file data.
__init__(source, footer_offset=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(source[, footer_offset])  Initialize self.
get_batch(self, int i)             Read the record batch at index i.
get_record_batch                   Alias of _RecordBatchFileReader.get_batch(self, int i).
read_all(self)                     Read all record batches as a pyarrow.Table.
read_pandas(**options)             Read the file contents and convert them to a pandas.DataFrame using Table.to_pandas.

Attributes

num_record_batches
schema
get_batch(self, int i)

Read the record batch at index i.

get_record_batch()

Alias of _RecordBatchFileReader.get_batch(self, int i).

num_record_batches

The number of record batches in the file.

read_all(self)

Read all record batches as a pyarrow.Table.

read_pandas(**options)

Read the contents of the file and convert them to a pandas.DataFrame using Table.to_pandas.

Parameters:
  • **options – Arguments to forward to Table.to_pandas
Returns: df (pandas.DataFrame)

schema

The schema of the record batches in the file.