pyarrow.Table

class pyarrow.Table

Bases: pyarrow.lib._PandasConvertible

A collection of top-level named, equal length Arrow arrays.

Warning

Do not call this class’s constructor directly, use one of the from_* methods instead.

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

add_column(self, int i, Column column) Add column to Table at position.
append_column(self, Column column) Append column at end of columns.
cast(self, Schema target_schema, bool safe=True) Cast table values to another schema
column(self, i) Select a column by its column name, or numeric index.
drop(self, columns) Drop one or more columns and return a new table.
equals(self, Table other) Check if contents of two tables are equal
flatten(self, MemoryPool memory_pool=None) Flatten this Table.
from_arrays(arrays[, names, schema, metadata]) Construct a Table from Arrow arrays or columns
from_batches(batches, Schema schema=None) Construct a Table from a sequence or iterator of Arrow RecordBatches
from_pandas(type cls, df, …[, nthreads, …]) Convert pandas.DataFrame to an Arrow Table.
itercolumns(self) Iterator over all columns in their numerical order
remove_column(self, int i) Create new Table with the indicated column removed
replace_schema_metadata(self[, metadata]) EXPERIMENTAL: Create shallow copy of table by replacing schema key-value metadata with the indicated new metadata (which may be None, which deletes any existing metadata).
set_column(self, int i, Column column) Replace column in Table at position.
to_batches(self[, chunksize]) Convert Table to list of (contiguous) RecordBatch objects, with optimal maximum chunk size
to_pandas(self[, categories]) Convert to a pandas-compatible NumPy array or DataFrame, as appropriate
to_pydict(self) Convert the arrow::Table to an OrderedDict

Attributes

columns List of all columns in numerical order
num_columns Number of columns in this table
num_rows Number of rows in this table.
schema Schema of the table and its columns
shape (#rows, #columns)
add_column(self, int i, Column column)

Add column to Table at position. Returns new table

append_column(self, Column column)

Append column at end of columns. Returns new table

cast(self, Schema target_schema, bool safe=True)

Cast table values to another schema

Parameters:
  • target_schema (Schema) – Schema to cast to, the names and order of fields must match
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns:

casted (Table)

column(self, i)

Select a column by its column name, or numeric index.

Parameters:i (int or string) –
Returns:pyarrow.Column
columns

List of all columns in numerical order

Returns:list of pa.Column
drop(self, columns)

Drop one or more columns and return a new table.

columns: list of str

Returns pa.Table

equals(self, Table other)

Check if contents of two tables are equal

Parameters:other (pyarrow.Table) –
Returns:are_equal (boolean)
flatten(self, MemoryPool memory_pool=None)

Flatten this Table. Each column with a struct type is flattened into one column per struct field. Other columns are left unchanged.

Parameters:memory_pool (MemoryPool, default None) – For memory allocations, if required, otherwise use default pool
Returns:result (Table)
static from_arrays(arrays, names=None, schema=None, metadata=None)

Construct a Table from Arrow arrays or columns

Parameters:
  • arrays (list of pyarrow.Array or pyarrow.Column) – Equal-length arrays that should form the table.
  • names (list of str, optional) – Names for the table columns. If Columns are passed, names will be inferred. If Arrays are passed, this argument is required.
  • schema (Schema, default None) – If not passed, will be inferred from the arrays
  • metadata (dict or Mapping, default None) – Optional metadata for the schema (if it is inferred)
Returns:

pyarrow.Table

static from_batches(batches, Schema schema=None)

Construct a Table from a sequence or iterator of Arrow RecordBatches

Parameters:
  • batches (sequence or iterator of RecordBatch) – Sequence of RecordBatch to be converted, all schemas must be equal
  • schema (Schema, default None) – If not passed, will be inferred from the first RecordBatch
Returns:

table (Table)

from_pandas(type cls, df, Schema schema=None, bool preserve_index=True, nthreads=None, columns=None, bool safe=True)

Convert pandas.DataFrame to an Arrow Table.

The column types in the resulting Arrow Table are inferred from the dtypes of the pandas.Series in the DataFrame. In the case of non-object Series, the NumPy dtype is translated to its Arrow equivalent. In the case of object, we need to guess the datatype by looking at the Python objects in this Series.

Be aware that Series of the object dtype don’t carry enough information to always lead to a meaningful Arrow type. In the case that we cannot infer a type, e.g. because the DataFrame is of length 0 or the Series only contains None/nan objects, the type is set to null. This behavior can be avoided by constructing an explicit schema and passing it to this function.

Parameters:
  • df (pandas.DataFrame) –
  • schema (pyarrow.Schema, optional) – The expected schema of the Arrow Table. This can be used to indicate the type of columns if we cannot infer it automatically.
  • preserve_index (bool, optional) – Whether to store the index as an additional column in the resulting Table.
  • nthreads (int, default None (may use up to system CPU count threads)) – If greater than 1, convert columns to Arrow in parallel using indicated number of threads
  • columns (list, optional) – List of column to be converted. If None, use all columns.
  • safe (boolean, default True) – Check for overflows or other unsafe conversions
Returns:

pyarrow.Table

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
>>> pa.Table.from_pandas(df)
<pyarrow.lib.Table object at 0x7f05d1fb1b40>
itercolumns(self)

Iterator over all columns in their numerical order

num_columns

Number of columns in this table

Returns:int
num_rows

Number of rows in this table.

Due to the definition of a table, all columns have the same number of rows.

Returns:int
remove_column(self, int i)

Create new Table with the indicated column removed

replace_schema_metadata(self, metadata=None)

EXPERIMENTAL: Create shallow copy of table by replacing schema key-value metadata with the indicated new metadata (which may be None, which deletes any existing metadata).

Parameters:metadata (dict, default None) –
Returns:shallow_copy (Table)
schema

Schema of the table and its columns

Returns:pyarrow.Schema
set_column(self, int i, Column column)

Replace column in Table at position. Returns new table

shape

Dimensions of the table: (#rows, #columns)

Returns:(int, int)
to_batches(self, chunksize=None)

Convert Table to list of (contiguous) RecordBatch objects, with optimal maximum chunk size

Parameters:chunksize (int, default None) – Maximum size for RecordBatch chunks. Individual chunks may be smaller depending on the chunk layout of individual columns
Returns:batches (list of RecordBatch)
to_pandas(self, categories=None, bool strings_to_categorical=False, bool zero_copy_only=False, bool integer_object_nulls=False, bool date_as_object=True, bool use_threads=True, bool deduplicate_objects=True, bool ignore_metadata=False)

Convert to a pandas-compatible NumPy array or DataFrame, as appropriate

Parameters:
  • strings_to_categorical (boolean, default False) – Encode string (UTF8) and binary types to pandas.Categorical
  • categories (list, default empty) – List of fields that should be returned as pandas.Categorical. Only applies to table-like data structures
  • zero_copy_only (boolean, default False) – Raise an ArrowException if this function call would require copying the underlying data
  • integer_object_nulls (boolean, default False) – Cast integers with nulls to objects
  • date_as_object (boolean, default True) – Cast dates to objects
  • use_threads (boolean, default True) – Whether to parallelize the conversion using multiple threads
  • deduplicate_objects (boolean, default True) – Do not create multiple copies of Python objects during conversion, to save on memory use. Conversion will be slower
  • ignore_metadata (boolean, default False) – If True, do not use the ‘pandas’ metadata to reconstruct the DataFrame index, if present
Returns:

NumPy array or DataFrame depending on type of object

to_pydict(self)

Convert the arrow::Table to an OrderedDict

Returns:OrderedDict