pyarrow.Schema

class pyarrow.Schema

Bases: object

__init__()

Initialize self. See help(type(self)) for accurate signature.

Methods

add_metadata(self, metadata) Add metadata, given as a dict of string keys and values, to the schema.
append(self, Field field) Append a field at the end of the schema.
empty_table(self) Provide an empty table according to the schema.
equals(self, other, bool check_metadata=True) Test if this schema is equal to the other
field_by_name(self, name) Access a field by its name rather than the column index.
from_pandas(type cls, df, …) Return the schema implied by a pandas DataFrame.
get_field_index(self, name)
insert(self, int i, Field field) Add a field at position i to the schema.
remove(self, int i) Remove the field at index i from the schema.
remove_metadata(self) Create new schema without metadata, if any
serialize(self[, memory_pool]) Write Schema to Buffer as encapsulated IPC message
set(self, int i, Field field) Replace a field at position i in the schema.

Attributes

metadata
names The schema’s field names.
pandas_metadata Return deserialized-from-JSON pandas metadata field (if it exists)
types The schema’s field types.
add_metadata(self, metadata)

Add metadata, given as a dict of string keys and values, to the schema.

Parameters:metadata (dict) – Keys and values must be string-like / coercible to bytes
Returns:schema (pyarrow.Schema)
append(self, Field field)

Append a field at the end of the schema.

Parameters:field (Field) –
Returns:schema (Schema)
empty_table(self)

Provide an empty table according to the schema.

Returns:table (pyarrow.Table)
equals(self, other, bool check_metadata=True)

Test if this schema is equal to the other

Parameters:
  • other (pyarrow.Schema) –
  • check_metadata (bool, default True) – Whether the key/value metadata must be equal too
Returns:

is_equal (boolean)

field_by_name(self, name)

Access a field by its name rather than the column index.

Parameters:name (str) –
Returns:field (pyarrow.Field)
from_pandas(type cls, df, bool preserve_index=True)

Return the schema implied by a pandas DataFrame.

Parameters:
  • df (pandas.DataFrame) –
  • preserve_index (bool, default True) – Whether to store the index as an additional field (or fields, for MultiIndex) in the resulting schema.
Returns:

pyarrow.Schema

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
>>> pa.Schema.from_pandas(df)
int: int64
str: string
__index_level_0__: int64
get_field_index(self, name)
insert(self, int i, Field field)

Add a field at position i to the schema.

Parameters:
  • i (int) –
  • field (Field) –
Returns:

schema (Schema)

metadata
names

The schema’s field names.

Returns:list of str
pandas_metadata

Return the pandas metadata field, deserialized from JSON, if it exists.

remove(self, int i)

Remove the field at index i from the schema.

Parameters:i (int) –
Returns:schema (Schema)
remove_metadata(self)

Create new schema without metadata, if any

Returns:schema (pyarrow.Schema)
serialize(self, memory_pool=None)

Write Schema to Buffer as encapsulated IPC message

Parameters:memory_pool (MemoryPool, default None) – Uses default memory pool if not specified
Returns:serialized (Buffer)
set(self, int i, Field field)

Replace a field at position i in the schema.

Parameters:
  • i (int) –
  • field (Field) –
Returns:

schema (Schema)

types

The schema’s field types.

Returns:list of DataType