Data Types

enum arrow::Type::type

Main data type enumeration.

This enumeration provides a quick way to interrogate the category of a DataType instance.

Values:

NA

A NULL type having no physical storage.

BOOL

Boolean as 1 bit, LSB bit-packed ordering.

UINT8

Unsigned 8-bit little-endian integer.

INT8

Signed 8-bit little-endian integer.

UINT16

Unsigned 16-bit little-endian integer.

INT16

Signed 16-bit little-endian integer.

UINT32

Unsigned 32-bit little-endian integer.

INT32

Signed 32-bit little-endian integer.

UINT64

Unsigned 64-bit little-endian integer.

INT64

Signed 64-bit little-endian integer.

HALF_FLOAT

2-byte floating point value

FLOAT

4-byte floating point value

DOUBLE

8-byte floating point value

STRING

UTF8 variable-length string as List<Char>

BINARY

Variable-length bytes (no guarantee of UTF8-ness)

FIXED_SIZE_BINARY

Fixed-size binary. Each value occupies the same number of bytes.

DATE32

int32_t days since the UNIX epoch

DATE64

int64_t milliseconds since the UNIX epoch

TIMESTAMP

Exact timestamp encoded with int64 since UNIX epoch. Default unit is millisecond.

TIME32

Time as signed 32-bit integer, representing either seconds or milliseconds since midnight.

TIME64

Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight.

INTERVAL

YEAR_MONTH or DAY_TIME interval in SQL style.

DECIMAL

Precision- and scale-based decimal type.

Storage type depends on the parameters.

LIST

A list of some logical data type.

STRUCT

Struct of logical types.

UNION

Unions of logical types.

DICTIONARY

Dictionary aka Category type.

MAP

Map, a repeated struct logical type.

EXTENSION

Custom data type, implemented by user.
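The BOOL, DATE32 and DATE64 entries above describe physical encodings rather than API calls, so a small standalone sketch may help. The code below does not use the Arrow library; the helper names (GetBit, SetBit, PackBools, Date32ToDate64) are hypothetical and only illustrate LSB bit-packed ordering and the days-to-milliseconds relationship between the two date types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read value i from an LSB-ordered bitmap: value i lives in byte i / 8,
// at bit position i % 8 counted from the least significant bit.
inline bool GetBit(const std::vector<uint8_t>& bitmap, std::size_t i) {
  assert(i / 8 < bitmap.size());
  return ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}

// Write value i of an LSB-ordered bitmap.
inline void SetBit(std::vector<uint8_t>& bitmap, std::size_t i, bool value) {
  assert(i / 8 < bitmap.size());
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    bitmap[i / 8] = static_cast<uint8_t>(bitmap[i / 8] | mask);
  } else {
    bitmap[i / 8] = static_cast<uint8_t>(bitmap[i / 8] & ~mask);
  }
}

// Pack a sequence of booleans into ceil(n / 8) bytes (hypothetical helper).
inline std::vector<uint8_t> PackBools(const std::vector<bool>& values) {
  std::vector<uint8_t> bitmap((values.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    SetBit(bitmap, i, values[i]);
  }
  return bitmap;
}

// Convert a DATE32 value (days since the UNIX epoch) to the equivalent
// DATE64 value (milliseconds since the UNIX epoch).
inline int64_t Date32ToDate64(int32_t days) {
  constexpr int64_t kMillisPerDay = 86400000;  // 24 * 60 * 60 * 1000
  return static_cast<int64_t>(days) * kMillisPerDay;
}
```

Packing {true, false, true, true} sets bits 0, 2 and 3 of the first byte, which gives 0x0D. Widening to int64_t before multiplying keeps the date conversion from overflowing for any int32_t input.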

class DataType

Base class for all data types.

Data types in this library are all logical. They can be expressed as either a primitive physical type (bytes or bits of some fixed size), a nested type consisting of other data types, or another data type (e.g. a timestamp encoded as an int64).

Simple datatypes may be entirely described by their Type::type id, but complex datatypes are usually parametric.

Subclassed by arrow::BinaryType, arrow::ExtensionType, arrow::FixedWidthType, arrow::NestedType, arrow::NullType

Public Functions

bool Equals(const DataType &other, bool check_metadata = true) const

Return whether the types are equal.

Types that are logically convertible from one to another (e.g. List<UInt8> and Binary) are NOT equal.

bool Equals(const std::shared_ptr<DataType> &other) const

Return whether the types are equal.

virtual std::string ToString() const = 0

A string representation of the type, including any children.

virtual std::string name() const = 0

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

Type::type id() const

Return the type category.

Factory functions

These functions are recommended for creating data types. They may return new objects or existing singletons, depending on the type requested.

std::shared_ptr<DataType> arrow::fixed_size_binary(int32_t byte_width)

Create a FixedSizeBinaryType instance.

std::shared_ptr<DataType> arrow::decimal(int32_t precision, int32_t scale)

Create a Decimal128Type instance.

std::shared_ptr<DataType> arrow::list(const std::shared_ptr<Field> &value_type)

Create a ListType instance from its child Field type.

std::shared_ptr<DataType> arrow::list(const std::shared_ptr<DataType> &value_type)

Create a ListType instance from its child DataType.

std::shared_ptr<DataType> arrow::timestamp(TimeUnit::type unit)

Create a TimestampType instance from its unit.

std::shared_ptr<DataType> arrow::timestamp(TimeUnit::type unit, const std::string &timezone)

Create a TimestampType instance from its unit and timezone.

std::shared_ptr<DataType> arrow::time32(TimeUnit::type unit)

Create a 32-bit time type instance.

Unit can be either SECOND or MILLI.

std::shared_ptr<DataType> arrow::time64(TimeUnit::type unit)

Create a 64-bit time type instance.

Unit can be either MICRO or NANO.

std::shared_ptr<DataType> arrow::struct_(const std::vector<std::shared_ptr<Field>> &fields)

Create a StructType instance.

std::shared_ptr<DataType> arrow::union_(const std::vector<std::shared_ptr<Field>> &child_fields, const std::vector<uint8_t> &type_codes, UnionMode::type mode = UnionMode::SPARSE)

Create a UnionType instance.

std::shared_ptr<DataType> arrow::union_(const std::vector<std::shared_ptr<Array>> &children, UnionMode::type mode = UnionMode::SPARSE)

Create a UnionType instance.

std::shared_ptr<DataType> arrow::dictionary(const std::shared_ptr<DataType> &index_type, const std::shared_ptr<Array> &values, bool ordered = false)

Create a DictionaryType instance.

std::shared_ptr<DataType> arrow::null()

Return a NullType instance.

std::shared_ptr<DataType> arrow::boolean()

Return a BooleanType instance.

std::shared_ptr<DataType> arrow::int8()

Return an Int8Type instance.

std::shared_ptr<DataType> arrow::int16()

Return an Int16Type instance.

std::shared_ptr<DataType> arrow::int32()

Return an Int32Type instance.

std::shared_ptr<DataType> arrow::int64()

Return an Int64Type instance.

std::shared_ptr<DataType> arrow::uint8()

Return a UInt8Type instance.

std::shared_ptr<DataType> arrow::uint16()

Return a UInt16Type instance.

std::shared_ptr<DataType> arrow::uint32()

Return a UInt32Type instance.

std::shared_ptr<DataType> arrow::uint64()

Return a UInt64Type instance.

std::shared_ptr<DataType> arrow::float16()

Return a HalfFloatType instance.

std::shared_ptr<DataType> arrow::float32()

Return a FloatType instance.

std::shared_ptr<DataType> arrow::float64()

Return a DoubleType instance.

std::shared_ptr<DataType> arrow::utf8()

Return a StringType instance.

std::shared_ptr<DataType> arrow::binary()

Return a BinaryType instance.

std::shared_ptr<DataType> arrow::date32()

Return a Date32Type instance.

std::shared_ptr<DataType> arrow::date64()

Return a Date64Type instance.

Concrete type subclasses

Primitive

class NullType : public arrow::DataType, public arrow::NoExtraMeta

Concrete type class for always-null data.

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class BooleanType : public arrow::FixedWidthType, public arrow::NoExtraMeta

Concrete type class for boolean data.

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class Int8Type : public arrow::detail::IntegerTypeImpl<Int8Type, Type::INT8, int8_t>

Concrete type class for signed 8-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class Int16Type : public arrow::detail::IntegerTypeImpl<Int16Type, Type::INT16, int16_t>

Concrete type class for signed 16-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class Int32Type : public arrow::detail::IntegerTypeImpl<Int32Type, Type::INT32, int32_t>

Concrete type class for signed 32-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class Int64Type : public arrow::detail::IntegerTypeImpl<Int64Type, Type::INT64, int64_t>

Concrete type class for signed 64-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class UInt8Type : public arrow::detail::IntegerTypeImpl<UInt8Type, Type::UINT8, uint8_t>

Concrete type class for unsigned 8-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class UInt16Type : public arrow::detail::IntegerTypeImpl<UInt16Type, Type::UINT16, uint16_t>

Concrete type class for unsigned 16-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class UInt32Type : public arrow::detail::IntegerTypeImpl<UInt32Type, Type::UINT32, uint32_t>

Concrete type class for unsigned 32-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class UInt64Type : public arrow::detail::IntegerTypeImpl<UInt64Type, Type::UINT64, uint64_t>

Concrete type class for unsigned 64-bit integer data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class HalfFloatType : public arrow::detail::CTypeImpl<HalfFloatType, FloatingPoint, Type::HALF_FLOAT, uint16_t>

Concrete type class for 16-bit floating-point data.

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class FloatType : public arrow::detail::CTypeImpl<FloatType, FloatingPoint, Type::FLOAT, float>

Concrete type class for 32-bit floating-point data (C “float”)

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class DoubleType : public arrow::detail::CTypeImpl<DoubleType, FloatingPoint, Type::DOUBLE, double>

Concrete type class for 64-bit floating-point data (C “double”)

Public Functions

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

Binary-like

class BinaryType : public arrow::DataType, public arrow::NoExtraMeta

Concrete type class for variable-size binary data.

Subclassed by arrow::StringType

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class StringType : public arrow::BinaryType

Concrete type class for variable-size string data, utf8-encoded.

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class FixedSizeBinaryType : public arrow::FixedWidthType, public arrow::ParametricType

Concrete type class for fixed-size binary data.

Subclassed by arrow::DecimalType

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class Decimal128Type : public arrow::DecimalType

Concrete type class for 128-bit decimal data.

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

Nested

class ListType : public arrow::NestedType

Concrete type class for list data.

List data is nested data where each value is a variable number of child items. Lists can be recursively nested, for example list(list(int32)).

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

class StructType : public arrow::NestedType

Concrete type class for struct data.

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

std::shared_ptr<Field> GetFieldByName(const std::string &name) const

Returns null if name not found.

std::vector<std::shared_ptr<Field>> GetAllFieldsByName(const std::string &name) const

Return all fields having this name.

int GetFieldIndex(const std::string &name) const

Returns -1 if name not found or if there are multiple fields having the same name.

std::vector<int> GetAllFieldIndices(const std::string &name) const

Return the indices of all fields having this name.

class UnionType : public arrow::NestedType

Concrete type class for union data.

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

Dictionary-encoded

class DictionaryType : public arrow::FixedWidthType

Concrete type class for dictionary data.

Public Functions

std::string ToString() const

A string representation of the type, including any children.

std::string name() const

A string name of the type, omitting any child fields.

Note
Experimental API
Since
0.7.0

Public Static Functions

static Status Unify(MemoryPool *pool, const std::vector<const DataType *> &types, std::shared_ptr<DataType> *out_type, std::vector<std::vector<int32_t>> *out_transpose_maps = NULLPTR)

Unify several dictionary types.

Compute a resulting dictionary that will allow the union of values of all input dictionary types. The input types must all have the same value type.

Parameters
  • pool: Memory pool to allocate dictionary values from
  • types: A sequence of input dictionary types
  • out_type: The unified dictionary type
  • out_transpose_maps: (optionally) A sequence of integer vectors, one per input type. Each integer vector represents the transposition of input type indices into unified type indices.
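To make the role of the transpose maps concrete, here is a standalone sketch of dictionary unification over plain string vectors. It does not call Arrow and is not the library's implementation. UnifiedDictionary and UnifyDictionaries are hypothetical names, and the first-seen ordering of unified values is an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct UnifiedDictionary {
  std::vector<std::string> values;                    // union of all input values
  std::vector<std::vector<int32_t>> transpose_maps;   // one map per input dictionary
};

// For each input dictionary, transpose_maps[d][i] is the index in the
// unified dictionary of the value stored at index i of dictionary d.
UnifiedDictionary UnifyDictionaries(
    const std::vector<std::vector<std::string>>& dictionaries) {
  UnifiedDictionary out;
  std::unordered_map<std::string, int32_t> position;  // value -> unified index
  for (const auto& dict : dictionaries) {
    std::vector<int32_t> transpose;
    transpose.reserve(dict.size());
    for (const auto& value : dict) {
      auto it = position.find(value);
      if (it == position.end()) {
        // First time this value is seen: append it to the unified dictionary.
        it = position.emplace(value, static_cast<int32_t>(out.values.size())).first;
        out.values.push_back(value);
      }
      transpose.push_back(it->second);
    }
    out.transpose_maps.push_back(std::move(transpose));
  }
  return out;
}
```

Unifying {"a", "b"} with {"b", "c", "a"} under this sketch yields the values {"a", "b", "c"} and the transpose maps {0, 1} and {1, 2, 0}. An index array encoded against the second dictionary can then be rewritten by replacing each index i with transpose_maps[1][i].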

Fields and Schemas

std::shared_ptr<Field> arrow::field(const std::string &name, const std::shared_ptr<DataType> &type, bool nullable = true, const std::shared_ptr<const KeyValueMetadata> &metadata = NULLPTR)

Create a Field instance.

Parameters
  • name: the field name
  • type: the field value type
  • nullable: whether the values are nullable, default true
  • metadata: any custom key-value metadata, default null

std::shared_ptr<Schema> arrow::schema(const std::vector<std::shared_ptr<Field>> &fields, const std::shared_ptr<const KeyValueMetadata> &metadata = NULLPTR)

Create a Schema instance.

Return
A shared_ptr to the new Schema
Parameters
  • fields: the schema’s fields
  • metadata: any custom key-value metadata, default null

std::shared_ptr<Schema> arrow::schema(std::vector<std::shared_ptr<Field>> &&fields, const std::shared_ptr<const KeyValueMetadata> &metadata = NULLPTR)

Create a Schema instance.

Return
A shared_ptr to the new Schema
Parameters
  • fields: the schema’s fields (rvalue reference)
  • metadata: any custom key-value metadata, default null

class Field

The combination of a field name and data type, with optional metadata.

Fields are used to describe the individual constituents of a nested DataType or a Schema.

A field’s metadata is represented by a KeyValueMetadata instance, which holds arbitrary key-value pairs.

Public Functions

std::shared_ptr<const KeyValueMetadata> metadata() const

Return the field’s attached metadata.

bool HasMetadata() const

Return whether the field has non-empty metadata.

std::shared_ptr<Field> AddMetadata(const std::shared_ptr<const KeyValueMetadata> &metadata) const

Return a copy of this field with the given metadata attached to it.

std::shared_ptr<Field> RemoveMetadata() const

Return a copy of this field without any metadata attached to it.

std::shared_ptr<Field> WithType(const std::shared_ptr<DataType> &type) const

Return a copy of this field with the replaced type.

std::string ToString() const

Return a string representation of the field.

const std::string &name() const

Return the field name.

std::shared_ptr<DataType> type() const

Return the field data type.

bool nullable() const

Return whether the field is nullable.

class Schema

Sequence of arrow::Field objects describing the columns of a record batch or table data structure.

Public Functions

bool Equals(const Schema &other, bool check_metadata = true) const

Returns true if all of the schema fields are equal.

std::shared_ptr<Field> field(int i) const

Return the ith schema element. Does not perform bounds checking.

std::shared_ptr<Field> GetFieldByName(const std::string &name) const

Returns null if name not found.

std::vector<std::shared_ptr<Field>> GetAllFieldsByName(const std::string &name) const

Return all fields having this name.

int GetFieldIndex(const std::string &name) const

Returns -1 if name not found.

std::vector<int> GetAllFieldIndices(const std::string &name) const

Return the indices of all fields having this name.

std::shared_ptr<const KeyValueMetadata> metadata() const

The custom key-value metadata, if any.

Return
The metadata, which may be null

std::string ToString() const

Render a string representation of the schema suitable for debugging.

std::shared_ptr<Schema> AddMetadata(const std::shared_ptr<const KeyValueMetadata> &metadata) const

Replace key-value metadata with new metadata.

Return
new Schema
Parameters
  • metadata: new KeyValueMetadata

std::shared_ptr<Schema> RemoveMetadata() const

Return a copy of the Schema without the KeyValueMetadata.

bool HasMetadata() const

Indicates whether the Schema has non-empty KeyValueMetadata.

int num_fields() const

Return the number of fields (columns) in the schema.