Data Types¶
-
enum
arrow::Type
::
type
¶ Main data type enumeration.
This enumeration provides a quick way to interrogate the category of a DataType instance.
Values:
-
NA
¶ A NULL type having no physical storage.
-
BOOL
¶ Boolean as 1 bit, LSB bit-packed ordering.
-
UINT8
¶ Unsigned 8-bit little-endian integer.
-
INT8
¶ Signed 8-bit little-endian integer.
-
UINT16
¶ Unsigned 16-bit little-endian integer.
-
INT16
¶ Signed 16-bit little-endian integer.
-
UINT32
¶ Unsigned 32-bit little-endian integer.
-
INT32
¶ Signed 32-bit little-endian integer.
-
UINT64
¶ Unsigned 64-bit little-endian integer.
-
INT64
¶ Signed 64-bit little-endian integer.
-
HALF_FLOAT
¶ 2-byte floating-point value.
-
FLOAT
¶ 4-byte floating-point value.
-
DOUBLE
¶ 8-byte floating-point value.
-
STRING
¶ UTF8 variable-length string as List<Char>
-
BINARY
¶ Variable-length bytes (no guarantee of UTF8-ness)
-
FIXED_SIZE_BINARY
¶ Fixed-size binary. Each value occupies the same number of bytes.
-
DATE32
¶ int32_t days since the UNIX epoch
-
DATE64
¶ int64_t milliseconds since the UNIX epoch
-
TIMESTAMP
¶ Exact timestamp encoded as an int64 since the UNIX epoch. The default unit is millisecond.
-
TIME32
¶ Time as signed 32-bit integer, representing either seconds or milliseconds since midnight.
-
TIME64
¶ Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight.
-
INTERVAL
¶ YEAR_MONTH or DAY_TIME interval in SQL style.
-
DECIMAL
¶ Precision- and scale-based decimal type.
Storage type depends on the parameters.
-
LIST
¶ A list of some logical data type.
-
STRUCT
¶ Struct of logical types.
-
UNION
¶ Unions of logical types.
-
DICTIONARY
¶ Dictionary aka Category type.
-
MAP
¶ Map, a repeated struct logical type.
-
EXTENSION
¶ Custom data type, implemented by user.
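Some of the physical encodings named above can be illustrated without Arrow itself. The following is a minimal sketch using only the C++ standard library. It shows LSB bit-packed booleans (BOOL) and the relationship between DATE32 days and DATE64 milliseconds. The helper names GetPackedBool, SetPackedBool, and Date32ToDate64 are hypothetical and are not part of the Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read the i-th boolean from an LSB-ordered bit-packed buffer.
// Value i lives in byte i / 8, at bit i % 8 counted from the least significant bit.
bool GetPackedBool(const std::vector<uint8_t>& bits, int64_t i) {
  assert(i >= 0 && static_cast<std::size_t>(i / 8) < bits.size());
  const std::size_t byte_index = static_cast<std::size_t>(i / 8);
  return ((bits[byte_index] >> (i % 8)) & 1) != 0;
}

// Write the i-th boolean into an LSB-ordered bit-packed buffer.
void SetPackedBool(std::vector<uint8_t>& bits, int64_t i, bool value) {
  assert(i >= 0 && static_cast<std::size_t>(i / 8) < bits.size());
  const std::size_t byte_index = static_cast<std::size_t>(i / 8);
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    bits[byte_index] = static_cast<uint8_t>(bits[byte_index] | mask);
  } else {
    bits[byte_index] = static_cast<uint8_t>(bits[byte_index] & ~mask);
  }
}

// Convert a DATE32 value (days since the UNIX epoch) into the equivalent
// DATE64 value (milliseconds since the UNIX epoch).
int64_t Date32ToDate64(int32_t days) {
  return static_cast<int64_t>(days) * 86400000LL;  // 24 * 60 * 60 * 1000 ms per day
}
```

With this layout, setting values 0 and 2 in a single zeroed byte yields the byte 0x05.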
-
-
class
DataType
¶ Base class for all data types.
Data types in this library are all logical. They can be expressed as either a primitive physical type (bytes or bits of some fixed size), a nested type consisting of other data types, or another data type (e.g. a timestamp encoded as an int64).
Simple datatypes may be entirely described by their Type::type id, but complex datatypes are usually parametric.
Subclassed by arrow::BinaryType, arrow::ExtensionType, arrow::FixedWidthType, arrow::NestedType, arrow::NullType
Public Functions
-
bool
Equals
(const DataType &other, bool check_metadata = true) const¶ Return whether the types are equal.
Types that are logically convertible from one to another (e.g. List<UInt8> and Binary) are NOT equal.
-
virtual std::string
ToString
() const = 0¶ A string representation of the type, including any children.
-
virtual std::string
name
() const = 0¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
Factory functions¶
These functions are recommended for creating data types. They may return new objects or existing singletons, depending on the type requested.
-
std::shared_ptr<DataType>
arrow
::
fixed_size_binary
(int32_t byte_width)¶ Create a FixedSizeBinaryType instance.
-
std::shared_ptr<DataType>
arrow
::
decimal
(int32_t precision, int32_t scale)¶ Create a Decimal128Type instance.
-
std::shared_ptr<DataType>
arrow
::
timestamp
(TimeUnit::type unit)¶ Create a TimestampType instance from its unit.
-
std::shared_ptr<DataType>
arrow
::
timestamp
(TimeUnit::type unit, const std::string &timezone)¶ Create a TimestampType instance from its unit and timezone.
-
std::shared_ptr<DataType>
arrow
::
time32
(TimeUnit::type unit)¶ Create a 32-bit time type instance.
The unit must be either SECOND or MILLI.
-
std::shared_ptr<DataType>
arrow
::
time64
(TimeUnit::type unit)¶ Create a 64-bit time type instance.
The unit must be either MICRO or NANO.
Create a StructType instance.
Create a UnionType instance.
Create a UnionType instance.
Create a DictionaryType instance.
-
std::shared_ptr<DataType>
arrow
::
boolean
()¶ Return a BooleanType instance.
-
std::shared_ptr<DataType>
arrow
::
uint16
()¶ Return a UInt16Type instance.
-
std::shared_ptr<DataType>
arrow
::
uint32
()¶ Return a UInt32Type instance.
-
std::shared_ptr<DataType>
arrow
::
uint64
()¶ Return a UInt64Type instance.
-
std::shared_ptr<DataType>
arrow
::
float16
()¶ Return a HalfFloatType instance.
-
std::shared_ptr<DataType>
arrow
::
float64
()¶ Return a DoubleType instance.
-
std::shared_ptr<DataType>
arrow
::
utf8
()¶ Return a StringType instance.
-
std::shared_ptr<DataType>
arrow
::
binary
()¶ Return a BinaryType instance.
-
std::shared_ptr<DataType>
arrow
::
date32
()¶ Return a Date32Type instance.
-
std::shared_ptr<DataType>
arrow
::
date64
()¶ Return a Date64Type instance.
Concrete type subclasses¶
Primitive¶
-
class
NullType
: public arrow::DataType, public arrow::NoExtraMeta¶ Concrete type class for always-null data.
-
class
BooleanType
: public arrow::FixedWidthType, public arrow::NoExtraMeta¶ Concrete type class for boolean data.
-
class
Int8Type
: public arrow::detail::IntegerTypeImpl<Int8Type, Type::INT8, int8_t>¶ Concrete type class for signed 8-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
Int16Type
: public arrow::detail::IntegerTypeImpl<Int16Type, Type::INT16, int16_t>¶ Concrete type class for signed 16-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
Int32Type
: public arrow::detail::IntegerTypeImpl<Int32Type, Type::INT32, int32_t>¶ Concrete type class for signed 32-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
Int64Type
: public arrow::detail::IntegerTypeImpl<Int64Type, Type::INT64, int64_t>¶ Concrete type class for signed 64-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
UInt8Type
: public arrow::detail::IntegerTypeImpl<UInt8Type, Type::UINT8, uint8_t>¶ Concrete type class for unsigned 8-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
UInt16Type
: public arrow::detail::IntegerTypeImpl<UInt16Type, Type::UINT16, uint16_t>¶ Concrete type class for unsigned 16-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
UInt32Type
: public arrow::detail::IntegerTypeImpl<UInt32Type, Type::UINT32, uint32_t>¶ Concrete type class for unsigned 32-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
UInt64Type
: public arrow::detail::IntegerTypeImpl<UInt64Type, Type::UINT64, uint64_t>¶ Concrete type class for unsigned 64-bit integer data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
HalfFloatType
: public arrow::detail::CTypeImpl<HalfFloatType, FloatingPoint, Type::HALF_FLOAT, uint16_t>¶ Concrete type class for 16-bit floating-point data.
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
FloatType
: public arrow::detail::CTypeImpl<FloatType, FloatingPoint, Type::FLOAT, float>¶ Concrete type class for 32-bit floating-point data (C “float”)
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
class
DoubleType
: public arrow::detail::CTypeImpl<DoubleType, FloatingPoint, Type::DOUBLE, double>¶ Concrete type class for 64-bit floating-point data (C “double”)
Public Functions
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
Binary-like¶
-
class
BinaryType
: public arrow::DataType, public arrow::NoExtraMeta¶ Concrete type class for variable-size binary data.
Subclassed by arrow::StringType
-
class
StringType
: public arrow::BinaryType¶ Concrete type class for variable-size string data, utf8-encoded.
-
class
FixedSizeBinaryType
: public arrow::FixedWidthType, public arrow::ParametricType¶ Concrete type class for fixed-size binary data.
Subclassed by arrow::DecimalType
-
class
Decimal128Type
: public arrow::DecimalType¶ Concrete type class for 128-bit decimal data.
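A decimal type is parameterized by precision and scale, and a stored value can be read as an unscaled integer multiplied by 10 to the power of minus scale. The sketch below illustrates that interpretation with the standard library only. It is a simplification under stated assumptions: it uses a 64-bit integer rather than the 128-bit storage of Decimal128Type, it assumes a non-negative scale, and FormatDecimal is a hypothetical helper name, not an Arrow function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Render an unscaled integer as a decimal string with `scale` fractional digits.
// Assumption: scale >= 0. A 64-bit integer stands in for the 128-bit storage.
std::string FormatDecimal(int64_t unscaled, int32_t scale) {
  const bool negative = unscaled < 0;
  // Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow.
  const uint64_t magnitude = negative ? (0 - static_cast<uint64_t>(unscaled))
                                      : static_cast<uint64_t>(unscaled);
  std::string digits = std::to_string(magnitude);
  if (scale > 0) {
    const std::size_t frac = static_cast<std::size_t>(scale);
    // Left-pad with zeros so at least one digit precedes the decimal point.
    if (digits.size() <= frac) {
      digits = std::string(frac - digits.size() + 1, '0') + digits;
    }
    digits.insert(digits.size() - frac, ".");
  }
  return negative ? "-" + digits : digits;
}
```

For instance, an unscaled value of 12345 with scale 2 represents 123.45.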
Nested¶
-
class
ListType
: public arrow::NestedType¶ Concrete type class for list data.
List data is nested data where each value is a variable number of child items. Lists can be recursively nested, for example list(list(int32)).
-
class
StructType
: public arrow::NestedType¶ Concrete type class for struct data.
Public Functions
-
std::string
ToString
() const¶ A string representation of the type, including any children.
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
-
std::shared_ptr<Field>
GetFieldByName
(const std::string &name) const¶ Returns null if name not found.
-
std::vector<std::shared_ptr<Field>>
GetAllFieldsByName
(const std::string &name) const¶ Return all fields having this name.
-
int
GetFieldIndex
(const std::string &name) const¶ Returns -1 if name not found or if there are multiple fields having the same name.
-
std::vector<int>
GetAllFieldIndices
(const std::string &name) const¶ Return the indices of all fields having this name.
-
class
UnionType
: public arrow::NestedType¶ Concrete type class for union data.
Dictionary-encoded¶
-
class
DictionaryType
: public arrow::FixedWidthType¶ Concrete type class for dictionary data.
Public Functions
-
std::string
ToString
() const¶ A string representation of the type, including any children.
-
std::string
name
() const¶ A string name of the type, omitting any child fields.
- Note
- Experimental API
- Since
- 0.7.0
Public Static Functions
Unify several dictionary types.
Compute a resulting dictionary that will allow the union of values of all input dictionary types. The input types must all have the same value type.
- Parameters
pool
: Memory pool to allocate dictionary values from
types
: A sequence of input dictionary types
out_type
: The unified dictionary type
out_transpose_maps
: (optionally) A sequence of integer vectors, one per input type. Each integer vector represents the transposition of input type indices into unified type indices.
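To make the idea of transposition maps concrete, here is a minimal standard-library sketch of dictionary unification over string values. It is an illustration of the concept only and not Arrow's implementation: it ignores the memory pool, works on plain vectors rather than dictionary types, and UnifiedDictionary and UnifyDictionaries are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct UnifiedDictionary {
  std::vector<std::string> values;                   // unified dictionary values
  std::vector<std::vector<int32_t>> transpose_maps;  // one map per input dictionary
};

// Merge several dictionaries into one. transpose_maps[k][i] is the index in the
// unified dictionary of the value stored at index i of input dictionary k.
UnifiedDictionary UnifyDictionaries(
    const std::vector<std::vector<std::string>>& inputs) {
  UnifiedDictionary out;
  std::unordered_map<std::string, int32_t> index_of;
  for (const auto& dict : inputs) {
    std::vector<int32_t> transpose;
    transpose.reserve(dict.size());
    for (const auto& value : dict) {
      auto it = index_of.find(value);
      if (it == index_of.end()) {
        // First occurrence: append the value to the unified dictionary.
        it = index_of.emplace(value, static_cast<int32_t>(out.values.size())).first;
        out.values.push_back(value);
      }
      transpose.push_back(it->second);
    }
    out.transpose_maps.push_back(std::move(transpose));
  }
  return out;
}
```

Given the inputs ["a", "b"] and ["b", "c", "a"], this sketch produces the unified values ["a", "b", "c"] with transposition maps [0, 1] and [1, 2, 0]. Indices encoded against an input dictionary can then be rewritten through its map to refer to the unified dictionary.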
Fields and Schemas¶
Create a Field instance.
- Parameters
name
: the field name
type
: the field value type
nullable
: whether the values are nullable, default true
metadata
: any custom key-value metadata, default null
Create a Schema instance.
- Return
- schema shared_ptr to Schema
- Parameters
fields
: the schema’s fields
metadata
: any custom key-value metadata, default null
Create a Schema instance.
- Return
- schema shared_ptr to Schema
- Parameters
fields
: the schema’s fields (rvalue reference)
metadata
: any custom key-value metadata, default null
-
class
Field
¶ The combination of a field name and data type, with optional metadata.
Fields are used to describe the individual constituents of a nested DataType or a Schema.
A field’s metadata is represented by a KeyValueMetadata instance, which holds arbitrary key-value pairs.
Public Functions
-
std::shared_ptr<const KeyValueMetadata>
metadata
() const¶ Return the field’s attached metadata.
-
bool
HasMetadata
() const¶ Return whether the field has non-empty metadata.
Return a copy of this field with the given metadata attached to it.
-
std::shared_ptr<Field>
RemoveMetadata
() const¶ Return a copy of this field without any metadata attached to it.
Return a copy of this field with the replaced type.
-
std::string
ToString
() const¶ Return a string representation of the field.
-
const std::string &
name
() const¶ Return the field name.
-
bool
nullable
() const¶ Return whether the field is nullable.
-
class
Schema
¶ Sequence of arrow::Field objects describing the columns of a record batch or table data structure.
Public Functions
-
bool
Equals
(const Schema &other, bool check_metadata = true) const¶ Returns true if all of the schema fields are equal.
-
std::shared_ptr<Field>
GetFieldByName
(const std::string &name) const¶ Returns null if name not found.
-
std::vector<std::shared_ptr<Field>>
GetAllFieldsByName
(const std::string &name) const¶ Return all fields having this name.
-
int
GetFieldIndex
(const std::string &name) const¶ Returns -1 if name not found.
-
std::vector<int>
GetAllFieldIndices
(const std::string &name) const¶ Return the indices of all fields having this name.
-
std::shared_ptr<const KeyValueMetadata>
metadata
() const¶ The custom key-value metadata, if any.
- Return
- metadata may be null
-
std::string
ToString
() const¶ Render a string representation of the schema suitable for debugging.
Replace key-value metadata with new metadata.
- Return
- new Schema
- Parameters
metadata
: new KeyValueMetadata
-
int
num_fields
() const¶ Return the number of fields (columns) in the schema.