Array Builders

class ArrayBuilder

Base class for all data array builders.

This class provides facilities for incrementally building the null bitmap (see the Append methods) and, as a side effect, for tracking the current number of slots and the null count.
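
The bookkeeping described above can be pictured with a small standalone model. This is only a sketch and not Arrow's implementation: the class name ValidityBitmapSketch and its members are hypothetical, and it assumes the validity bits are packed least-significant-bit first, with 1 meaning valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the null-bitmap bookkeeping; not Arrow's class.
class ValidityBitmapSketch {
 public:
  // Record one slot: set its bit for a valid value, leave it 0 for a null.
  void AppendToBitmap(bool is_valid) {
    if (length_ % 8 == 0) bits_.push_back(0);  // start a new byte every 8 slots
    if (is_valid) {
      bits_[static_cast<std::size_t>(length_ / 8)] |=
          static_cast<uint8_t>(1u << (length_ % 8));
    } else {
      ++null_count_;  // the null count falls out of the appends
    }
    ++length_;  // so does the number of slots
  }

  bool IsValid(int64_t i) const {
    assert(i >= 0 && i < length_);
    return ((bits_[static_cast<std::size_t>(i / 8)] >> (i % 8)) & 1) != 0;
  }

  int64_t length() const { return length_; }
  int64_t null_count() const { return null_count_; }

 private:
  std::vector<uint8_t> bits_;  // assumed packing: least-significant bit first
  int64_t length_ = 0;
  int64_t null_count_ = 0;
};
```

Appending valid, null, valid to this model leaves length() at 3 and null_count() at 1.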

Note
Users are expected to use builders as one of the concrete types below. For example, an ArrayBuilder* pointing to a BinaryBuilder should be downcast before use.

Subclassed by arrow::BinaryBuilder, arrow::BooleanBuilder, arrow::DenseUnionBuilder, arrow::DictionaryBuilder< T >, arrow::DictionaryBuilder< NullType >, arrow::FixedSizeBinaryBuilder, arrow::internal::AdaptiveIntBuilderBase, arrow::ListBuilder, arrow::NullBuilder, arrow::NumericBuilder< T >, arrow::StructBuilder, arrow::DictionaryBuilder< BinaryType >, arrow::DictionaryBuilder< StringType >

Public Functions

ArrayBuilder *child(int i)

For nested types.

Since the child builders are owned by this class instance, a raw pointer is returned instead of a shared pointer.

virtual Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return
Status
Parameters
  • capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status Reserve(int64_t additional_capacity)

Ensure that there is enough space allocated to add the indicated number of elements without any further calls to Resize.

The memory allocated is rounded up to the next highest power of 2, similar to memory allocations in STL containers like std::vector.

Return
Status
Parameters
  • additional_capacity: the number of additional array values
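
As a rough illustration of the rounding rule, the sketch below computes the capacity a builder might end up with after a Reserve call. The function names are hypothetical, the rounding is applied to element counts rather than bytes for simplicity, and it assumes that a request which is already an exact power of two is left unchanged; none of this is taken from Arrow's source.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= n.
// Assumes 1 <= n <= 2^62 so the doubling loop cannot overflow.
inline int64_t NextPowerOfTwoSketch(int64_t n) {
  assert(n >= 1 && n <= (int64_t{1} << 62));
  int64_t p = 1;
  while (p < n) p *= 2;
  return p;
}

// Hypothetical model of Reserve: capacity after making room for
// `additional` more elements on top of `length` existing ones.
inline int64_t CapacityAfterReserveSketch(int64_t length, int64_t capacity,
                                          int64_t additional) {
  const int64_t needed = length + additional;
  if (needed <= capacity) return capacity;  // enough room already, no resize
  return NextPowerOfTwoSketch(needed);      // grow to the rounded-up size
}
```

Under these assumptions, reserving 5 slots in an empty builder yields a capacity of 8, and reserving 1 more slot in a full builder of capacity 8 yields 16.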

virtual void Reset()

Reset the builder.

Status Advance(int64_t elements)

For cases where raw data was memcpy’d into the internal buffers, allows us to advance the length of the builder.

The caller is responsible for using this function correctly.

virtual Status FinishInternal(std::shared_ptr<ArrayData> *out) = 0

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

Status Finish(std::shared_ptr<Array> *out)

Return result of builder as an Array object.

The builder is reset except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized Array object

Concrete builder subclasses

class NullBuilder : public arrow::ArrayBuilder

Public Functions

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

class BooleanBuilder : public arrow::ArrayBuilder

Public Functions

Status AppendNulls(int64_t length)

Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory.

Status Append(const bool val)

Scalar append.

void UnsafeAppend(const bool val)

Scalar append, without checking for capacity.

Status AppendValues(const uint8_t *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a contiguous array of bytes (any non-zero byte is treated as 1)
  • length: the number of values to append
  • valid_bytes: an optional sequence of bytes where non-zero indicates a valid (non-null) value
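
The byte-per-value inputs of this overload can be modelled as follows. This is a minimal sketch with hypothetical names (BooleanColumnSketch, AppendBooleanBytesSketch); it assumes both the values and the validity are stored as bitmaps packed least-significant-bit first, and that omitting valid_bytes means every slot is valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a boolean column; not Arrow's data structure.
struct BooleanColumnSketch {
  std::vector<uint8_t> value_bits;  // one bit per slot, 1 = true
  std::vector<uint8_t> valid_bits;  // one bit per slot, 1 = valid
  int64_t length = 0;
  int64_t null_count = 0;
};

// Append `length` byte-per-value inputs; a null valid_bytes means "all valid".
inline void AppendBooleanBytesSketch(BooleanColumnSketch* col,
                                     const uint8_t* values, int64_t length,
                                     const uint8_t* valid_bytes = nullptr) {
  for (int64_t i = 0; i < length; ++i) {
    const int64_t slot = col->length++;
    if (slot % 8 == 0) {  // grow both bitmaps one byte at a time
      col->value_bits.push_back(0);
      col->valid_bits.push_back(0);
    }
    const std::size_t byte = static_cast<std::size_t>(slot / 8);
    const uint8_t mask = static_cast<uint8_t>(1u << (slot % 8));
    if (values[i] != 0) col->value_bits[byte] |= mask;  // non-zero becomes 1
    if (valid_bytes == nullptr || valid_bytes[i] != 0) {
      col->valid_bits[byte] |= mask;
    } else {
      ++col->null_count;
    }
  }
}
```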

Status AppendValues(const uint8_t *values, int64_t length, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a contiguous C array of values
  • length: the number of values to append
  • is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<uint8_t> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a std::vector of bytes
  • is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<uint8_t> &values)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a std::vector of bytes

Status AppendValues(const std::vector<bool> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: an std::vector<bool> indicating true (1) or false (0)
  • is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<bool> &values)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: an std::vector<bool> indicating true (1) or false (0)

template <typename ValuesIter>
Status AppendValues(ValuesIter values_begin, ValuesIter values_end)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values_begin: InputIterator to the beginning of the values
  • values_end: InputIterator pointing to the end of the values

template <typename ValuesIter, typename ValidIter>
std::enable_if<!std::is_pointer<ValidIter>::value, Status>::type AppendValues(ValuesIter values_begin, ValuesIter values_end, ValidIter valid_begin)

Append a sequence of elements in one shot, with a specified nullmap.

Return
Status
Parameters
  • values_begin: InputIterator to the beginning of the values
  • values_end: InputIterator pointing to the end of the values
  • valid_begin: InputIterator with elements indicating valid (1) or null (0) values

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

void Reset()

Reset the builder.

Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return
Status
Parameters
  • capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

template <typename T>
class NumericBuilder : public arrow::ArrayBuilder

Base class for all Builders that emit an Array of a scalar numerical type.

Public Functions

Status Append(const value_type val)

Append a single scalar and increase the size if necessary.

Status AppendNulls(int64_t length)

Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory. The memory at the corresponding data slot is set to 0 to prevent uninitialized memory access.

Status AppendNull()

Append a single null element.

void Reset()

Reset the builder.

Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return
Status
Parameters
  • capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status AppendValues(const value_type *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a contiguous C array of values
  • length: the number of values to append
  • valid_bytes: an optional sequence of bytes where non-zero indicates a valid (non-null) value

Status AppendValues(const value_type *values, int64_t length, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a contiguous C array of values
  • length: the number of values to append
  • is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<value_type> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a std::vector of values
  • is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<value_type> &values)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values: a std::vector of values

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

template <typename ValuesIter>
Status AppendValues(ValuesIter values_begin, ValuesIter values_end)

Append a sequence of elements in one shot.

Return
Status
Parameters
  • values_begin: InputIterator to the beginning of the values
  • values_end: InputIterator pointing to the end of the values

template <typename ValuesIter, typename ValidIter>
std::enable_if<!std::is_pointer<ValidIter>::value, Status>::type AppendValues(ValuesIter values_begin, ValuesIter values_end, ValidIter valid_begin)

Append a sequence of elements in one shot, with a specified nullmap.

Return
Status
Parameters
  • values_begin: InputIterator to the beginning of the values
  • values_end: InputIterator pointing to the end of the values
  • valid_begin: InputIterator with elements indicating valid (1) or null (0) values
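
The std::enable_if condition in this signature removes the overload from consideration when ValidIter is a raw pointer. The sketch below shows the same technique in isolation. The names are hypothetical, and the pointer-based counterpart (where a null pointer means "all valid") is an assumption added for illustration, not something this page documents.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// (value, is_valid) pairs stand in for the builder's internal buffers.
using SlotsSketch = std::vector<std::pair<int, bool>>;

// Selected only when ValidIter is NOT a raw pointer (e.g. a vector iterator).
template <typename ValuesIter, typename ValidIter>
typename std::enable_if<!std::is_pointer<ValidIter>::value, SlotsSketch>::type
AppendIterSketch(ValuesIter values_begin, ValuesIter values_end,
                 ValidIter valid_begin) {
  SlotsSketch out;
  for (; values_begin != values_end; ++values_begin, ++valid_begin) {
    out.emplace_back(static_cast<int>(*values_begin),
                     static_cast<bool>(*valid_begin));
  }
  return out;
}

// Assumed counterpart: selected only when ValidIter IS a raw pointer,
// so that a null pointer can be treated as "every value is valid".
template <typename ValuesIter, typename ValidIter>
typename std::enable_if<std::is_pointer<ValidIter>::value, SlotsSketch>::type
AppendIterSketch(ValuesIter values_begin, ValuesIter values_end,
                 ValidIter valid_begin) {
  SlotsSketch out;
  for (; values_begin != values_end; ++values_begin) {
    const bool valid = (valid_begin == nullptr) || (*valid_begin++ != 0);
    out.emplace_back(static_cast<int>(*values_begin), valid);
  }
  return out;
}
```

Because the two conditions are mutually exclusive, a call with a std::vector<bool> iterator and a call with a const uint8_t* each resolve to exactly one overload.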

void UnsafeAppend(const value_type val)

Append a single scalar under the assumption that the underlying Buffer is large enough.

This method does not capacity-check; make sure to call Reserve beforehand.

class BinaryBuilder : public arrow::ArrayBuilder

Builder class for variable-length binary data.

Subclassed by arrow::StringBuilder

Public Functions

void UnsafeAppend(const uint8_t *value, int32_t length)

Append without checking capacity.

Offsets and data should have been presized using Reserve() and ReserveData(), respectively.

void Reset()

Reset the builder.

Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return
Status
Parameters
  • capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status ReserveData(int64_t elements)

Ensures there is enough allocated capacity to append the indicated number of bytes to the value data buffer without additional allocations.

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

int64_t value_data_length() const

Return
size of values buffer so far

int64_t value_data_capacity() const

Return
capacity of values buffer

const uint8_t *GetValue(int64_t i, int32_t *out_length) const

Temporary access to a value.

This pointer becomes invalid on the next modifying operation.

util::string_view GetView(int64_t i) const

Temporary access to a value.

This view becomes invalid on the next modifying operation.
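
To see why the element count (Reserve) and the value bytes (ReserveData) are sized separately, and why pointers returned by GetValue are only temporary, consider this simplified model. It is a sketch with hypothetical names built on std::vector, not Arrow's buffers: one vector holds the start offset of each value and another holds the concatenated bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of variable-length binary storage; not Arrow's class.
class BinaryBufferSketch {
 public:
  // Make room for `bytes` more bytes of value data.
  void ReserveData(int64_t bytes) {
    data_.reserve(data_.size() + static_cast<std::size_t>(bytes));
  }

  void Append(const std::string& value) {
    offsets_.push_back(static_cast<int32_t>(data_.size()));  // start of value
    data_.insert(data_.end(), value.begin(), value.end());   // may reallocate
  }

  // The returned pointer refers into data_, so any later Append that
  // reallocates data_ leaves it dangling.
  const uint8_t* GetValue(int64_t i, int32_t* out_length) const {
    assert(i >= 0 && i < length());
    const std::size_t idx = static_cast<std::size_t>(i);
    const int32_t start = offsets_[idx];
    const int32_t end = (idx + 1 < offsets_.size())
                            ? offsets_[idx + 1]
                            : static_cast<int32_t>(data_.size());
    *out_length = end - start;
    return data_.data() + start;
  }

  int64_t length() const { return static_cast<int64_t>(offsets_.size()); }
  int64_t value_data_length() const {
    return static_cast<int64_t>(data_.size());
  }
  int64_t value_data_capacity() const {
    return static_cast<int64_t>(data_.capacity());
  }

 private:
  std::vector<int32_t> offsets_;  // one entry per appended value
  std::vector<uint8_t> data_;     // concatenated bytes of all values
};
```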

class StringBuilder : public arrow::BinaryBuilder

Builder class for UTF8 strings.

Public Functions

Status AppendValues(const std::vector<std::string> &values, const uint8_t *valid_bytes = NULLPTR)

Append a sequence of strings in one shot.

Return
Status
Parameters
  • values: a vector of strings
  • valid_bytes: an optional sequence of bytes where non-zero indicates a valid (non-null) value

Status AppendValues(const char **values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append a sequence of nul-terminated strings in one shot.

If one of the values is NULL, it is processed as a null value even if the corresponding valid_bytes entry is 1.

Return
Status
Parameters
  • values: a contiguous C array of nul-terminated char *
  • length: the number of values to append
  • valid_bytes: an optional sequence of bytes where non-zero indicates a valid (non-null) value

void UnsafeAppend()

Append without checking capacity.

Offsets and data should have been presized using Reserve() and ReserveData(), respectively.

class FixedSizeBinaryBuilder : public arrow::ArrayBuilder

Subclassed by arrow::Decimal128Builder

Public Functions

void Reset()

Reset the builder.

Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return
Status
Parameters
  • capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

int64_t value_data_length() const

Return
size of values buffer so far

const uint8_t *GetValue(int64_t i) const

Temporary access to a value.

This pointer becomes invalid on the next modifying operation.

util::string_view GetView(int64_t i) const

Temporary access to a value.

This view becomes invalid on the next modifying operation.

class Decimal128Builder : public arrow::FixedSizeBinaryBuilder

Public Functions

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

class ListBuilder : public arrow::ArrayBuilder

Builder class for variable-length list array value types.

To use this class, call Append to start each new list slot and then append that slot's values to the child array builder. Alternatively, use the bulk API to append a sequence of offsets and null values.

A note on types: per arrow/type.h, all types in the C++ implementation are logical, so even though this class always builds a list array, it can represent multiple different logical types. If no logical type is provided at construction time, the class defaults to List<T>, where T is taken from the value_builder/values that the object is constructed with.

Public Functions

ListBuilder(MemoryPool *pool, std::shared_ptr<ArrayBuilder> const &value_builder, const std::shared_ptr<DataType> &type = NULLPTR)

Use this constructor to incrementally build the value array along with offsets and null bitmap.

Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return
Status
Parameters
  • capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

void Reset()

Reset the builder.

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

Status AppendValues(const int32_t *offsets, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Vector append.

If passed, valid_bytes is of equal length to offsets, and any zero byte will be considered as a null for that slot.

Status Append(bool is_valid = true)

Start a new variable-length list slot.

This function should be called before beginning to append elements to the value builder
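
The interplay between Append and the value builder can be sketched with plain vectors. The struct below is a hypothetical model of a list of 32-bit integers, not Arrow's ListBuilder: each Append records the current child length as the start offset of the new slot, and the finished offsets gain one trailing entry marking the end of the last list.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a list-of-int32 builder; not Arrow's class.
struct ListOfIntsSketch {
  std::vector<int32_t> offsets;       // offsets[i] = first child index of list i
  std::vector<bool> validity;         // false marks a null list slot
  std::vector<int32_t> child_values;  // stands in for the value builder

  // Start a new list slot at the current end of the child values.
  void Append(bool is_valid = true) {
    offsets.push_back(static_cast<int32_t>(child_values.size()));
    validity.push_back(is_valid);
  }

  // Append one element to the list slot that was started last.
  void AppendChild(int32_t v) { child_values.push_back(v); }

  // Offsets of the finished array: one extra entry closes the last list.
  std::vector<int32_t> FinishOffsets() const {
    std::vector<int32_t> out = offsets;
    out.push_back(static_cast<int32_t>(child_values.size()));
    return out;
  }
};
```

Building [[1, 2], null, [3]] with this model means calling Append, adding 1 and 2, calling Append(false), calling Append, and adding 3; the finished offsets are then 0, 2, 2, 3.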

class StructBuilder : public arrow::ArrayBuilder

The Append, Resize and Reserve methods act on the StructBuilder itself.

Make sure that the corresponding methods of all child builders are called consistently to keep the data structure consistent.

Public Functions

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

Status AppendValues(int64_t length, const uint8_t *valid_bytes)

The null bitmap is of equal length to every child field, and any zero byte will be considered as a null for that field, but users must call the append or advance methods of the child builders independently to insert data.

Status Append(bool is_valid = true)

Append an element to the Struct.

The Append method of every child builder must be called independently to maintain data-structure consistency.
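
The consistency requirement amounts to every child holding exactly one entry per struct slot. A minimal sketch with hypothetical names and two hard-coded child fields (this is an illustration, not Arrow's StructBuilder):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a struct with two child fields; not Arrow's class.
struct StructSketch {
  std::vector<bool> validity;      // parent: one entry per struct slot
  std::vector<int32_t> ids;        // child 0
  std::vector<std::string> names;  // child 1

  // Appending to the parent does NOT touch the children.
  void Append(bool is_valid = true) { validity.push_back(is_valid); }

  // Consistent only if every child has one entry per parent slot.
  bool IsConsistent() const {
    return ids.size() == validity.size() && names.size() == validity.size();
  }
};
```

In this model a null struct slot still needs a placeholder entry in each child so that all lengths stay equal.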

void Reset()

Reset the builder.

template <typename T>
class DictionaryBuilder : public arrow::ArrayBuilder

Array builder for creating an encoded DictionaryArray from a dense array.

Unlike other builders, the dictionary builder does not completely reset its state on Finish calls. Arrays built after the initial Finish call reuse the previously created encoding and build a delta dictionary when new terms occur.
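
The delta behaviour can be illustrated with a small string dictionary. This is a sketch under stated assumptions rather than Arrow's implementation: the class and member names are hypothetical, indices are assumed to keep referring to positions in the overall dictionary, and each Finish after the first is assumed to emit only the entries that are new since the previous Finish.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical model of delta dictionary encoding; not Arrow's class.
class DictionarySketch {
 public:
  struct Result {
    std::vector<int32_t> indices;         // positions in the overall dictionary
    std::vector<std::string> dictionary;  // entries new since the last Finish
  };

  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      // First occurrence: assign the next index and remember it as pending.
      it = memo_.emplace(value, static_cast<int32_t>(memo_.size())).first;
      pending_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // Hands out the indices and pending entries, but keeps the memo table.
  Result Finish() {
    Result out{std::move(indices_), std::move(pending_)};
    indices_.clear();
    pending_.clear();
    finished_once_ = true;
    return out;
  }

  bool is_building_delta() const { return finished_once_; }

 private:
  std::unordered_map<std::string, int32_t> memo_;  // survives Finish
  std::vector<std::string> pending_;
  std::vector<int32_t> indices_;
  bool finished_once_ = false;
};
```

Appending "a", "b", "a" and finishing gives indices 0, 1, 0 with a two-entry dictionary. Appending "b", "c" afterwards and finishing again gives indices 1, 2 with a delta dictionary containing only "c".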

Public Functions

Status Append(const Scalar &value)

Append a scalar value.

template <typename T1 = T>
Status Append(typename std::enable_if<std::is_base_of<FixedSizeBinaryType, T1>::value, const uint8_t *>::type value)

Append a fixed-width string (only for FixedSizeBinaryType)

template <typename T1 = T>
Status Append(typename std::enable_if<std::is_base_of<FixedSizeBinaryType, T1>::value, const char *>::type value)

Append a fixed-width string (only for FixedSizeBinaryType)

Status AppendNull()

Append a scalar null value.

Status AppendArray(const Array &array)

Append a whole dense array to the builder.

void Reset()

Reset the builder.

Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return
Status
Parameters
  • capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status FinishInternal(std::shared_ptr<ArrayData> *out)

Return result of builder as an internal generic ArrayData object.

The builder is reset, except for DictionaryBuilder.

Return
Status
Parameters
  • out: the finalized ArrayData object

bool is_building_delta()

Returns whether the dictionary builder is in delta building mode.