Array Builders¶
-
class
ArrayBuilder
¶ Base class for all data array builders.
This class provides facilities for incrementally building the null bitmap (see the Append methods) and, as a side effect, tracks the current number of slots and the null count.
- Note
- Users are expected to use builders as one of the concrete types below. For example, an ArrayBuilder* pointing to a BinaryBuilder should be downcast before use.
Subclassed by arrow::BinaryBuilder, arrow::BooleanBuilder, arrow::DenseUnionBuilder, arrow::DictionaryBuilder< T >, arrow::DictionaryBuilder< NullType >, arrow::FixedSizeBinaryBuilder, arrow::internal::AdaptiveIntBuilderBase, arrow::ListBuilder, arrow::NullBuilder, arrow::NumericBuilder< T >, arrow::StructBuilder, arrow::DictionaryBuilder< BinaryType >, arrow::DictionaryBuilder< StringType >
Public Functions
-
ArrayBuilder *
child
(int i)¶ For nested types.
Since the objects are owned by this class instance, we skip shared pointers and just return a raw pointer.
-
virtual Status
Resize
(int64_t capacity)¶ Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.
Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.
- Return
- Status
- Parameters
capacity
: the minimum number of total array values to accommodate. Must be greater than the current capacity.
-
Status
Reserve
(int64_t additional_capacity)¶ Ensure that there is enough space allocated to add the indicated number of elements without any further calls to Resize.
The memory allocated is rounded up to the next highest power of 2, similar to memory allocations in STL containers like std::vector.
- Return
- Status
- Parameters
additional_capacity
: the number of additional array values
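The rounding rule described for Reserve can be illustrated with a small standalone helper. This is only a sketch of a power-of-two growth policy, not Arrow's implementation: the names NextPowerOfTwo and CapacityAfterReserve are hypothetical, and the sketch assumes the rounded quantity is the current length plus the requested additional capacity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= n (for n >= 1).
inline int64_t NextPowerOfTwo(int64_t n) {
  assert(n >= 1);
  int64_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Toy model of a Reserve-style growth policy (an assumption of this sketch):
// keep the current capacity if length + additional already fits, otherwise
// grow to the next power of two that fits.
inline int64_t CapacityAfterReserve(int64_t length, int64_t capacity,
                                    int64_t additional) {
  const int64_t needed = length + additional;
  if (needed <= capacity) return capacity;
  return NextPowerOfTwo(needed);
}
```

Under this policy a builder holding 30 values in a capacity of 32 would grow to 64 slots when asked for 5 more, so repeated small reservations do not each trigger a reallocation.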
-
virtual void
Reset
()¶ Reset the builder.
-
Status
Advance
(int64_t elements)¶ For cases where raw data was memcpy’d into the internal buffers, allows us to advance the length of the builder.
The caller is responsible for ensuring that the advanced slots actually contain valid data.
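The pattern that Advance supports (reserve space, copy raw data in, then tell the builder how many slots were filled) can be sketched with a toy builder. ToyInt32Builder, mutable_tail and BulkLoad are hypothetical names used for illustration only; they are not part of the Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy fixed-width builder: storage may be larger than the logical length.
class ToyInt32Builder {
 public:
  // Grow the backing storage so that `additional` more values fit.
  void Reserve(int64_t additional) {
    assert(additional >= 0);
    const std::size_t needed = static_cast<std::size_t>(length_ + additional);
    if (needed > buffer_.size()) buffer_.resize(needed);
  }

  // Raw pointer to the first unused slot, for bulk writes such as memcpy.
  int32_t* mutable_tail() { return buffer_.data() + length_; }

  // Declare that `elements` slots after the current length now hold data.
  // Returns false if that would run past the reserved storage.
  bool Advance(int64_t elements) {
    if (elements < 0 || length_ + elements > capacity()) return false;
    length_ += elements;
    return true;
  }

  int64_t length() const { return length_; }
  int64_t capacity() const { return static_cast<int64_t>(buffer_.size()); }
  int32_t value(int64_t i) const { return buffer_[static_cast<std::size_t>(i)]; }

 private:
  std::vector<int32_t> buffer_;
  int64_t length_ = 0;
};

// Reserve, copy raw data straight into the buffer, then advance the length.
inline bool BulkLoad(ToyInt32Builder* b, const int32_t* src, int64_t n) {
  b->Reserve(n);
  std::memcpy(b->mutable_tail(), src, static_cast<std::size_t>(n) * sizeof(int32_t));
  return b->Advance(n);
}
```

The length only changes in Advance, which is why the copy and the advance must agree on the element count.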
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
Return result of builder as an Array object.
The builder is reset except for DictionaryBuilder.
Concrete builder subclasses¶
-
class
NullBuilder
: public arrow::ArrayBuilder¶ Public Functions
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
class
BooleanBuilder
: public arrow::ArrayBuilder¶ Public Functions
-
Status
AppendNulls
(int64_t length)¶ Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory.
-
void
UnsafeAppend
(const bool val)¶ Scalar append, without checking for capacity.
-
Status
AppendValues
(const uint8_t *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a contiguous array of bytes (non-zero is 1)
length
: the number of values to append
valid_bytes
: an optional sequence of bytes where non-zero indicates a valid (non-null) value
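The relationship between a valid_bytes argument and the null bitmap that builders maintain can be sketched as follows. This is a standalone toy, not Arrow's code: ValidityBitmap is a hypothetical type, and the sketch assumes one bit per slot with the least-significant bit first within each byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy validity bitmap: one bit per slot plus a running null count.
struct ValidityBitmap {
  std::vector<uint8_t> bits;
  int64_t length = 0;
  int64_t null_count = 0;

  // Append `n` slots. A null `valid_bytes` pointer means "all valid";
  // otherwise a zero byte marks the corresponding slot as null.
  void Append(const uint8_t* valid_bytes, int64_t n) {
    assert(n >= 0);
    for (int64_t i = 0; i < n; ++i) {
      const bool valid = (valid_bytes == nullptr) || (valid_bytes[i] != 0);
      if (length % 8 == 0) bits.push_back(0);  // start a new byte
      if (valid) {
        bits[static_cast<std::size_t>(length / 8)] |=
            static_cast<uint8_t>(1u << (length % 8));
      } else {
        ++null_count;
      }
      ++length;
    }
  }

  bool IsValid(int64_t i) const {
    return ((bits[static_cast<std::size_t>(i / 8)] >> (i % 8)) & 1u) != 0;
  }
};
```

Any non-zero byte counts as valid, so callers can pass flags such as 7 or 255 without normalizing them to 1 first.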
-
Status
AppendValues
(const uint8_t *values, int64_t length, const std::vector<bool> &is_valid)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a contiguous C array of values
length
: the number of values to append
is_valid
: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values
-
Status
AppendValues
(const std::vector<uint8_t> &values, const std::vector<bool> &is_valid)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a std::vector of bytes
is_valid
: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values
-
Status
AppendValues
(const std::vector<uint8_t> &values)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a std::vector of bytes
-
Status
AppendValues
(const std::vector<bool> &values, const std::vector<bool> &is_valid)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: an std::vector<bool> indicating true (1) or false
is_valid
: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values
-
Status
AppendValues
(const std::vector<bool> &values)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: an std::vector<bool> indicating true (1) or false
-
template <typename ValuesIter>
Status
AppendValues
(ValuesIter values_begin, ValuesIter values_end)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values_begin
: InputIterator to the beginning of the values
values_end
: InputIterator pointing to the end of the values
-
template <typename ValuesIter, typename ValidIter>
std::enable_if<!std::is_pointer<ValidIter>::value, Status>::type
AppendValues
(ValuesIter values_begin, ValuesIter values_end, ValidIter valid_begin)¶ Append a sequence of elements in one shot, with a specified nullmap.
- Return
- Status
- Parameters
values_begin
: InputIterator to the beginning of the values
values_end
: InputIterator pointing to the end of the values
valid_begin
: InputIterator with elements indicating valid (1) or null (0) values
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
void
Reset
()¶ Reset the builder.
-
Status
Resize
(int64_t capacity)¶ Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.
Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.
- Return
- Status
- Parameters
capacity
: the minimum number of total array values to accommodate. Must be greater than the current capacity.
-
Status
-
template <typename T>
class
NumericBuilder
: public arrow::ArrayBuilder¶ Base class for all Builders that emit an Array of a scalar numerical type.
Public Functions
-
Status
AppendNulls
(int64_t length)¶ Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory. The memory at the corresponding data slot is set to 0 to prevent uninitialized memory access.
-
void
Reset
()¶ Reset the builder.
-
Status
Resize
(int64_t capacity)¶ Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.
Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.
- Return
- Status
- Parameters
capacity
: the minimum number of total array values to accommodate. Must be greater than the current capacity.
-
Status
AppendValues
(const value_type *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a contiguous C array of values
length
: the number of values to append
valid_bytes
: an optional sequence of bytes where non-zero indicates a valid (non-null) value
-
Status
AppendValues
(const value_type *values, int64_t length, const std::vector<bool> &is_valid)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a contiguous C array of values
length
: the number of values to append
is_valid
: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values
-
Status
AppendValues
(const std::vector<value_type> &values, const std::vector<bool> &is_valid)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a std::vector of values
is_valid
: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values
-
Status
AppendValues
(const std::vector<value_type> &values)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values
: a std::vector of values
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
template <typename ValuesIter>
Status
AppendValues
(ValuesIter values_begin, ValuesIter values_end)¶ Append a sequence of elements in one shot.
- Return
- Status
- Parameters
values_begin
: InputIterator to the beginning of the values
values_end
: InputIterator pointing to the end of the values
-
template <typename ValuesIter, typename ValidIter>
std::enable_if<!std::is_pointer<ValidIter>::value, Status>::type
AppendValues
(ValuesIter values_begin, ValuesIter values_end, ValidIter valid_begin)¶ Append a sequence of elements in one shot, with a specified nullmap.
- Return
- Status
- Parameters
values_begin
: InputIterator to the beginning of the values
values_end
: InputIterator pointing to the end of the values
valid_begin
: InputIterator with elements indicating valid (1) or null (0) values.
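The iterator-based overload pairs a range of values with a parallel sequence of validity flags. A minimal standalone sketch of that calling convention is shown below; ToyColumn and this free AppendValues function are hypothetical stand-ins, not the Arrow builder itself. Following the AppendNulls description, the sketch stores 0 in the data slot of every null.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy column: values and validity kept as parallel vectors.
struct ToyColumn {
  std::vector<int64_t> values;
  std::vector<bool> is_valid;
};

// Walk [values_begin, values_end) and the validity sequence in lockstep.
// The validity sequence must be at least as long as the value range.
template <typename ValuesIter, typename ValidIter>
void AppendValues(ToyColumn* out, ValuesIter values_begin,
                  ValuesIter values_end, ValidIter valid_begin) {
  assert(out != nullptr);
  for (; values_begin != values_end; ++values_begin, ++valid_begin) {
    const bool valid = static_cast<bool>(*valid_begin);
    out->is_valid.push_back(valid);
    // Null slots get a zeroed data value instead of the input value.
    out->values.push_back(valid ? static_cast<int64_t>(*values_begin) : 0);
  }
}
```

Because only increment, dereference and comparison are used, any input iterator works for either sequence, including iterators over containers of different element types.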
-
Status
-
class
BinaryBuilder
: public arrow::ArrayBuilder¶ Builder class for variable-length binary data.
Subclassed by arrow::StringBuilder
Public Functions
-
void
UnsafeAppend
(const uint8_t *value, int32_t length)¶ Append without checking capacity.
Offsets and data should have been presized using Reserve() and ReserveData(), respectively.
-
void
Reset
()¶ Reset the builder.
-
Status
Resize
(int64_t capacity)¶ Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.
Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.
- Return
- Status
- Parameters
capacity
: the minimum number of total array values to accommodate. Must be greater than the current capacity.
-
Status
ReserveData
(int64_t elements)¶ Ensures there is enough allocated capacity to append the indicated number of bytes to the value data buffer without additional allocations.
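Reserve() and ReserveData() size two different buffers: one offset per value, and the bytes of the values themselves. The toy model below illustrates why UnsafeAppend needs both to be presized. ToyBinaryBuilder is a hypothetical class written for this sketch, and its use of std::vector capacity as the "allocated" size is an assumption, not how Arrow manages memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy variable-length binary layout: value i occupies
// data_[offsets_[i] .. offsets_[i + 1]).
class ToyBinaryBuilder {
 public:
  ToyBinaryBuilder() { offsets_.push_back(0); }

  // Make room for `n` more values (one offset each).
  void Reserve(int64_t n) {
    offsets_.reserve(offsets_.size() + static_cast<std::size_t>(n));
  }

  // Make room for `n` more bytes of value data.
  void ReserveData(int64_t n) {
    data_.reserve(data_.size() + static_cast<std::size_t>(n));
  }

  // Append without growing: the caller must have reserved enough space.
  void UnsafeAppend(const uint8_t* value, int32_t length) {
    assert(offsets_.size() < offsets_.capacity());
    assert(data_.size() + static_cast<std::size_t>(length) <= data_.capacity());
    data_.insert(data_.end(), value, value + length);
    offsets_.push_back(static_cast<int32_t>(data_.size()));
  }

  int64_t length() const { return static_cast<int64_t>(offsets_.size()) - 1; }
  int64_t value_data_length() const { return static_cast<int64_t>(data_.size()); }

  // Returns a copy, so the result stays usable after later appends.
  std::string GetValue(int64_t i) const {
    const std::size_t k = static_cast<std::size_t>(i);
    return std::string(data_.begin() + offsets_[k], data_.begin() + offsets_[k + 1]);
  }

 private:
  std::vector<int32_t> offsets_;
  std::vector<uint8_t> data_;
};
```

Appending "ab" and "cde" needs Reserve(2) for the two offsets and ReserveData(5) for the five bytes. Reserving only one of the two would leave the other buffer free to reallocate.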
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
int64_t
value_data_length
() const¶ - Return
- size of values buffer so far
-
int64_t
value_data_capacity
() const¶ - Return
- capacity of values buffer
-
const uint8_t *
GetValue
(int64_t i, int32_t *out_length) const¶ Temporary access to a value.
This pointer becomes invalid on the next modifying operation.
-
util::string_view
GetView
(int64_t i) const¶ Temporary access to a value.
This view becomes invalid on the next modifying operation.
-
void
-
class
StringBuilder
: public arrow::BinaryBuilder¶ Builder class for UTF8 strings.
Public Functions
-
Status
AppendValues
(const std::vector<std::string> &values, const uint8_t *valid_bytes = NULLPTR)¶ Append a sequence of strings in one shot.
- Return
- Status
- Parameters
values
: a vector of strings
valid_bytes
: an optional sequence of bytes where non-zero indicates a valid (non-null) value
-
Status
AppendValues
(const char **values, int64_t length, const uint8_t *valid_bytes = NULLPTR)¶ Append a sequence of nul-terminated strings in one shot.
If one of the values is NULL, it is processed as a null value even if the corresponding valid_bytes entry is 1.
- Return
- Status
- Parameters
values
: a contiguous C array of nul-terminated char *
length
: the number of values to append
valid_bytes
: an optional sequence of bytes where non-zero indicates a valid (non-null) value
-
void
UnsafeAppend
()¶ Append without checking capacity.
Offsets and data should have been presized using Reserve() and ReserveData(), respectively.
-
Status
-
class
FixedSizeBinaryBuilder
: public arrow::ArrayBuilder¶ Subclassed by arrow::Decimal128Builder
Public Functions
-
void
Reset
()¶ Reset the builder.
-
Status
Resize
(int64_t capacity)¶ Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.
Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.
- Return
- Status
- Parameters
capacity
: the minimum number of total array values to accommodate. Must be greater than the current capacity.
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
int64_t
value_data_length
() const¶ - Return
- size of values buffer so far
-
const uint8_t *
GetValue
(int64_t i) const¶ Temporary access to a value.
This pointer becomes invalid on the next modifying operation.
-
util::string_view
GetView
(int64_t i) const¶ Temporary access to a value.
This view becomes invalid on the next modifying operation.
-
void
-
class
Decimal128Builder
: public arrow::FixedSizeBinaryBuilder¶ Public Functions
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
class
ListBuilder
: public arrow::ArrayBuilder¶ Builder class for variable-length list array value types.
To use this class, you must append values to the child array builder and use the Append function to delimit each distinct list value (once the values have been appended to the child array) or use the bulk API to append a sequence of offsets and null values.
A note on types: per arrow/type.h, all types in the C++ implementation are logical, so even though this class always builds a list array, it can represent multiple different logical types. If no logical type is provided at construction time, the class defaults to List<T>, where T is taken from the value_builder/values that the object is constructed with.
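How offsets delimit list values inside a single flat child array can be sketched with a toy structure. ToyListBuilder, AppendList and AppendNull are hypothetical names; the sketch appends a whole list at once for brevity and does not reproduce the call sequence of the real ListBuilder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy list layout: all elements live in one flat child vector, and
// offsets[i]..offsets[i + 1] delimits list i.
struct ToyListBuilder {
  std::vector<int32_t> child_values;
  std::vector<int32_t> offsets{0};
  std::vector<bool> is_valid;

  // Append the elements to the child, then record where the list ends.
  void AppendList(const std::vector<int32_t>& items) {
    child_values.insert(child_values.end(), items.begin(), items.end());
    offsets.push_back(static_cast<int32_t>(child_values.size()));
    is_valid.push_back(true);
  }

  // A null list adds no child values, so its start and end offsets coincide.
  void AppendNull() {
    offsets.push_back(static_cast<int32_t>(child_values.size()));
    is_valid.push_back(false);
  }

  std::vector<int32_t> GetList(std::size_t i) const {
    assert(i + 1 < offsets.size());
    return std::vector<int32_t>(child_values.begin() + offsets[i],
                                child_values.begin() + offsets[i + 1]);
  }
};
```

Appending [1, 2], a null, and [3] yields child values 1, 2, 3 and offsets 0, 2, 2, 3: there is always one more offset than there are lists.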
Public Functions
Use this constructor to incrementally build the value array along with offsets and null bitmap.
-
Status
Resize
(int64_t capacity)¶ Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.
Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.
- Return
- Status
- Parameters
capacity
: the minimum number of total array values to accommodate. Must be greater than the current capacity.
-
void
Reset
()¶ Reset the builder.
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
class
StructBuilder
: public arrow::ArrayBuilder¶ The Append, Resize and Reserve methods act on the StructBuilder itself.
Make sure these methods are also called consistently on all child builders to maintain data-structure consistency.
Public Functions
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
Status
AppendValues
(int64_t length, const uint8_t *valid_bytes)¶ The null bitmap is of equal length to every child field, and any zero byte is treated as a null for that field, but users must independently use the append or advance methods of the child builders to insert data.
-
Status
Append
(bool is_valid = true)¶ Append an element to the Struct.
Each child builder's Append method must be called independently to maintain data-structure consistency.
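The consistency requirement amounts to an invariant: every child must hold exactly as many slots as the struct itself. A toy illustration is given below, with a hypothetical ToyStructBuilder that has two made-up child fields, "id" and "name"; it is not the Arrow class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy struct builder: the parent tracks validity only, and each child field
// is filled separately by the caller.
struct ToyStructBuilder {
  std::vector<bool> is_valid;           // one entry per struct slot
  std::vector<int32_t> id_child;        // hypothetical child field "id"
  std::vector<std::string> name_child;  // hypothetical child field "name"

  // Appends a slot to the struct only; children are NOT touched.
  void Append(bool valid = true) { is_valid.push_back(valid); }

  // The invariant: every child has exactly one value per struct slot.
  bool ChildrenConsistent() const {
    return id_child.size() == is_valid.size() &&
           name_child.size() == is_valid.size();
  }

  // Returns the number of slots; only meaningful if the invariant holds.
  std::size_t Finish() const {
    assert(ChildrenConsistent());
    return is_valid.size();
  }
};
```

After a call to Append on the parent, the invariant is broken until one value has been pushed to each child, which is the situation the note above warns about.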
-
void
Reset
()¶ Reset the builder.
-
template <typename T>
class
DictionaryBuilder
: public arrow::ArrayBuilder¶ Array builder for creating an encoded DictionaryArray from dense data.
Unlike other builders, the dictionary builder does not completely reset its state on Finish calls. Arrays built after the initial Finish call reuse the previously created encoding and build a delta dictionary when new terms occur.
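The delta behaviour can be sketched with a toy encoder that keeps its value-to-index table across Finish calls. ToyDictionaryBuilder and ToyDictionaryResult are hypothetical types for illustration, restricted to strings; the real builder's internals are not shown in this reference and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct ToyDictionaryResult {
  std::vector<int32_t> indices;         // one index per appended value
  std::vector<std::string> dictionary;  // full on first Finish, delta afterwards
};

class ToyDictionaryBuilder {
 public:
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      // Indices are 32-bit in this sketch.
      assert(memo_.size() < static_cast<std::size_t>(INT32_MAX));
      const int32_t index = static_cast<int32_t>(memo_.size());
      it = memo_.emplace(value, index).first;
      pending_.push_back(value);  // new term since the last Finish
    }
    indices_.push_back(it->second);
  }

  bool is_building_delta() const { return finished_once_; }

  // Hands out the indices and the terms added since the last Finish.
  // The memo table is kept, so later arrays reuse the same encoding.
  ToyDictionaryResult Finish() {
    ToyDictionaryResult out{std::move(indices_), std::move(pending_)};
    indices_.clear();
    pending_.clear();
    finished_once_ = true;
    return out;
  }

 private:
  std::unordered_map<std::string, int32_t> memo_;
  std::vector<std::string> pending_;
  std::vector<int32_t> indices_;
  bool finished_once_ = false;
};
```

Appending "a", "b", "a" and finishing gives indices 0, 1, 0 with dictionary a, b. Appending "b", "c" afterwards gives indices 1, 2 with a delta dictionary containing only c.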
Public Functions
-
template <typename T1 = T>
Status
Append
(typename std::enable_if<std::is_base_of<FixedSizeBinaryType, T1>::value, const uint8_t *>::type value)¶ Append a fixed-width string (only for FixedSizeBinaryType).
-
template <typename T1 = T>
Status
Append
(typename std::enable_if<std::is_base_of<FixedSizeBinaryType, T1>::value, const char *>::type value)¶ Append a fixed-width string (only for FixedSizeBinaryType).
-
void
Reset
()¶ Reset the builder.
-
Status
Resize
(int64_t capacity)¶ Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.
Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.
- Return
- Status
- Parameters
capacity
: the minimum number of total array values to accommodate. Must be greater than the current capacity.
Return result of builder as an internal generic ArrayData object.
Resets the builder, except for DictionaryBuilder.
- Return
- Status
- Parameters
out
: the finalized ArrayData object
-
bool
is_building_delta
()¶ Whether the dictionary builder is in delta-building mode.
-
template <typename T1 = T>