Compute Functions
The compute module provides functions for performing data transformations and computations on Arrow data structures.
Function Registry
All compute functions are registered in a global function registry.
GetFunctionRegistry
Returns the global function registry.
FunctionRegistry* GetFunctionRegistry()
Calling Functions
Functions can be called using the CallFunction convenience API:
Result<Datum> CallFunction(
const std::string& func_name,
const std::vector<Datum>& args,
const FunctionOptions* options = nullptr,
ExecContext* ctx = nullptr)
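func_name (const std::string&, required): Name of the function to call, as registered in the function registry
args (const std::vector<Datum>&, required): Input arguments (Arrays, Scalars, or ChunkedArrays)
options (const FunctionOptions*, optional, default nullptr): Function-specific options; nullptr selects the function's default options
ctx (ExecContext*, optional, default nullptr): Execution context (memory pool, function registry, etc.); nullptr uses a default context
Returns: Result containing the output Datum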
Function Classes
Function
Base class for all compute functions.
name(): Returns the function name
kind(): Returns the function kind (SCALAR, VECTOR, SCALAR_AGGREGATE, HASH_AGGREGATE, or META)
arity(): Returns the function arity (number of required arguments)
doc(): Returns the function documentation
Execute
Result<Datum> Execute(const std::vector<Datum>& args, const FunctionOptions* options, ExecContext* ctx) const
args (const std::vector<Datum>&, required): Input arguments
Executes the function with kernel dispatch, batch iteration, and memory allocation handled automatically.
Function Kinds
- SCALAR: Operates element-wise on arrays/scalars. Output size matches input size
- VECTOR: Array-to-array operations where behavior depends on entire array values
- SCALAR_AGGREGATE: Computes scalar summary statistics from array input
- HASH_AGGREGATE: Computes grouped summary statistics from array input and group identifiers
- META: Dispatches to other functions, contains no kernels
Arity
Describes the number of required arguments for a function.
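struct Arity {
  int num_args;
  bool is_varargs = false;
};
Arity::Nullary(): Function taking no arguments
Arity::Unary(): Function taking 1 argument
Arity::Binary(): Function taking 2 arguments
Arity::Ternary(): Function taking 3 arguments
Arity::VarArgs(int min_args = 0): Function taking a variable number of arguments; min_args is the minimum number of required arguments (default: 0)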
Common Compute Functions
Arithmetic Functions
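Element-wise arithmetic operations. ArithmeticOptions controls overflow checking (check_overflow, default false).
// Binary operations
Result<Datum> Add(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> Subtract(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> Multiply(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> Divide(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
// Unary operations
Result<Datum> Negate(const Datum& arg, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> AbsoluteValue(const Datum& arg, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);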
Comparison Functions
Element-wise comparison operations.
Result<Datum> Equal(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> NotEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Less(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> LessEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Greater(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> GreaterEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Aggregate Functions
Compute summary statistics.
Result<Datum> Sum(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Mean(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Min(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Max(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Count(const Datum& array, const CountOptions& options, ExecContext* ctx = nullptr);
Type Casting
Cast data to different types.
Result<Datum> Cast(const Datum& value, const CastOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Cast(const Datum& value, std::shared_ptr<DataType> to_type, const CastOptions& options = CastOptions::Safe(), ExecContext* ctx = nullptr);
options (const CastOptions&, required in the first overload): Cast options, including the target type
to_type (std::shared_ptr<DataType>, required in the second overload): Target data type
CastOptions
struct CastOptions {
  std::shared_ptr<DataType> to_type;
  bool allow_int_overflow = false;
  bool allow_time_truncate = false;
  bool allow_time_overflow = false;
  bool allow_decimal_truncate = false;
  bool allow_float_truncate = false;
  bool allow_invalid_utf8 = false;
};
Filter and Selection
Filter
Filter an array by a boolean selection mask.
Result<Datum> Filter(const Datum& values, const Datum& filter, const FilterOptions& options, ExecContext* ctx = nullptr);
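values (const Datum&, required): Array or ChunkedArray to filter
filter (const Datum&, required): Boolean array indicating which values to keep
options (const FilterOptions&, required): Filter options (whether null entries in the filter drop the value or emit a null)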
Take
Select values by indices.
Result<Datum> Take(const Datum& values, const Datum& indices, const TakeOptions& options, ExecContext* ctx = nullptr);
values (const Datum&, required): Array or ChunkedArray to select from
indices (const Datum&, required): Integer array of indices to select
options (const TakeOptions&, required): Take options (whether indices are bounds-checked)
String Functions
String Operations
// String predicates
Result<Datum> IsAscii(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> IsUtf8(const Datum& strings, ExecContext* ctx = nullptr);
// String transformations
Result<Datum> Utf8Upper(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Lower(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Reverse(const Datum& strings, ExecContext* ctx = nullptr);
// String trimming
Result<Datum> Utf8Trim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8LTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8RTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Execution Context
ExecContext
Provides execution context including memory pool and function registry.
class ExecContext {
 public:
  explicit ExecContext(MemoryPool* pool = default_memory_pool(),
                       ::arrow::internal::Executor* executor = nullptr,
                       FunctionRegistry* func_registry = nullptr);
  MemoryPool* memory_pool() const;
  FunctionRegistry* func_registry() const;
};
memory_pool(): Returns the memory pool used for allocations
func_registry(): Returns the function registry used to look up functions (the global registry when none was passed to the constructor)