Compute Functions

The compute module provides functions for performing data transformations and computations on Arrow data structures.

Function Registry

All compute functions are registered in a global function registry.

GetFunctionRegistry

Returns the global function registry.
FunctionRegistry* GetFunctionRegistry();

Calling Functions

Functions can be called using the CallFunction convenience API:
Result<Datum> CallFunction(
    const std::string& func_name,
    const std::vector<Datum>& args,
    const FunctionOptions* options = nullptr,
    ExecContext* ctx = nullptr);
  • func_name (const std::string&, required): Name of the function to call
  • args (const std::vector<Datum>&, required): Input arguments (Arrays, Scalars, or ChunkedArrays)
  • options (const FunctionOptions*, optional): Function-specific options; defaults to nullptr
  • ctx (ExecContext*, optional): Execution context (memory pool, function registry, etc.); defaults to nullptr

Returns: a Result<Datum> holding the output data, or an error Status if the call fails
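As a rough illustration of name-based dispatch, the sketch below models a registry as a map from function names to callables. It is a standalone toy that does not use Arrow: ToyFunction, ToyRegistry, and MakeToyRegistry are hypothetical names, and the sketch throws exceptions where Arrow would report an error through the returned Result.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compute function: takes arguments, returns a value.
using ToyFunction = std::function<double(const std::vector<double>&)>;

// Hypothetical registry keyed by function name (not arrow::compute::FunctionRegistry).
class ToyRegistry {
 public:
  void Add(const std::string& name, ToyFunction fn) {
    functions_[name] = std::move(fn);
  }

  // Look up `name` and invoke it; fail if the name was never registered.
  double Call(const std::string& name, const std::vector<double>& args) const {
    auto it = functions_.find(name);
    if (it == functions_.end()) {
      throw std::invalid_argument("no function registered as '" + name + "'");
    }
    return it->second(args);
  }

 private:
  std::map<std::string, ToyFunction> functions_;
};

// Build a registry with two sample entries.
ToyRegistry MakeToyRegistry() {
  ToyRegistry registry;
  registry.Add("add", [](const std::vector<double>& a) { return a.at(0) + a.at(1); });
  registry.Add("negate", [](const std::vector<double>& a) { return -a.at(0); });
  return registry;
}
```

In this sketch, MakeToyRegistry().Call("add", {2.0, 3.0}) evaluates to 5.0, and an unregistered name raises std::invalid_argument.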

Function Classes

Function

Base class for all compute functions.
class Function
  • name (const std::string&): Returns the function name
  • kind (Function::Kind): Returns the function kind (SCALAR, VECTOR, SCALAR_AGGREGATE, HASH_AGGREGATE, or META)
  • arity (const Arity&): Returns the function arity (number of required arguments)
  • doc (const FunctionDoc&): Returns the function documentation
  • Execute (Result<Datum>): Executes the function with kernel dispatch, batch iteration, and memory allocation handled automatically. It takes:
      • args (const std::vector<Datum>&, required): Input arguments
      • options (const FunctionOptions*): Function options
      • ctx (ExecContext*): Execution context

Function Kinds

  • SCALAR: Operates element-wise on arrays/scalars. Output size matches input size
  • VECTOR: Array-to-array operations where behavior depends on entire array values
  • SCALAR_AGGREGATE: Computes scalar summary statistics from array input
  • HASH_AGGREGATE: Computes grouped summary statistics from array input and group identifiers
  • META: Dispatches to other functions, contains no kernels
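To make the difference between the kinds concrete, the following sketch implements one toy function per data-processing kind using only the standard library. It does not use Arrow; AddOne, CumulativeSum, SumAll, and GroupedSum are hypothetical names chosen for illustration, and the sketch ignores nulls and chunking.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// SCALAR-like: element-wise, so the output length equals the input length.
std::vector<std::int64_t> AddOne(const std::vector<std::int64_t>& in) {
  std::vector<std::int64_t> out;
  out.reserve(in.size());
  for (std::int64_t v : in) out.push_back(v + 1);
  return out;
}

// VECTOR-like: each output slot depends on other slots, not only its own.
std::vector<std::int64_t> CumulativeSum(const std::vector<std::int64_t>& in) {
  std::vector<std::int64_t> out;
  out.reserve(in.size());
  std::int64_t running = 0;
  for (std::int64_t v : in) {
    running += v;
    out.push_back(running);
  }
  return out;
}

// SCALAR_AGGREGATE-like: reduces the whole array to a single value.
std::int64_t SumAll(const std::vector<std::int64_t>& in) {
  std::int64_t total = 0;
  for (std::int64_t v : in) total += v;
  return total;
}

// HASH_AGGREGATE-like: one summary value per group identifier.
std::map<std::int64_t, std::int64_t> GroupedSum(
    const std::vector<std::int64_t>& values,
    const std::vector<std::int64_t>& group_ids) {
  std::map<std::int64_t, std::int64_t> sums;
  for (std::size_t i = 0; i < values.size(); ++i) {
    sums[group_ids.at(i)] += values[i];
  }
  return sums;
}
```

For the input {1, 2, 3}, AddOne gives {2, 3, 4}, CumulativeSum gives {1, 3, 6}, and SumAll gives 6.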

Arity

Describes the number of required arguments for a function.
struct Arity {
  int num_args;
  bool is_varargs;
};
  • Nullary (static Arity): Function taking no arguments
  • Unary (static Arity): Function taking 1 argument
  • Binary (static Arity): Function taking 2 arguments
  • Ternary (static Arity): Function taking 3 arguments
  • VarArgs (static Arity): Function taking a variable number of arguments
      • min_args (int): Minimum number of required arguments (default: 0)
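One way to read the two fields is as an argument-count check. The sketch below mirrors the struct in a standalone type and validates a call's argument count against it. ToyArity and AcceptsArgCount are hypothetical names, and the sketch assumes that for a varargs function num_args holds the minimum argument count.

```cpp
// Simplified stand-in mirroring the two fields shown above (not Arrow's type).
struct ToyArity {
  int num_args;
  bool is_varargs;

  static ToyArity Nullary() { return {0, false}; }
  static ToyArity Unary() { return {1, false}; }
  static ToyArity Binary() { return {2, false}; }
  static ToyArity Ternary() { return {3, false}; }
  // Assumption: for varargs, num_args stores the minimum argument count.
  static ToyArity VarArgs(int min_args = 0) { return {min_args, true}; }
};

// True when a call with `count` arguments satisfies `arity`.
bool AcceptsArgCount(const ToyArity& arity, int count) {
  return arity.is_varargs ? count >= arity.num_args : count == arity.num_args;
}
```

Under these assumptions a Binary function accepts exactly two arguments, while VarArgs(1) accepts one or more.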

Common Compute Functions

Arithmetic Functions

Element-wise arithmetic operations.
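// Binary operations
Result<Datum> Add(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> Subtract(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> Multiply(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> Divide(const Datum& left, const Datum& right, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);

// Unary operations
Result<Datum> Negate(const Datum& arg, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);
Result<Datum> AbsoluteValue(const Datum& arg, ArithmeticOptions options = ArithmeticOptions(), ExecContext* ctx = nullptr);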
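Element-wise arithmetic also has to decide what happens to missing values. The sketch below models a nullable integer column as a values vector plus a validity vector and adds two such columns, emitting null wherever either input is null. It is a standalone toy, not Arrow code; ToyNullableInts and AddNullable are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy nullable column: values[i] is meaningful only when valid[i] is true.
struct ToyNullableInts {
  std::vector<std::int64_t> values;
  std::vector<bool> valid;
};

// Element-wise addition; an output slot is null if either input slot is null.
ToyNullableInts AddNullable(const ToyNullableInts& left,
                            const ToyNullableInts& right) {
  if (left.values.size() != right.values.size()) {
    throw std::invalid_argument("length mismatch");
  }
  ToyNullableInts out;
  for (std::size_t i = 0; i < left.values.size(); ++i) {
    const bool ok = left.valid[i] && right.valid[i];
    out.valid.push_back(ok);
    // Null slots carry a placeholder value that readers must ignore.
    out.values.push_back(ok ? left.values[i] + right.values[i] : 0);
  }
  return out;
}
```

The output always has the same length as the inputs, which is the defining property of a SCALAR function.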

Comparison Functions

Element-wise comparison operations. Each returns a boolean result with one value per input element.
Result<Datum> Equal(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> NotEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Less(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> LessEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> Greater(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);
Result<Datum> GreaterEqual(const Datum& left, const Datum& right, ExecContext* ctx = nullptr);

Aggregate Functions

Compute summary statistics.
Result<Datum> Sum(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Mean(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Min(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Max(const Datum& array, const ScalarAggregateOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Count(const Datum& array, const CountOptions& options, ExecContext* ctx = nullptr);
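The options argument controls how nulls and short inputs affect the result. The sketch below is a standalone model of a sum with two such settings, assuming they behave like the skip_nulls and min_count fields of Arrow's ScalarAggregateOptions. ToyNullableSum and SumWithOptions are hypothetical names, and the sketch does not call Arrow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of the toy aggregate: `value` is meaningful only when `valid` is true.
struct ToyNullableSum {
  std::int64_t value;
  bool valid;
};

// Sum the non-null inputs.
// - skip_nulls == false: any null input makes the result null.
// - fewer than min_count non-null inputs also makes the result null.
ToyNullableSum SumWithOptions(const std::vector<std::int64_t>& values,
                              const std::vector<bool>& valid,
                              bool skip_nulls, std::uint32_t min_count) {
  std::int64_t total = 0;
  std::uint32_t non_null = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (valid[i]) {
      total += values[i];
      ++non_null;
    } else if (!skip_nulls) {
      return {0, false};
    }
  }
  if (non_null < min_count) return {0, false};
  return {total, true};
}
```

For values {1, null, 3}, the sketch returns 4 when nulls are skipped and null when they are not.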

Type Casting

Cast data to different types.
Result<Datum> Cast(const Datum& value, const CastOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Cast(const Datum& value, std::shared_ptr<DataType> to_type, ExecContext* ctx = nullptr);
  • value (const Datum&, required): Value to cast
  • options (const CastOptions&, required): Cast options, including the target type (first overload)
  • to_type (std::shared_ptr<DataType>, required): Target data type (second overload)

CastOptions

struct CastOptions {
  std::shared_ptr<DataType> to_type;
  bool allow_int_overflow = false;
  bool allow_time_truncate = false;
  bool allow_time_overflow = false;
  bool allow_decimal_truncate = false;
  bool allow_float_truncate = false;
  bool allow_invalid_utf8 = false;
};
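The allow_* flags switch individual safety checks off. As a hedged illustration of the idea behind allow_int_overflow, the standalone sketch below narrows 64-bit integers to 32 bits and rejects out-of-range values unless the flag is set. CastInt64ToInt32 is a hypothetical helper, not Arrow's cast kernel, and it throws where Arrow would return an error Status.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Narrow each value to 32 bits. Out-of-range values are an error unless
// allow_int_overflow is true, in which case they are converted unchecked.
std::vector<std::int32_t> CastInt64ToInt32(const std::vector<std::int64_t>& in,
                                           bool allow_int_overflow) {
  const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
  std::vector<std::int32_t> out;
  out.reserve(in.size());
  for (std::int64_t v : in) {
    const bool fits = v >= lo && v <= hi;
    if (!fits && !allow_int_overflow) {
      throw std::overflow_error("value does not fit in int32");
    }
    // For out-of-range input the result wraps on common platforms
    // (the conversion is guaranteed modular from C++20 onward).
    out.push_back(static_cast<std::int32_t>(v));
  }
  return out;
}
```

With the flag left false, a value such as 2^40 is rejected rather than silently truncated.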

Filter and Selection

Filter

Filter an array by a boolean selection mask.
Result<Datum> Filter(const Datum& values, const Datum& filter, const FilterOptions& options, ExecContext* ctx = nullptr);
  • values (const Datum&, required): Array or ChunkedArray to filter
  • filter (const Datum&, required): Boolean array indicating which values to keep
  • options (const FilterOptions&, required): Filter options
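A boolean mask can itself contain nulls, and the filter options decide what to do with them. The standalone sketch below models two behaviors, dropping the slot or emitting a null, assuming they correspond to the DROP and EMIT_NULL settings of Arrow's FilterOptions. ToyNullSelection, ToyFilterResult, and FilterValues are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical names for the two ways to treat a null mask slot.
enum class ToyNullSelection { kDrop, kEmitNull };

// Output of the toy filter: valid[i] is false where a null was emitted.
struct ToyFilterResult {
  std::vector<std::int64_t> values;
  std::vector<bool> valid;
};

// Keep values[i] where mask[i] is true. mask_valid[i] == false marks a null
// mask slot, which is either skipped or turned into a null output slot.
ToyFilterResult FilterValues(const std::vector<std::int64_t>& values,
                             const std::vector<bool>& mask,
                             const std::vector<bool>& mask_valid,
                             ToyNullSelection null_selection) {
  if (values.size() != mask.size() || mask.size() != mask_valid.size()) {
    throw std::invalid_argument("length mismatch");
  }
  ToyFilterResult out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (!mask_valid[i]) {
      if (null_selection == ToyNullSelection::kEmitNull) {
        out.values.push_back(0);  // placeholder for a null slot
        out.valid.push_back(false);
      }
      continue;
    }
    if (mask[i]) {
      out.values.push_back(values[i]);
      out.valid.push_back(true);
    }
  }
  return out;
}
```

Unlike an element-wise function, the output length depends on the mask contents, which is why filtering is not a SCALAR operation.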

Take

Select values by indices.
Result<Datum> Take(const Datum& values, const Datum& indices, const TakeOptions& options, ExecContext* ctx = nullptr);
  • values (const Datum&, required): Array or ChunkedArray to select from
  • indices (const Datum&, required): Integer array of indices to select
  • options (const TakeOptions&, required): Take options (handling of out-of-bounds indices)
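Selection by index amounts to a gather with a bounds check. The sketch below is a standalone model with bounds checking always enabled. TakeValues is a hypothetical helper rather than Arrow's kernel, and it throws where Arrow would return an error Status.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Gather values[indices[i]] for each i; indices may repeat and come in any order.
std::vector<std::int64_t> TakeValues(const std::vector<std::int64_t>& values,
                                     const std::vector<std::int64_t>& indices) {
  std::vector<std::int64_t> out;
  out.reserve(indices.size());
  for (std::int64_t idx : indices) {
    // Reject negative indices and indices past the end of `values`.
    if (idx < 0 || static_cast<std::size_t>(idx) >= values.size()) {
      throw std::out_of_range("index out of bounds");
    }
    out.push_back(values[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

The output length equals the number of indices, so taking {2, 0, 2} from {10, 20, 30} yields {30, 10, 30}.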

String Functions

String Operations

// String predicates
Result<Datum> IsAscii(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> IsUtf8(const Datum& strings, ExecContext* ctx = nullptr);

// String transformations
Result<Datum> Utf8Upper(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Lower(const Datum& strings, ExecContext* ctx = nullptr);
Result<Datum> Utf8Reverse(const Datum& strings, ExecContext* ctx = nullptr);

// String trimming
Result<Datum> Utf8Trim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8LTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
Result<Datum> Utf8RTrim(const Datum& strings, const TrimOptions& options, ExecContext* ctx = nullptr);
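The trimming functions take a set of characters to strip from the ends of each string. The standalone sketch below shows that behavior for a single ASCII string, assuming the options carry the character set as a plain string. TrimAscii is a hypothetical helper that works on bytes, so unlike a UTF-8 aware kernel it does not handle multi-byte characters.

```cpp
#include <cstddef>
#include <string>

// Remove leading and trailing bytes that appear in `characters`.
std::string TrimAscii(const std::string& input, const std::string& characters) {
  const std::size_t first = input.find_first_not_of(characters);
  if (first == std::string::npos) return "";  // nothing but trimmed bytes
  const std::size_t last = input.find_last_not_of(characters);
  return input.substr(first, last - first + 1);
}
```

Characters in the middle of the string are left untouched, so trimming " a b " with " " gives "a b". Left-only or right-only trimming would apply just one of the two searches.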

Execution Context

ExecContext

Provides execution context including memory pool and function registry.
class ExecContext {
 public:
  explicit ExecContext(MemoryPool* pool = default_memory_pool(),
                       FunctionRegistry* func_registry = nullptr);
  
  MemoryPool* memory_pool() const;
  FunctionRegistry* func_registry() const;
};
  • memory_pool (MemoryPool*): Returns the memory pool used for allocations
  • func_registry (FunctionRegistry*): Returns the function registry used to look up functions