This guide covers building the Apache Arrow C++ library from source using CMake. Arrow uses an out-of-source build system for flexibility.
Prerequisites
System Requirements
- C++20-enabled compiler: GCC 12+, Clang 14+, or MSVC 2019+
- CMake: Version 3.25 or higher
- Build system: Make or Ninja (Ninja is recommended)
- Memory: At least 1 GB of RAM for a minimal build (4 GB for debug builds, 8 GB for full builds)
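The CMake version floor can be checked from a shell before configuring. The following is a minimal sketch: the helper name `version_ge` is hypothetical (it is not part of Arrow or CMake), and it compares plain dotted numeric versions only.

```shell
# Hypothetical helper: succeed when dotted version $1 is >= $2.
# Only numeric components are compared (e.g. 3.28.1).
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}
    y=${b%%.*}
    # Drop the leading component; an exhausted version becomes empty.
    if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
    if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
    x=${x:-0}
    y=${y:-0}
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 0
}

# Compare the installed CMake (if any) against the 3.25 minimum stated above.
if command -v cmake >/dev/null 2>&1; then
  # The first line of `cmake --version` looks like "cmake version 3.28.1".
  found=$(cmake --version | sed -n '1s/[^0-9]*\([0-9][0-9.]*\).*/\1/p')
  if version_ge "$found" 3.25; then
    echo "cmake $found is new enough"
  else
    echo "cmake $found is older than 3.25"
  fi
fi
```

The same helper could be pointed at the output of `gcc -dumpversion` or similar, as long as the version string is purely numeric.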
Installing Build Tools
- Ubuntu/Debian
- Fedora
- Arch Linux
- macOS
- Windows (MSYS2)
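Whatever the platform, it can help to confirm that the build tools are on `PATH` before configuring. The sketch below assumes the tool names `git`, `cmake`, `ninja`, and `make`; the helper `missing_tools` is hypothetical, and package names for each platform's package manager are not covered here.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Only one of ninja or make is needed as the build backend.
missing=$(missing_tools git cmake ninja make)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All listed build tools were found"
fi
```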
Getting the Source
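A minimal sketch of fetching the source, assuming `git` is installed and the official GitHub repository is used. The C++ sources live in the `cpp` subdirectory, and the remaining commands in this guide are assumed to run from there.

```shell
# Clone the Apache Arrow repository and enter the C++ source directory.
git clone https://github.com/apache/arrow.git
cd arrow/cpp
```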
Build Configuration
Using CMake Presets
Arrow provides CMake presets for common configurations. List the available presets with `cmake --list-presets`. They include:
- `ninja-debug-minimal` - Debug build without optional components
- `ninja-debug-basic` - Debug build with tests and reduced dependencies
- `ninja-debug` - Full debug build with tests
- `ninja-release-minimal` - Minimal release build
- `ninja-release` - Full release build
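The presets listed above might be used as follows. This is a sketch that assumes the commands run from the `cpp` directory and that the build tree is a `build` subdirectory; if a preset defines its own binary directory, CMake uses that instead.

```shell
# Show the presets defined for this source tree.
cmake --list-presets

# Configure into a separate build directory using one of the presets,
# then compile.
mkdir -p build
cd build
cmake .. --preset ninja-debug-minimal
cmake --build .
```

Additional `-D` options can be appended to the `cmake .. --preset` line to override individual settings from the preset.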
Manual Configuration
For more control, configure CMake manually:
- Release Build
- Debug Build
- Minimal Build
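The three configurations named above could look like the following. The build directory names are illustrative, and the flags chosen for the "minimal" case are an assumption about what minimal means here (release mode, static library only, optional components left at their defaults).

```shell
# Run from the cpp/ directory; each configuration gets its own build directory.

# Release build
cmake -S . -B build/release -G Ninja -DCMAKE_BUILD_TYPE=Release

# Debug build with unit tests enabled
cmake -S . -B build/debug -G Ninja -DCMAKE_BUILD_TYPE=Debug -DARROW_BUILD_TESTS=ON

# Minimal build (assumed meaning: release mode, static library only)
cmake -S . -B build/minimal -G Ninja -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_SHARED=OFF

# Compile one of the configured trees
cmake --build build/release
```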
Key Build Options
Core Options
| Option | Default | Description |
|---|---|---|
| `CMAKE_BUILD_TYPE` | `Release` | Build type: `Debug`, `Release`, `RelWithDebInfo` |
| `CMAKE_INSTALL_PREFIX` | `/usr/local` | Installation directory |
| `ARROW_BUILD_STATIC` | `ON` | Build static libraries |
| `ARROW_BUILD_SHARED` | `ON` | Build shared libraries |
| `ARROW_BUILD_TESTS` | `OFF` | Build unit tests |
| `ARROW_BUILD_BENCHMARKS` | `OFF` | Build benchmarks |
Component Options
| Option | Description |
|---|---|
| `ARROW_COMPUTE` | Compute functions and kernels |
| `ARROW_CSV` | CSV reader/writer |
| `ARROW_DATASET` | Dataset API for reading partitioned data |
| `ARROW_FILESYSTEM` | Filesystem abstraction layer (the S3, GCS, and HDFS backends have their own options: `ARROW_S3`, `ARROW_GCS`, `ARROW_HDFS`) |
| `ARROW_FLIGHT` | Arrow Flight RPC framework |
| `ARROW_FLIGHT_SQL` | Flight SQL protocol |
| `ARROW_GANDIVA` | LLVM-based expression compiler |
| `ARROW_IPC` | Inter-process communication (IPC) format |
| `ARROW_JSON` | JSON reader |
| `ARROW_ORC` | ORC file format support |
| `ARROW_PARQUET` | Parquet file format support |
| `ARROW_ACERO` | Acero streaming execution engine |
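Each component in the table is switched on by passing its option as `-D<OPTION>=ON` at configure time. As a small convenience sketch, the hypothetical helper `arrow_component_flags` below (not part of Arrow) expands short component names into those arguments.

```shell
# Hypothetical helper: turn component names into -DARROW_<NAME>=ON arguments.
arrow_component_flags() {
  flags=""
  for name in "$@"; do
    upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    flags="$flags -DARROW_${upper}=ON"
  done
  # Strip the leading space before printing.
  printf '%s\n' "${flags# }"
}

arrow_component_flags parquet dataset csv
# prints: -DARROW_PARQUET=ON -DARROW_DATASET=ON -DARROW_CSV=ON

# Example use when configuring (word splitting of the output is intentional):
#   cmake -S . -B build $(arrow_component_flags parquet dataset csv)
```

Writing the `-D` options out by hand is equivalent; the helper only saves typing when many components are enabled.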
Advanced Options
| Option | Description |
|---|---|
| `ARROW_JEMALLOC` | Use jemalloc for memory allocation |
| `ARROW_MIMALLOC` | Use mimalloc for memory allocation |
| `ARROW_USE_CCACHE` | Use ccache for faster rebuilds |
| `ARROW_SIMD_LEVEL` | SIMD optimization level (`NONE`, `SSE4_2`, `AVX2`, `AVX512`) |
| `ARROW_RUNTIME_SIMD_LEVEL` | Runtime SIMD dispatch level |
Building with Dependencies
Bundled vs. System Dependencies
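Arrow can either build bundled copies of its dependencies or use system-installed versions. The `ARROW_DEPENDENCY_SOURCE` option selects the strategy (for example `BUNDLED` or `SYSTEM`).
Common Dependencies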
- boost - Required by some components
- brotli, lz4, snappy, zstd - Compression libraries
- gflags, glog, gtest - Development utilities
- protobuf, grpc - Required for Arrow Flight
- thrift - Required for Parquet
- re2, utf8proc - String processing
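The choice between bundled and system dependencies is made at configure time. The sketch below assumes a build directory named `build` and uses Protobuf only as an example of a per-package override.

```shell
# Build every third-party dependency from source as part of the Arrow build.
cmake -S . -B build -DARROW_DEPENDENCY_SOURCE=BUNDLED

# Prefer system packages, but build Protobuf from source
# (per-package override of the form <Package>_SOURCE).
cmake -S . -B build -DARROW_DEPENDENCY_SOURCE=SYSTEM -DProtobuf_SOURCE=BUNDLED
```

Bundled builds take longer and need network access during the build, while system packages must already be installed in versions Arrow accepts.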
Installation
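Install Arrow after building by running `cmake --install .` from the build directory (use `sudo` if the installation prefix is not writable by your user).
Setting Installation Path
Specify the installation prefix during configuration with `-DCMAKE_INSTALL_PREFIX=<path>`.
Using Arrow in Your Project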
CMake Integration
Create a `CMakeLists.txt` file that calls `find_package(Arrow REQUIRED)` and links your target against an Arrow target.
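A minimal sketch of such a project, written out from the shell. The directory name `arrow-example`, the project name, and `main.cc` are illustrative, and the C++ standard setting simply mirrors the compiler requirement stated in this guide.

```shell
# Create a small example project; all names here are illustrative.
mkdir -p arrow-example

cat > arrow-example/CMakeLists.txt <<'EOF'
cmake_minimum_required(VERSION 3.25)
project(arrow_example LANGUAGES CXX)

# Mirrors the C++20 compiler requirement stated in this guide.
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(Arrow REQUIRED)

add_executable(arrow_example main.cc)
target_link_libraries(arrow_example PRIVATE Arrow::arrow_shared)
EOF

cat > arrow-example/main.cc <<'EOF'
#include <arrow/util/config.h>
#include <iostream>

int main() {
  // ARROW_VERSION_STRING is defined by arrow/util/config.h.
  std::cout << "Built against Arrow " << ARROW_VERSION_STRING << std::endl;
  return 0;
}
EOF
```

The project can then be configured and built like any other CMake project, for example with `cmake -S arrow-example -B arrow-example/build` followed by `cmake --build arrow-example/build`.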
Use `Arrow::arrow_shared` for shared libraries (recommended) or `Arrow::arrow_static` for static linking.
Available Packages
Arrow provides separate packages for each component:
- `Arrow` - Core library
- `ArrowCompute` - Compute functions
- `ArrowDataset` - Dataset API
- `ArrowAcero` - Acero execution engine
- `ArrowFlight` - Flight RPC
- `ArrowFlightSql` - Flight SQL
- `Parquet` - Parquet format
- `Gandiva` - Expression compiler
Each package follows the same naming pattern:
- find_package: `find_package(PackageName REQUIRED)`
- Shared target: `PackageName::package_name_shared`
- Static target: `PackageName::package_name_static`
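To make the naming pattern concrete, the sketch below derives the target name from a package name by converting CamelCase to snake_case. The helper `arrow_target` is hypothetical and exists only to illustrate the pattern; in a real `CMakeLists.txt` the target names are written out directly.

```shell
# Hypothetical helper: derive the imported target name for an Arrow CMake
# package, following the PackageName::package_name_<shared|static> pattern.
arrow_target() {
  pkg=$1
  kind=${2:-shared}
  # Insert "_" at each lower-to-upper case boundary, then lowercase the result.
  snake=$(printf '%s' "$pkg" | sed 's/\([a-z]\)\([A-Z]\)/\1_\2/g' | tr '[:upper:]' '[:lower:]')
  printf '%s::%s_%s\n' "$pkg" "$snake" "$kind"
}

arrow_target ArrowFlightSql     # prints ArrowFlightSql::arrow_flight_sql_shared
arrow_target Parquet static     # prints Parquet::parquet_static
```

For example, a project that reads Parquet files would call `find_package(Parquet REQUIRED)` and link against `Parquet::parquet_shared`.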
pkg-config
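Alternatively, use pkg-config to obtain compiler and linker flags, for example `pkg-config --cflags --libs arrow`.
Testing
Run tests after building (this requires configuring with `ARROW_BUILD_TESTS=ON`), for example with `ctest --output-on-failure` from the build directory.
Troubleshooting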
Out of Memory
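Reduce the number of parallel jobs, for example `cmake --build . --parallel 2`.
Missing Dependencies
Use bundled dependencies by configuring with `-DARROW_DEPENDENCY_SOURCE=BUNDLED`.
CMake Can’t Find Arrow
Set `CMAKE_PREFIX_PATH` to the Arrow installation prefix when configuring your project.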
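For instance, if Arrow was installed to a non-default prefix, the consuming project could be configured as sketched below. The path `/opt/arrow` is a placeholder, not a location Arrow uses by default.

```shell
# /opt/arrow stands in for the prefix Arrow was installed to.
cmake -S . -B build -DCMAKE_PREFIX_PATH=/opt/arrow
```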