This guide will get you up and running with Apache Arrow C++ quickly. You’ll learn how to create arrays, build tables, and read/write data files.
Prerequisites
You’ll need:
- C++17 compatible compiler (GCC 7+, Clang 6+, MSVC 2017+)
- CMake 3.16 or higher
- Basic familiarity with C++ and CMake
Install Arrow C++
Using Package Managers
Ubuntu/Debian:
sudo apt update
sudo apt install -y libarrow-dev
macOS:
brew install apache-arrow
Conda:
conda install -c conda-forge arrow-cpp
Building from Source
For a minimal build:
git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
-DARROW_BUILD_TESTS=OFF \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_FILESYSTEM=ON
make -j$(nproc)
sudo make install
Create Your First Arrays
Arrow uses builders to create arrays. Each data type has its own builder class.
Create a file called arrow_basics.cc:
#include <arrow/api.h>
#include <iostream>

// The Arrow error macros return an arrow::Status on failure, so they must be
// used inside a function that returns arrow::Status (or arrow::Result).
arrow::Status CreateArray() {
  // Create an Int8 builder
  arrow::Int8Builder int8builder;
  // Add data to the builder
  int8_t days_raw[5] = {1, 12, 17, 23, 28};
  ARROW_RETURN_NOT_OK(int8builder.AppendValues(days_raw, 5));
  // Finish building the array
  std::shared_ptr<arrow::Array> days;
  ARROW_ASSIGN_OR_RAISE(days, int8builder.Finish());
  std::cout << "Created array with " << days->length() << " elements" << std::endl;
  std::cout << days->ToString() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = CreateArray();
  if (!st.ok()) {
    std::cerr << st << std::endl;
    return 1;
  }
  return 0;
}
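Key concepts:
- Int8Builder - Creates arrays of 8-bit signed integers
- AppendValues() - Adds multiple values at once
- Finish() - Completes the array, returns it, and resets the builder for reuse
- ARROW_ASSIGN_OR_RAISE - Assigns the result on success; on failure, returns the error Status from the enclosing function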
Build Tables from Arrays
Tables organize multiple arrays into named columns with a schema.
#include <arrow/api.h>
#include <iostream>

arrow::Status BuildTable() {
  // Create arrays for each column
  arrow::Int8Builder int8builder;
  int8_t days_raw[5] = {1, 12, 17, 23, 28};
  ARROW_RETURN_NOT_OK(int8builder.AppendValues(days_raw, 5));
  std::shared_ptr<arrow::Array> days;
  ARROW_ASSIGN_OR_RAISE(days, int8builder.Finish());

  // Finish() resets the builder, so it can be reused for the next column
  int8_t months_raw[5] = {1, 3, 5, 7, 1};
  ARROW_RETURN_NOT_OK(int8builder.AppendValues(months_raw, 5));
  std::shared_ptr<arrow::Array> months;
  ARROW_ASSIGN_OR_RAISE(months, int8builder.Finish());

  arrow::Int16Builder int16builder;
  int16_t years_raw[5] = {1990, 2000, 1995, 2000, 1995};
  ARROW_RETURN_NOT_OK(int16builder.AppendValues(years_raw, 5));
  std::shared_ptr<arrow::Array> years;
  ARROW_ASSIGN_OR_RAISE(years, int16builder.Finish());

  // Define schema with field names and types
  auto schema = arrow::schema({
      arrow::field("day", arrow::int8()),
      arrow::field("month", arrow::int8()),
      arrow::field("year", arrow::int16())
  });

  // Create table from schema and arrays
  auto table = arrow::Table::Make(schema, {days, months, years});
  std::cout << table->ToString() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = BuildTable();
  if (!st.ok()) {
    std::cerr << st << std::endl;
    return 1;
  }
  return 0;
}
Output:
day: int8
month: int8
year: int16
----
day: [[1,12,17,23,28]]
month: [[1,3,5,7,1]]
year: [[1990,2000,1995,2000,1995]]
Read and Write CSV Files
Arrow provides fast CSV reading and writing capabilities.
Create test.csv:
name,age,city
Alice,30,NYC
Bob,25,SF
Carol,35,LA
Now read and process it:
#include <arrow/api.h>
#include <arrow/csv/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>
#include <iostream>

arrow::Status ProcessCSV() {
  // Open CSV file for reading
  ARROW_ASSIGN_OR_RAISE(
      auto input_file,
      arrow::io::ReadableFile::Open("test.csv")
  );

  // Create CSV reader
  ARROW_ASSIGN_OR_RAISE(
      auto csv_reader,
      arrow::csv::TableReader::Make(
          arrow::io::default_io_context(),
          input_file,
          arrow::csv::ReadOptions::Defaults(),
          arrow::csv::ParseOptions::Defaults(),
          arrow::csv::ConvertOptions::Defaults()
      )
  );

  // Read entire CSV into table
  ARROW_ASSIGN_OR_RAISE(auto table, csv_reader->Read());
  std::cout << "Read " << table->num_rows() << " rows" << std::endl;
  std::cout << table->ToString() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = ProcessCSV();
  if (!st.ok()) {
    std::cerr << st << std::endl;
    return 1;
  }
  return 0;
}
Write Arrow IPC Files
The Arrow IPC format is optimized for fast reading and zero-copy data access.
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>
#include <iostream>

arrow::Status WriteArrowFile(std::shared_ptr<arrow::Table> table) {
  // Open output file
  ARROW_ASSIGN_OR_RAISE(
      auto output_file,
      arrow::io::FileOutputStream::Open("data.arrow")
  );

  // Create IPC writer
  ARROW_ASSIGN_OR_RAISE(
      auto writer,
      arrow::ipc::MakeFileWriter(output_file, table->schema())
  );

  // Write table
  ARROW_RETURN_NOT_OK(writer->WriteTable(*table));
  ARROW_RETURN_NOT_OK(writer->Close());
  std::cout << "Wrote table to data.arrow" << std::endl;
  return arrow::Status::OK();
}

arrow::Status ReadArrowFile() {
  // Open input file
  ARROW_ASSIGN_OR_RAISE(
      auto input_file,
      arrow::io::ReadableFile::Open("data.arrow")
  );

  // Create IPC reader
  ARROW_ASSIGN_OR_RAISE(
      auto reader,
      arrow::ipc::RecordBatchFileReader::Open(input_file)
  );

  // Read all record batches into a table. RecordBatchFileReader exposes
  // ToTable() for this in recent Arrow releases; it has no ReadTable() method.
  ARROW_ASSIGN_OR_RAISE(auto table, reader->ToTable());
  std::cout << "Read table with " << table->num_rows() << " rows" << std::endl;
  return arrow::Status::OK();
}
Build with CMake
Create CMakeLists.txt to compile your project:
cmake_minimum_required(VERSION 3.16)
project(ArrowQuickstart)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(Arrow REQUIRED)
add_executable(arrow_example arrow_basics.cc)
target_link_libraries(arrow_example PRIVATE Arrow::arrow_shared)
Build and run:
mkdir build && cd build
cmake ..
make
./arrow_example
Complete Example
Here’s a complete working example that ties everything together:
#include <arrow/api.h>
#include <arrow/csv/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>
#include <iostream>

arrow::Status RunExample() {
  // 1. Create arrays
  arrow::Int32Builder builder;
  ARROW_RETURN_NOT_OK(builder.AppendValues({1, 2, 3, 4, 5}));
  ARROW_ASSIGN_OR_RAISE(auto array, builder.Finish());

  // 2. Build a table
  auto schema = arrow::schema({arrow::field("numbers", arrow::int32())});
  auto table = arrow::Table::Make(schema, {array});
  std::cout << "Created table:\n" << table->ToString() << std::endl;

  // 3. Write to Arrow IPC file
  ARROW_ASSIGN_OR_RAISE(
      auto output,
      arrow::io::FileOutputStream::Open("example.arrow")
  );
  ARROW_ASSIGN_OR_RAISE(
      auto writer,
      arrow::ipc::MakeFileWriter(output, schema)
  );
  ARROW_RETURN_NOT_OK(writer->WriteTable(*table));
  ARROW_RETURN_NOT_OK(writer->Close());
  std::cout << "\nWrote table to example.arrow" << std::endl;

  // 4. Read it back
  ARROW_ASSIGN_OR_RAISE(
      auto input,
      arrow::io::ReadableFile::Open("example.arrow")
  );
  ARROW_ASSIGN_OR_RAISE(
      auto reader,
      arrow::ipc::RecordBatchFileReader::Open(input)
  );
  // ToTable() collects every record batch in the file into one table
  ARROW_ASSIGN_OR_RAISE(auto read_table, reader->ToTable());
  std::cout << "\nRead table back:\n" << read_table->ToString() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = RunExample();
  if (!st.ok()) {
    std::cerr << "Error: " << st << std::endl;
    return 1;
  }
  return 0;
}
Next Steps
- Compute Functions: Learn about Arrow’s compute functions for data processing
- Parquet Files: Read and write Parquet files with Arrow
- Datasets: Work with multi-file datasets and partitioning
- API Reference: Explore the complete C++ API documentation
Common Patterns
Error Handling
Arrow uses macros for consistent error handling:
// Return on error
ARROW_RETURN_NOT_OK(some_operation());
// Assign result or return error
ARROW_ASSIGN_OR_RAISE(auto result, some_operation());
Memory Management
Arrow uses std::shared_ptr for automatic memory management:
std::shared_ptr<arrow::Array> array;
std::shared_ptr<arrow::Table> table;
// Memory is automatically freed when references go out of scope
Working with Nulls
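arrow::Int32Builder builder;
// Append() and AppendNull() return arrow::Status, so check each call
ARROW_RETURN_NOT_OK(builder.Append(1));
ARROW_RETURN_NOT_OK(builder.AppendNull());  // Add a null value
ARROW_RETURN_NOT_OK(builder.Append(3));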
Troubleshooting
CMake cannot find Arrow
Make sure Arrow is installed and set CMAKE_PREFIX_PATH:
cmake -DCMAKE_PREFIX_PATH=/path/to/arrow/install ..
Linker errors (undefined references)
Ensure you’re linking against the correct Arrow libraries:
target_link_libraries(your_target PRIVATE
  Arrow::arrow_shared
  Parquet::parquet_shared  # If using Parquet; also requires find_package(Parquet REQUIRED)
)
Runtime errors loading shared libraries
Set LD_LIBRARY_PATH (Linux) or DYLD_LIBRARY_PATH (macOS):
export LD_LIBRARY_PATH=/path/to/arrow/lib:$LD_LIBRARY_PATH      # Linux
export DYLD_LIBRARY_PATH=/path/to/arrow/lib:$DYLD_LIBRARY_PATH  # macOS