# Iceberg Go

`iceberg-go` is a Go-native implementation of the [Apache Iceberg](https://iceberg.apache.org/) table spec for accessing Iceberg tables.
## Feature Support / Roadmap

### FileSystem Support
Filesystem Type | Supported |
---|---|
S3 | X |
Google Cloud Storage | X |
Azure Blob Storage | X |
Local Filesystem | X |
### Metadata
Operation | Supported |
---|---|
Get Schema | X |
Get Snapshots | X |
Get Sort Orders | X |
Get Partition Specs | X |
Get Manifests | X |
Create New Manifests | X |
Plan Scan | X |
Plan Scan for Snapshot | X |
### Catalog Support
Operation | REST | Hive | Glue | SQL |
---|---|---|---|---|
Load Table | X | X | X | |
List Tables | X | X | X | |
Create Table | X | X | X | |
Register Table | X | X | | |
Update Current Snapshot | X | X | X | |
Create New Snapshot | X | X | X | |
Rename Table | X | X | X | |
Drop Table | X | X | X | |
Alter Table | X | X | X | |
Check Table Exists | X | X | X | |
Set Table Properties | X | X | X | |
List Namespaces | X | X | X | |
Create Namespace | X | X | X | |
Check Namespace Exists | X | X | X | |
Drop Namespace | X | X | X | |
Update Namespace Properties | X | X | X | |
Create View | X | X | | |
Load View | X | | | |
List View | X | X | | |
Drop View | X | X | | |
Check View Exists | X | X | | |
### Read/Write Data Support
- Data can currently be read as an Arrow Table or as a stream of Arrow record batches.
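As a minimal sketch of what a read looks like (this assumes the `Scan` and `ToArrowTable` methods of the current `table` package; names and signatures may differ between releases):

```go
import (
	"context"
	"fmt"
	"log"

	"github.com/apache/iceberg-go/table"
)

// readAll scans a loaded Iceberg table and materializes it as an Arrow table.
// tbl is assumed to have been loaded via a catalog, e.g. cat.LoadTable.
func readAll(ctx context.Context, tbl *table.Table) {
	// Plan and execute a scan of the current snapshot.
	arrTbl, err := tbl.Scan().ToArrowTable(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer arrTbl.Release()

	fmt.Println("rows read:", arrTbl.NumRows())
}
```

For streaming reads, the scan can instead yield Arrow record batches one at a time, which avoids materializing the whole table in memory.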
### Supported Write Operations

As long as the FileSystem is supported and the Catalog supports altering the table, the following table tracks the current write support:
Operation | Supported |
---|---|
Append Stream | X |
Append Data Files | X |
Rewrite Files | |
Rewrite Manifests | |
Overwrite Files | |
Write Pos Delete | |
Write Eq Delete | |
Row Delta | |
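For example, an append might look like this sketch (it assumes the `AppendTable` convenience method on `table.Table` and the Arrow v18 module path used by current releases; verify both against the version you depend on):

```go
import (
	"context"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/iceberg-go/table"
)

// appendData writes the rows of an Arrow table to an Iceberg table,
// producing a new table snapshot that includes the appended data files.
func appendData(ctx context.Context, tbl *table.Table, data arrow.Table) (*table.Table, error) {
	// The batch size controls how rows are chunked when writing data files;
	// nil snapshot properties means no extra snapshot metadata is attached.
	return tbl.AppendTable(ctx, data, data.NumRows(), nil)
}
```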
## Get in Touch

For questions and discussion, see the [Apache Iceberg community page](https://iceberg.apache.org/community/).
## Install

This quickstart shows how to add the library to your project.

### Requirements

To install the `iceberg-go` package, you need to install Go and set up your Go workspace first. If you don't have a `go.mod` file, create one with `go mod init <your-module-name>`.

### Installation
- Download and install it:

```sh
go get -u github.com/apache/iceberg-go
```

- Import it in your code:

```go
import "github.com/apache/iceberg-go"
```
## CLI

Run `go build ./cmd/iceberg` from the root of this repository to build the CLI executable. Alternatively, you can run `go install github.com/apache/iceberg-go/cmd/iceberg` to install it to the `bin` directory of your `GOPATH`.

The `iceberg` CLI usage is very similar to the pyiceberg CLI. You can pass the catalog URI with the `--uri` argument.

Example:

You can start the Iceberg REST API docker image, which runs on port 8181 by default:

```sh
docker pull apache/iceberg-rest-fixture:latest
docker run -p 8181:8181 apache/iceberg-rest-fixture:latest
```

and run the `iceberg` CLI pointing at the REST API server:

```sh
./iceberg --uri http://0.0.0.0:8181 list
```

```
┌─────┐
| IDs |
| --- |
└─────┘
```
### Create Namespace

```sh
./iceberg --uri http://0.0.0.0:8181 create namespace taxitrips
```

### List Namespaces

```sh
./iceberg --uri http://0.0.0.0:8181 list
```

```
┌───────────┐
| IDs       |
| --------- |
| taxitrips |
└───────────┘
```
## Catalog

`Catalog` is the entry point for accessing Iceberg tables. You can use a catalog to:

- Create and list namespaces.
- Create, load, and drop tables.

Multiple catalog implementations are available, including REST, SQL, and Glue (see the Catalog Support table above). Here is an example of how to create a `RestCatalog`:
```go
import (
	"context"
	"log"

	"github.com/apache/iceberg-go/catalog/rest"
)

// Create a REST catalog
cat, err := rest.NewCatalog(context.Background(), "rest", "http://localhost:8181",
	rest.WithOAuthToken("your-token"))
if err != nil {
	log.Fatal(err)
}
```
You can run the following code to list all root namespaces:

```go
// List all root namespaces
namespaces, err := cat.ListNamespaces(context.Background(), nil)
if err != nil {
	log.Fatal(err)
}

for _, ns := range namespaces {
	fmt.Printf("Namespace: %v\n", ns)
}
```
Then you can run the following code to create a namespace:

```go
// Create a namespace (catalog.ToIdentifier is from github.com/apache/iceberg-go/catalog)
namespace := catalog.ToIdentifier("my_namespace")
err = cat.CreateNamespace(context.Background(), namespace, nil)
if err != nil {
	log.Fatal(err)
}
```
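The Catalog Support table above also lists namespace existence checks and drops. A sketch of those operations (the method names `CheckNamespaceExists` and `DropNamespace` are assumed from the catalog interface; check the `catalog` package docs for your version):

```go
// Check whether the namespace exists before dropping it.
exists, err := cat.CheckNamespaceExists(context.Background(), namespace)
if err != nil {
	log.Fatal(err)
}

if exists {
	// Drop the namespace; most catalogs require it to be empty first.
	if err := cat.DropNamespace(context.Background(), namespace); err != nil {
		log.Fatal(err)
	}
}
```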
### Other Catalog Types

#### SQL Catalog

You can also use SQL-based catalogs:
```go
import (
	"context"
	"log"

	"github.com/apache/iceberg-go"
	"github.com/apache/iceberg-go/catalog"
	"github.com/apache/iceberg-go/io"
)

// Create a SQLite catalog
cat, err := catalog.Load(context.Background(), "local", iceberg.Properties{
	"type":               "sql",
	"uri":                "file:iceberg-catalog.db",
	"sql.dialect":        "sqlite",
	"sql.driver":         "sqlite",
	io.S3Region:          "us-east-1",
	io.S3AccessKeyID:     "admin",
	io.S3SecretAccessKey: "password",
	"warehouse":          "file:///tmp/warehouse",
})
if err != nil {
	log.Fatal(err)
}
```
#### Glue Catalog

For AWS Glue integration:

```go
import (
	"context"
	"log"

	"github.com/apache/iceberg-go/catalog"
	"github.com/apache/iceberg-go/catalog/glue"
	"github.com/aws/aws-sdk-go-v2/config"
)

// Create AWS config
awsCfg, err := config.LoadDefaultConfig(context.TODO())
if err != nil {
	log.Fatal(err)
}

// Create Glue catalog
cat := glue.NewCatalog(glue.WithAwsConfig(awsCfg))

// Create table in Glue (schema is defined as in the Table section below)
tableIdent := catalog.ToIdentifier("my_database", "my_table")
tbl, err := cat.CreateTable(
	context.Background(),
	tableIdent,
	schema,
	catalog.WithLocation("s3://my-bucket/tables/my_table"),
)
if err != nil {
	log.Fatal(err)
}
```
## Table

After creating a `Catalog`, you can manipulate tables through it.

You can use the following code to create a table:

```go
import (
	"context"
	"log"

	"github.com/apache/iceberg-go"
	"github.com/apache/iceberg-go/catalog"
)

// Create a simple schema
schema := iceberg.NewSchemaWithIdentifiers(1, []int{2},
	iceberg.NestedField{ID: 1, Name: "foo", Type: iceberg.PrimitiveTypes.String, Required: false},
	iceberg.NestedField{ID: 2, Name: "bar", Type: iceberg.PrimitiveTypes.Int32, Required: true},
	iceberg.NestedField{ID: 3, Name: "baz", Type: iceberg.PrimitiveTypes.Bool, Required: false},
)

// Create table identifier
tableIdent := catalog.ToIdentifier("my_namespace", "my_table")

// Create table with optional properties
tbl, err := cat.CreateTable(
	context.Background(),
	tableIdent,
	schema,
	catalog.WithProperties(map[string]string{"owner": "me"}),
	catalog.WithLocation("s3://my-bucket/tables/my_table"),
)
if err != nil {
	log.Fatal(err)
}
```
Also, you can load a table directly:

```go
// Load an existing table
tbl, err := cat.LoadTable(context.Background(), tableIdent, nil)
if err != nil {
	log.Fatal(err)
}

fmt.Printf("Table: %s\n", tbl.Identifier())
fmt.Printf("Location: %s\n", tbl.MetadataLocation())
```
### Schema Creation

Here are some examples of creating different types of schemas:

```go
// Simple schema with primitive types; the required "id" field (ID 1)
// serves as the identifier field (identifier fields must be required)
simpleSchema := iceberg.NewSchemaWithIdentifiers(1, []int{1},
	iceberg.NestedField{ID: 1, Name: "id", Type: iceberg.PrimitiveTypes.Int32, Required: true},
	iceberg.NestedField{ID: 2, Name: "name", Type: iceberg.PrimitiveTypes.String, Required: false},
	iceberg.NestedField{ID: 3, Name: "active", Type: iceberg.PrimitiveTypes.Bool, Required: false},
)

// Schema with a nested struct (no identifier fields)
nestedSchema := iceberg.NewSchemaWithIdentifiers(1, nil,
	iceberg.NestedField{ID: 1, Name: "person", Type: &iceberg.StructType{
		FieldList: []iceberg.NestedField{
			{ID: 2, Name: "name", Type: iceberg.PrimitiveTypes.String, Required: false},
			{ID: 3, Name: "age", Type: iceberg.PrimitiveTypes.Int32, Required: true},
		},
	}, Required: false},
)

// Schema with list and map types (no identifier fields)
complexSchema := iceberg.NewSchemaWithIdentifiers(1, nil,
	iceberg.NestedField{ID: 1, Name: "tags", Type: &iceberg.ListType{
		ElementID: 2, Element: iceberg.PrimitiveTypes.String, ElementRequired: true,
	}, Required: false},
	iceberg.NestedField{ID: 3, Name: "metadata", Type: &iceberg.MapType{
		KeyID: 4, KeyType: iceberg.PrimitiveTypes.String,
		ValueID: 5, ValueType: iceberg.PrimitiveTypes.String, ValueRequired: true,
	}, Required: false},
)
```
### Table Operations

Here are some common table operations:

```go
// List tables in a namespace (returned as an iterator of identifier/error pairs)
tables := cat.ListTables(context.Background(), catalog.ToIdentifier("my_namespace"))
for tableIdent, err := range tables {
	if err != nil {
		log.Printf("Error listing table: %v", err)
		continue
	}
	fmt.Printf("Table: %v\n", tableIdent)
}

// Check if table exists
exists, err := cat.CheckTableExists(context.Background(), tableIdent)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("Table exists: %t\n", exists)

// Drop a table
err = cat.DropTable(context.Background(), tableIdent)
if err != nil {
	log.Fatal(err)
}

// Rename a table
fromIdent := catalog.ToIdentifier("my_namespace", "old_table")
toIdent := catalog.ToIdentifier("my_namespace", "new_table")
renamedTable, err := cat.RenameTable(context.Background(), fromIdent, toIdent)
if err != nil {
	log.Fatal(err)
}
```
### Working with Table Metadata

Once you have a table, you can access its metadata and properties:

```go
// Access table metadata
metadata := tbl.Metadata()
fmt.Printf("Table UUID: %s\n", metadata.TableUUID())
fmt.Printf("Format version: %d\n", metadata.Version())
fmt.Printf("Last updated: %d\n", metadata.LastUpdatedMillis())

// Access table schema
schema := tbl.Schema()
fmt.Printf("Schema ID: %d\n", schema.ID)
fmt.Printf("Number of fields: %d\n", schema.NumFields())

// Access table properties
props := tbl.Properties()
fmt.Printf("Owner: %s\n", props["owner"])

// Access current snapshot
if snapshot := tbl.CurrentSnapshot(); snapshot != nil {
	fmt.Printf("Current snapshot ID: %d\n", snapshot.SnapshotID)
	fmt.Printf("Snapshot timestamp: %d\n", snapshot.TimestampMs)
}

// List all snapshots
for _, snapshot := range tbl.Snapshots() {
	fmt.Printf("Snapshot %d: %s\n", snapshot.SnapshotID, snapshot.Summary.Operation)
}
```
### Creating Tables with Partitioning

You can create tables with partitioning:

```go
import (
	"context"
	"log"

	"github.com/apache/iceberg-go"
	"github.com/apache/iceberg-go/catalog"
)

// Create schema
schema := iceberg.NewSchemaWithIdentifiers(1, []int{1},
	iceberg.NestedField{ID: 1, Name: "id", Type: iceberg.PrimitiveTypes.Int32, Required: true},
	iceberg.NestedField{ID: 2, Name: "name", Type: iceberg.PrimitiveTypes.String, Required: false},
	iceberg.NestedField{ID: 3, Name: "date", Type: iceberg.PrimitiveTypes.Date, Required: false},
)

// Create partition spec
partitionSpec := iceberg.NewPartitionSpec(
	iceberg.PartitionField{SourceID: 3, FieldID: 1000, Transform: iceberg.IdentityTransform{}, Name: "date"},
)

// Create table with partitioning
tbl, err := cat.CreateTable(
	context.Background(),
	tableIdent,
	schema,
	catalog.WithPartitionSpec(&partitionSpec),
	catalog.WithLocation("s3://my-bucket/tables/partitioned_table"),
)
if err != nil {
	log.Fatal(err)
}
```
## Glossary

This glossary defines important terms used throughout the Iceberg ecosystem, organized in tables for easy reference.

### Core Concepts
Term | Definition |
---|---|
Catalog | A centralized service that manages table metadata and provides a unified interface for accessing Iceberg tables. Catalogs can be implemented as Hive metastore, AWS Glue, REST API, or SQL-based solutions. |
Table | A collection of data files organized by a schema, with metadata tracking changes over time through snapshots. Tables support ACID transactions and schema evolution. |
Schema | The structure definition of a table, specifying field names, types, and whether fields are required or optional. Schemas are versioned and can evolve over time. |
Snapshot | A point-in-time view of a table's data, representing the state after a specific operation (append, overwrite, delete, etc.). Each snapshot contains metadata about the operation and references to data files. |
Manifest | A metadata file that lists data files and their metadata (location, partition information, record counts, etc.). Manifests are organized into manifest lists for efficient access. |
Manifest List | A file that contains references to manifest files for a specific snapshot, enabling efficient discovery of data files without reading all manifests. |
### Data Types

#### Primitive Types
Type | Description |
---|---|
boolean | True/false values |
int | 32-bit signed integer values |
long | 64-bit signed integer values |
float | 32-bit single precision floating point |
double | 64-bit double precision floating point |
date | Date values (days since epoch) |
time | Time values (microseconds since midnight) |
timestamp | Timestamp values (microseconds since epoch) |
timestamptz | Timestamp with timezone |
string | UTF-8 encoded strings |
uuid | UUID values |
binary | Variable length binary data |
fixed[n] | Fixed length binary data of n bytes |
decimal(p,s) | Decimal values with precision p and scale s |
#### Nested Types
Type | Description |
---|---|
struct | Collection of named fields |
list | Ordered collection of elements |
map | Key-value pairs |
### Operations
Operation | Description |
---|---|
Append | An operation that adds new data files to a table without removing existing data. Creates a new snapshot with the additional files. |
Overwrite | An operation that replaces existing data files with new ones, typically based on a partition predicate. Creates a new snapshot with the replacement files. |
Delete | An operation that removes data files from a table, either by marking them as deleted or by removing references to them. |
Replace | An operation that completely replaces all data in a table with new data, typically used for full table refreshes. |
### Partitioning
Term | Definition |
---|---|
Partition | A logical division of table data based on column values, used to improve query performance by allowing selective reading of relevant data files. |
Partition Spec | Defines how table data is partitioned by specifying source columns and transformations (identity, bucket, truncate, year, month, day, hour). |
Partition Field | A field in the partition spec that defines how a source column is transformed for partitioning. |
Partition Path | The file system path structure created by partition values, typically in the format partition_name=value/ . |
#### Partition Transforms
Transform | Description |
---|---|
identity | Use the column value directly |
bucket[n] | Hash the value into n buckets |
truncate[n] | Truncate values to width n (strings to n characters, numbers down to a multiple of n) |
year | Extract year from date/timestamp |
month | Extract month from date/timestamp |
day | Extract day from date/timestamp |
hour | Extract hour from timestamp |
void | Always produces null; used to drop a partition field while keeping older partition specs compatible |
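In `iceberg-go`, these transforms are value types in the `iceberg` package, used the same way as the `IdentityTransform` shown in the partitioning example above. A sketch (the field IDs and names here are illustrative, and the `BucketTransform`/`DayTransform` type names are assumed from the current package):

```go
// Partition by a 16-way hash of column 1 and by the day of column 3.
spec := iceberg.NewPartitionSpec(
	iceberg.PartitionField{SourceID: 1, FieldID: 1000, Transform: iceberg.BucketTransform{NumBuckets: 16}, Name: "id_bucket"},
	iceberg.PartitionField{SourceID: 3, FieldID: 1001, Transform: iceberg.DayTransform{}, Name: "date_day"},
)
```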
### Expressions and Predicates
Term | Definition |
---|---|
Expression | A computation or comparison that can be evaluated against table data, used for filtering and transformations. |
Predicate | A boolean expression used to filter data, such as column comparisons, null checks, or set membership tests. |
Bound Predicate | A predicate that has been resolved against a specific schema, with field references bound to actual columns. |
Unbound Predicate | A predicate that contains unresolved field references, typically in string form before binding to a schema. |
Literal | A constant value used in expressions and predicates, such as numbers, strings, dates, etc. |
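A sketch of building an unbound predicate in `iceberg-go` (this assumes the expression builders `EqualTo`, `GreaterThanEqual`, and `NewAnd` in the `iceberg` package; check the package docs for the exact set of builders in your version):

```go
// name == "alice" AND bar >= 10; the field references stay unbound
// until the expression is resolved against a concrete table schema.
pred := iceberg.NewAnd(
	iceberg.EqualTo(iceberg.Reference("name"), "alice"),
	iceberg.GreaterThanEqual(iceberg.Reference("bar"), int32(10)),
)
```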
### File Formats
Format | Usage | Description |
---|---|---|
Parquet | Data files | The primary data file format used by Iceberg, providing columnar storage with compression and encoding optimizations. |
Avro | Metadata files | Used for manifests and manifest lists due to its schema evolution capabilities and compact binary format. |
ORC | Data files | An alternative columnar format supported by some Iceberg implementations. |
### Metadata
Term | Definition |
---|---|
Metadata File | A JSON file containing table metadata including schema, partition spec, properties, and snapshot information. |
Metadata Location | The URI pointing to the current metadata file for a table, stored in the catalog. |
Properties | Key-value pairs that configure table behavior, such as compression settings, write options, and custom metadata. |
Statistics | Metadata about data files including record counts, file sizes, and value ranges for optimization. |
### Transactions
Term | Definition |
---|---|
Transaction | A sequence of operations that are committed atomically, ensuring data consistency and ACID properties. |
Commit | The process of finalizing a transaction by creating a new snapshot and updating the metadata file. |
Rollback | The process of undoing changes in a transaction, typically by reverting to a previous snapshot. |
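In `iceberg-go`, a transaction is staged against a table and committed as a single snapshot. A sketch (this assumes a `NewTransaction` method on `table.Table` plus `AppendTable`/`Commit` on the transaction; verify the signatures in the `table` package for your version):

```go
// Stage an append on a transaction, then commit it atomically.
txn := tbl.NewTransaction()
if err := txn.AppendTable(ctx, data, data.NumRows(), nil); err != nil {
	log.Fatal(err)
}

// Commit produces the updated table; if the commit fails,
// no partial changes become visible.
updated, err := txn.Commit(ctx)
if err != nil {
	log.Fatal(err)
}
_ = updated
```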
### References
Term | Definition |
---|---|
Branch | A named reference to a specific snapshot, allowing multiple concurrent views of table data. |
Tag | An immutable reference to a specific snapshot, typically used for versioning and releases. |
### Storage
Term | Definition |
---|---|
Warehouse | The root directory or bucket where table data and metadata are stored. |
Location Provider | A component that generates file paths for table data and metadata based on table location and naming conventions. |
FileIO | An abstraction layer for reading and writing files across different storage systems (local filesystem, S3, GCS, Azure Blob, etc.). |
### Query Optimization
Technique | Description |
---|---|
Column Pruning | A technique that reads only the columns needed for a query, reducing I/O and improving performance. |
Partition Pruning | A technique that skips reading data files from irrelevant partitions based on query predicates. |
Predicate Pushdown | A technique that applies filtering predicates at the storage layer, reducing data transfer and processing. |
Statistics-based Optimization | Using table and file statistics to optimize query execution plans and file selection. |
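In `iceberg-go`, column pruning and predicate pushdown surface as scan options. A sketch (assuming the `WithSelectedFields` and `WithRowFilter` options in the `table` package):

```go
// Read only the "name" column, and let the scan planner use the
// predicate to skip data files whose statistics rule out a match.
scan := tbl.Scan(
	table.WithSelectedFields("name"),
	table.WithRowFilter(iceberg.EqualTo(iceberg.Reference("name"), "alice")),
)

results, err := scan.ToArrowTable(ctx)
if err != nil {
	log.Fatal(err)
}
defer results.Release()
```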
### Schema Evolution
Term | Definition |
---|---|
Schema Evolution | The process of modifying a table's schema over time while maintaining backward compatibility. |
Column Addition | Adding new columns to a table schema, which are typically optional to maintain compatibility. |
Column Deletion | Removing columns from a table schema, which may be logical (marking as deleted) or physical. |
Column Renaming | Changing column names while preserving data and type information. |
Type Evolution | Changing column types in ways that maintain data compatibility (e.g., int32 to int64). |
### Time Travel
Term | Definition |
---|---|
Time Travel | The ability to query a table as it existed at a specific point in time using snapshot timestamps. |
Snapshot Isolation | A property that ensures queries see a consistent view of data as it existed at a specific snapshot. |
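A sketch of time travel in `iceberg-go` (assuming a `WithSnapshotID` scan option; snapshot IDs can be discovered via `tbl.Snapshots()` as shown in the metadata section above):

```go
// Query the table as it existed at a specific snapshot.
past, err := tbl.Scan(table.WithSnapshotID(snapshotID)).ToArrowTable(ctx)
if err != nil {
	log.Fatal(err)
}
defer past.Release()
```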
### ACID Properties
Property | Description |
---|---|
Atomicity | Ensures that all operations in a transaction either succeed completely or fail completely. |
Consistency | Ensures that the table remains in a valid state after each transaction. |
Isolation | Ensures that concurrent transactions do not interfere with each other. |
Durability | Ensures that committed changes are permanently stored and survive system failures. |