Iceberg Go
Apache Iceberg Go is a Go-native implementation of Apache Iceberg, the open table format for analytic datasets. Read and write Iceberg tables from Go services and tooling without a JVM.
Where to start
- Install - get the library and CLI on your machine.
- CLI - inspect tables, run maintenance, manage refs.
- API - construct catalogs, scan tables, write Arrow data, evolve schemas.
- Configuration - YAML config file, catalog options, FileIO and credentials, table properties.
What works today
A capability matrix - filesystem, metadata operations, catalog support, and write operations - lives on the dedicated Feature Status page.
Beyond Go
Apache Iceberg is multi-language. PyIceberg, iceberg-rust, iceberg-cpp, and the Java reference implementation all target the same spec - see Other Iceberg implementations for cross-links.
Help and contribution
- Questions and discussion: Community
- Contributing code or docs: Contributing
- Cutting or verifying a release: Releases
The canonical Iceberg specification, terminology, and multi-engine policy live with the main project at iceberg.apache.org. This site covers what is specific to the Go implementation.
Install
Requirements
Installation
To install the iceberg-go package, first install Go and set up your Go workspace.
If you don't have a go.mod file, create one with go mod init <module-path>.
- Download and install it:
go get -u github.com/apache/iceberg-go
- Import it in your code:
import "github.com/apache/iceberg-go"
Getting Started
This walkthrough takes you from go get to a fully written-and-read Iceberg table in a few minutes, using a local SQLite-backed catalog so you do not need any external services. By the end you will have:
- An Iceberg catalog stored in a SQLite file
- A table with an Arrow schema
- Data written from Apache Arrow Go and read back
- A schema evolution committed
- A branch created on top of the table
If you would rather see all the operations as a reference, jump straight to the API.
1. Install
go mod init iceberg-go-tutorial
go get github.com/apache/iceberg-go@latest
go get github.com/apache/arrow-go/v18@latest
go get github.com/uptrace/bun/driver/sqliteshim@latest
iceberg-go itself only registers the local file system. For S3, GCS, or Azure Blob you would also blank-import github.com/apache/iceberg-go/io/gocloud. We are staying on local disk for this tutorial.
2. Open a local catalog
The SQL catalog stores its metadata in a database, and the actual data files live under a "warehouse" directory. We will use SQLite for the catalog database and a local folder for the warehouse.
package main
import (
"context"
"database/sql"
"log"
"github.com/apache/iceberg-go"
sqlcat "github.com/apache/iceberg-go/catalog/sql"
"github.com/uptrace/bun/driver/sqliteshim"
)
func openCatalog(ctx context.Context) (*sqlcat.Catalog, error) {
db, err := sql.Open(sqliteshim.ShimName, "file:./tutorial-catalog.db?cache=shared")
if err != nil {
return nil, err
}
return sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
"warehouse": "file:///tmp/iceberg-tutorial",
"init_catalog_tables": "true",
})
}
init_catalog_tables: "true" lets the catalog create its bookkeeping tables (iceberg_tables, iceberg_namespace_properties) on first use. Make sure /tmp/iceberg-tutorial is writable.
3. Create a namespace and a table
Iceberg organizes tables under namespaces (the analog of databases). We will create one and put a table inside it.
import (
"github.com/apache/iceberg-go/catalog"
"github.com/apache/iceberg-go/table"
)
func createTrips(ctx context.Context, cat *sqlcat.Catalog) (*table.Table, error) {
ns := table.Identifier{"taxi"}
if err := cat.CreateNamespace(ctx, ns, nil); err != nil {
return nil, err
}
schema := iceberg.NewSchema(1,
iceberg.NestedField{ID: 1, Name: "trip_id", Type: iceberg.PrimitiveTypes.Int64, Required: true},
iceberg.NestedField{ID: 2, Name: "fare", Type: iceberg.PrimitiveTypes.Float64, Required: false},
iceberg.NestedField{ID: 3, Name: "borough", Type: iceberg.PrimitiveTypes.String, Required: false},
)
ident := catalog.ToIdentifier("taxi", "trips")
return cat.CreateTable(ctx, ident, schema)
}
If the namespace already exists, CreateNamespace returns an error, which the example simply propagates to the caller. Production code should check for iceberg.ErrAlreadyExists (for example with errors.Is) and continue.
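The check-and-recover pattern described above can be sketched with the standard library alone. Here errAlreadyExists, fakeCatalog, and createNamespace are hypothetical stand-ins for the library's sentinel error and for cat.CreateNamespace; only the errors.Is check is the point.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the catalog's
// "already exists" sentinel error.
var errAlreadyExists = errors.New("namespace already exists")

// fakeCatalog is a toy in-memory catalog used only for this illustration.
type fakeCatalog struct{ namespaces map[string]bool }

// createNamespace mimics a create call that fails when the name is taken.
func (c *fakeCatalog) createNamespace(name string) error {
	if c.namespaces[name] {
		return fmt.Errorf("create %q: %w", name, errAlreadyExists)
	}
	c.namespaces[name] = true
	return nil
}

// ensureNamespace treats "already exists" as success and propagates
// every other error unchanged.
func ensureNamespace(c *fakeCatalog, name string) error {
	err := c.createNamespace(name)
	if err != nil && !errors.Is(err, errAlreadyExists) {
		return err
	}
	return nil
}

func main() {
	c := &fakeCatalog{namespaces: map[string]bool{}}
	fmt.Println(ensureNamespace(c, "taxi")) // first call creates the namespace
	fmt.Println(ensureNamespace(c, "taxi")) // second call is a no-op
}
```

Because the error is wrapped with %w, errors.Is still matches it, which keeps the helper safe to call on every startup.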
4. Write some Arrow data
The write path takes either a streaming array.RecordReader or a fully materialized arrow.Table. We will build a small Arrow table inline and append it.
import (
"github.com/apache/arrow-go/v18/arrow"
"github.com/apache/arrow-go/v18/arrow/array"
"github.com/apache/arrow-go/v18/arrow/memory"
)
func writeSomeTrips(ctx context.Context, tbl *table.Table) (*table.Table, error) {
mem := memory.NewGoAllocator()
arrowSchema := arrow.NewSchema([]arrow.Field{
{Name: "trip_id", Type: arrow.PrimitiveTypes.Int64, Nullable: false},
{Name: "fare", Type: arrow.PrimitiveTypes.Float64, Nullable: true},
{Name: "borough", Type: arrow.BinaryTypes.String, Nullable: true},
}, nil)
b := array.NewRecordBuilder(mem, arrowSchema)
defer b.Release()
b.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
b.Field(1).(*array.Float64Builder).AppendValues([]float64{12.50, 8.75, 22.10}, nil)
b.Field(2).(*array.StringBuilder).AppendValues([]string{"Manhattan", "Brooklyn", "Queens"}, nil)
rec := b.NewRecord()
defer rec.Release()
arrowTbl := array.NewTableFromRecords(arrowSchema, []arrow.Record{rec})
defer arrowTbl.Release()
return tbl.AppendTable(ctx, arrowTbl, 1024 /* batchSize */, nil)
}
AppendTable returns a refreshed *table.Table reflecting the new snapshot.
5. Read the data back
func readAll(ctx context.Context, tbl *table.Table) error {
arrowTbl, err := tbl.Scan().ToArrowTable(ctx)
if err != nil {
return err
}
defer arrowTbl.Release()
log.Printf("read %d rows in %d columns", arrowTbl.NumRows(), arrowTbl.NumCols())
return nil
}
For larger tables, use streaming so only one batch is in memory at a time:
arrowSchema, batches, err := tbl.Scan().ToArrowRecords(ctx)
if err != nil {
return err
}
log.Printf("scan schema: %s", arrowSchema)
for batch, err := range batches {
if err != nil {
return err
}
log.Printf("batch with %d rows", batch.NumRows())
batch.Release()
}
6. Filter and project
Use the predicate DSL to push filters down into the scan, and WithSelectedFields to project columns:
filter := iceberg.GreaterThan(iceberg.Reference("fare"), float64(10.0))
arrowTbl, err := tbl.Scan(
table.WithSelectedFields("trip_id", "fare"),
table.WithRowFilter(filter),
).ToArrowTable(ctx)
7. Evolve the schema
Most schema changes go through a transaction. Add a tip column:
import "github.com/apache/iceberg-go/table"
func addTipColumn(ctx context.Context, tbl *table.Table) (*table.Table, error) {
txn := tbl.NewTransaction()
err := table.NewUpdateSchema(txn, true /* caseSensitive */, false /* allowIncompatible */).
AddColumn([]string{"tip"}, iceberg.PrimitiveTypes.Float64, "Tip in dollars", false, nil).
Commit()
if err != nil {
return nil, err
}
return txn.Commit(ctx)
}
The returned table has the new schema and a fresh metadata file.
8. Branch the table
A branch is a named ref that points at a snapshot. Creating one requires a metadata commit that adds the ref - NewTransactionOnBranch only writes to a branch that already exists. Two steps:
import "github.com/apache/iceberg-go/table"
// Step 1: create the branch via Catalog.CommitTable.
func createExperimentBranch(ctx context.Context, cat *sqlcat.Catalog, tbl *table.Table) (*table.Table, error) {
snap := tbl.CurrentSnapshot()
update := table.NewSetSnapshotRefUpdate(
"experiment", snap.SnapshotID, table.BranchRef,
0 /* maxRefAgeMs */, 0 /* maxSnapshotAgeMs */, 0 /* minSnapshotsToKeep */,
)
reqs := []table.Requirement{
table.AssertTableUUID(tbl.Metadata().TableUUID()),
table.AssertRefSnapshotID("experiment", nil), // branch must not yet exist
}
if _, _, err := cat.CommitTable(ctx, tbl.Identifier(), reqs, []table.Update{update}); err != nil {
return nil, err
}
// Reload so subsequent transactions see the new ref.
return cat.LoadTable(ctx, tbl.Identifier())
}
// Step 2: open a transaction on the branch and write.
func writeOnBranch(ctx context.Context, tbl *table.Table, arrowTbl arrow.Table) (*table.Table, error) {
txn := tbl.NewTransactionOnBranch("experiment")
if err := txn.AppendTable(ctx, arrowTbl, 1024, nil); err != nil {
return nil, err
}
return txn.Commit(ctx)
}
Reads can target the branch with tbl.Scan().UseRef("experiment").
Where to go next
- API - the full Go surface for catalogs, tables, scans, writes, transactions, schema and partition evolution, snapshot management, maintenance, and views.
- Configuration - cloud credentials, table properties, concurrency, custom catalog and IO registration.
- CLI - inspect, expire, compact, branch, and tag from the command line.
- Row Filter Syntax and Expression DSL - the predicate DSL in detail.
The full Iceberg specification, terminology, and engine integration policy live with the main project at iceberg.apache.org.
Configuration
This page documents how to configure Apache Iceberg Go: the CLI's YAML config file, per-catalog Option surfaces, file-system credentials, table write properties, concurrency, and how to plug in custom catalogs and IO backends.
Only properties and options that the iceberg-go code actually reads are listed. Properties defined in the Apache Iceberg spec but not yet wired into iceberg-go are intentionally omitted - check pkg.go.dev/github.com/apache/iceberg-go for the latest read sites.
CLI configuration file
The iceberg CLI loads catalog defaults from ~/.iceberg-go.yaml (override the directory with GOICEBERG_HOME). The schema, defined in config/config.go, is:
default-catalog: default
max-workers: 5
catalog:
  default:
    type: rest
    uri: https://example.com/iceberg
    warehouse: s3://my-bucket/warehouse
    credential: <client-id>:<client-secret>
    output: text
    rest:
      sigv4-enabled: false
      signing-name: ""
      signing-region: ""
| Key | Purpose |
|---|---|
| default-catalog | Name used when --catalog-name is not passed on the CLI. |
| max-workers | Worker pool size for concurrent operations. Default 5. |
| catalog.<name>.type | One of rest, hive, glue, sql, hadoop. |
| catalog.<name>.uri | Catalog endpoint or DSN. |
| catalog.<name>.warehouse | Warehouse identifier (REST/Glue) or location (Hive/SQL). |
| catalog.<name>.credential | Credential string passed through to the catalog's auth handler. |
| catalog.<name>.output | CLI output format (e.g. text, json). |
| catalog.<name>.rest.sigv4-enabled | Enable AWS SigV4 signing for REST. |
| catalog.<name>.rest.signing-name | SigV4 service name. |
| catalog.<name>.rest.signing-region | SigV4 region. |
Catalog options
Each catalog package exposes its own functional Option set. The lists below reflect the public option surface; pkg.go.dev is authoritative for the current set.
REST (catalog/rest)
The most option-rich surface. Source: catalog/rest/options.go.
| Group | Options |
|---|---|
| Authentication | WithCredential, WithOAuthToken, WithAuthManager, WithAuthURI, WithScope, WithAudience, WithResource |
| AWS SigV4 | WithSigV4, WithSigV4RegionSvc, WithAwsConfig |
| HTTP | WithHeaders, WithTLSConfig, WithOAuthTLSConfig, WithCustomTransport |
| Catalog routing | WithPrefix, WithWarehouseLocation, WithMetadataLocation |
| Pass-through | WithAdditionalProps |
Hive (catalog/hive)
Source: catalog/hive/options.go.
- WithURI(uri string) - Thrift URI for the Hive Metastore (e.g. thrift://127.0.0.1:9083).
- WithWarehouse(warehouse string)
- WithProperties(props iceberg.Properties)
Glue (catalog/glue)
Source: catalog/glue/options.go.
- WithAwsConfig(cfg aws.Config) - AWS SDK v2 config; respects the AWS default credential chain.
- WithAwsProperties(props AwsProperties) - explicit overrides for region/endpoint/access keys.
SQL (catalog/sql)
The SQL catalog has no functional-option surface. Construct it with NewCatalog:
db, _ := sql.Open(sqliteshim.ShimName, "file:catalog.db")
cat, err := sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
"warehouse": "file:///tmp/warehouse",
})
Supported dialects: sqlcat.Postgres, sqlcat.MySQL, sqlcat.SQLite, sqlcat.MSSQL, sqlcat.Oracle (catalog/sql/sql.go:50).
Shared options on the base catalog package
Operations that create or update tables/views accept these (catalog/catalog.go):
- WithLocation, WithPartitionSpec, WithSortOrder, WithProperties, WithStagedUpdates
- View-specific: WithViewLocation, WithViewProperties
File-system credentials
iceberg-go registers the local file system (file://) automatically. Cloud schemes are not registered until you add a blank import:
import _ "github.com/apache/iceberg-go/io/gocloud"
The init() function in io/gocloud/register.go registers s3, s3a, s3n, oss, gs, abfs, abfss, wasb, and wasbs. Without the blank import, these schemes return ErrIOSchemeNotFound with a hint to add the import.
All credential and tuning property keys are constants in io/config.go. They can be supplied through table properties, catalog properties, or per-call iceberg.Properties arguments depending on context.
S3
Authentication is resolved in this order (io/gocloud/s3.go):
- Static credentials in properties: s3.access-key-id + s3.secret-access-key (+ optional s3.session-token).
- The standard AWS SDK v2 default credential chain - environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), ~/.aws/credentials, container/IAM role.
Tuning properties:
| Key (constant) | Purpose |
|---|---|
| s3.region (io.S3Region) / client.region (io.S3ClientRegion) | AWS region. |
| s3.endpoint (io.S3EndpointURL) | Override S3 endpoint URL (custom or compatible storage). Falls back to AWS_S3_ENDPOINT env. |
| s3.access-key-id (io.S3AccessKeyID) | Static access key. |
| s3.secret-access-key (io.S3SecretAccessKey) | Static secret key. |
| s3.session-token (io.S3SessionToken) | Static session token. |
| s3.proxy-uri (io.S3ProxyURI) | HTTP proxy URL. |
| s3.connect-timeout (io.S3ConnectTimeout) | Either a number of seconds ("60", "60.0") or a Go duration ("5s"). |
| s3.force-virtual-addressing (io.S3ForceVirtualAddressing) | Force virtual-host-style addressing. |
| s3.signer.uri (io.S3SignerURI) | Reserved for remote-signing endpoint (not yet implemented). |
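The two accepted forms of s3.connect-timeout can be handled by trying a plain number of seconds first and falling back to a Go duration. The sketch below shows that idea only; parseTimeout is a hypothetical helper, not the parser iceberg-go uses.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"time"
)

// parseTimeout accepts either a number of seconds ("60", "60.0") or a Go
// duration string ("5s"). Hypothetical helper for illustration.
func parseTimeout(v string) (time.Duration, error) {
	if secs, err := strconv.ParseFloat(v, 64); err == nil {
		// Reject values that parse as floats but make no sense as timeouts.
		if secs < 0 || math.IsNaN(secs) || math.IsInf(secs, 0) {
			return 0, fmt.Errorf("invalid timeout %q", v)
		}
		return time.Duration(secs * float64(time.Second)), nil
	}
	// Not a bare number: fall back to Go duration syntax.
	return time.ParseDuration(v)
}

func main() {
	for _, v := range []string{"60", "60.0", "5s", "soon"} {
		d, err := parseTimeout(v)
		fmt.Println(v, d, err)
	}
}
```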
Google Cloud Storage
Authentication resolution (io/gocloud/gcs.go):
- Explicit JSON key bytes via gcs.jsonkey or path via gcs.keypath.
- Optional gcs.credtype selecting one of service_account, authorized_user, impersonated_service_account, external_account.
- The GCP default credentials chain (gcp.DefaultCredentials) - falls back to anonymous if no creds are found.
Tuning properties:
| Key (constant) | Purpose |
|---|---|
| gcs.endpoint (io.GCSEndpoint) | Custom GCS endpoint URL. |
| gcs.keypath (io.GCSKeyPath) | Path to a JSON service-account key file. |
| gcs.jsonkey (io.GCSJSONKey) | JSON key as a string. |
| gcs.credtype (io.GCSCredType) | Credential type override. |
| gcs.usejsonapi (io.GCSUseJSONAPI) | Set to any value to enable the GCS JSON API for reads. |
Azure Data Lake Storage / Blob
Authentication is selected based on the property keys present (io/gocloud/azure.go):
- Shared key: both adls.auth.shared-key.account.name and adls.auth.shared-key.account.key set.
- Per-host SAS token: adls.sas-token.<hostname> (prefix-matched against the storage account host).
- Per-host connection string: adls.connection-string.<hostname>.
- Managed identity: adls.auth.managed-identity.enabled set to a truthy value.
Tuning properties:
| Key (constant) | Purpose |
|---|---|
| adls.auth.shared-key.account.name (io.ADLSSharedKeyAccountName) | Account name. |
| adls.auth.shared-key.account.key (io.ADLSSharedKeyAccountKey) | Account key. |
| adls.sas-token.<host> (prefix io.ADLSSasTokenPrefix) | Per-host SAS token. |
| adls.connection-string.<host> (prefix io.ADLSConnectionStringPrefix) | Per-host connection string. |
| adls.client-id (io.ADLSClientID) | Client/application ID for AAD auth. |
| adls.endpoint (io.ADLSEndpoint) | Storage domain (e.g. blob.core.windows.net). |
| adls.protocol (io.ADLSProtocol) | http or https. |
| adls.auth.managed-identity.enabled (io.ADLSManagedIdentityEnabled) | Enable Azure Managed Identity auth. |
Environment variables
iceberg-go reads only a small set of environment variables directly. AWS / GCP / Azure credentials flow through the respective SDKs, not through iceberg-go-defined env vars.
| Variable | Purpose | Read at |
|---|---|---|
| GOICEBERG_HOME | Directory containing .iceberg-go.yaml. Defaults to the user's home directory. | config/config.go:87 |
| ICEBERG_SQL_DEBUG | SQL catalog query logging - 1 (failed queries), 2 (all queries). | catalog/sql/sql.go:206 |
| AWS_S3_ENDPOINT | Fallback S3 endpoint when s3.endpoint is unset. | io/gocloud/s3.go:193 |
There is no PYICEBERG_*-style env var convention. Use the YAML config file or pass iceberg.Properties to override values programmatically.
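The GOICEBERG_HOME lookup described in the table can be sketched as follows. configPath is a hypothetical helper written for this page and assumes only what the table states: the variable names a directory, and the user's home directory is the fallback.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// configPath returns where .iceberg-go.yaml is expected: inside iceHome
// (the value of GOICEBERG_HOME) when it is set, otherwise inside userHome.
// Hypothetical helper; the real lookup lives in config/config.go.
func configPath(iceHome, userHome string) string {
	dir := iceHome
	if dir == "" {
		dir = userHome
	}
	return filepath.Join(dir, ".iceberg-go.yaml")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	fmt.Println(configPath(os.Getenv("GOICEBERG_HOME"), home))
}
```

Passing both directories as arguments keeps the helper free of global state, which makes the precedence easy to check in a unit test.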
Concurrency
| Setting | Source | Effect |
|---|---|---|
| max-workers in ~/.iceberg-go.yaml (config.EnvConfig.MaxWorkers) | YAML config | Worker pool size used by parallel column writes, snapshot producers, scan plan, equality-delete writers. Default 5. |
| WitMaxConcurrency(n int) ScanOption | Code (table.WitMaxConcurrency) | Per-scan override. Note: function name is Wit... (not With...) - this is a pre-existing typo in the public API. |
| WithMaxWriteWorkers(n int) | Code (per-write API on WriteRecords) | Per-write override of the worker count. |
| WithClusteredWrite() | Code (per-write API on WriteRecords) | Forces single-threaded writes. Mutually exclusive with WithMaxWriteWorkers. |
Pluggability
Two registries are user-extensible: IO schemes and catalog types. A third extension point, LocationProvider, is described here for reference only and is not user-pluggable today.
IO scheme registry
Register a custom URL scheme with io.Register:
import (
"context"
"net/url"
"github.com/apache/iceberg-go/io"
)
func init() {
io.Register("myfs", func(ctx context.Context, parsed *url.URL, props map[string]string) (io.IO, error) {
return newMyFS(parsed, props)
})
}
io.Register panics on nil factory or duplicate scheme. Built-in schemes: file, "" (the empty scheme). Cloud schemes (s3, gs, abfs, etc.) are registered by io/gocloud only when its package is blank-imported.
io.GetRegisteredSchemes() returns the current scheme list; io.Unregister(scheme) removes one.
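The registration rules described above (panic on a nil factory or a duplicate scheme, plus list and remove operations) can be modelled with a small toy registry. This is an illustrative sketch only: registry, factory, and the method names are hypothetical and do not mirror the library's internals.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// factory is a simplified stand-in for a scheme factory.
type factory func(location string) (string, error)

// registry is a toy model of a scheme registry guarded by a mutex.
type registry struct {
	mu      sync.Mutex
	schemes map[string]factory
}

func newRegistry() *registry { return &registry{schemes: map[string]factory{}} }

// register panics on a nil factory or a duplicate scheme.
func (r *registry) register(scheme string, f factory) {
	if f == nil {
		panic("nil factory for scheme " + scheme)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.schemes[scheme]; dup {
		panic("scheme already registered: " + scheme)
	}
	r.schemes[scheme] = f
}

// unregister removes a scheme; removing an unknown scheme is a no-op.
func (r *registry) unregister(scheme string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.schemes, scheme)
}

// registered returns the scheme names in sorted order.
func (r *registry) registered() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.schemes))
	for s := range r.schemes {
		names = append(names, s)
	}
	sort.Strings(names)
	return names
}

// panics reports whether fn panicked.
func panics(fn func()) (p bool) {
	defer func() { p = recover() != nil }()
	fn()
	return false
}

func main() {
	r := newRegistry()
	open := func(loc string) (string, error) { return "opened " + loc, nil }
	r.register("myfs", open)
	fmt.Println(r.registered())
	fmt.Println("duplicate panics:", panics(func() { r.register("myfs", open) }))
	r.unregister("myfs")
	fmt.Println(r.registered())
}
```

Panicking at registration time surfaces wiring mistakes during init() rather than on the first table read.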
Catalog type registry
Register a custom catalog type with catalog.Register:
import (
"context"
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/catalog"
)
func init() {
catalog.Register("mycatalog", catalog.RegistrarFunc(
func(ctx context.Context, name string, props iceberg.Properties) (catalog.Catalog, error) {
return newMyCatalog(name, props)
},
))
}
After registration, catalog.Load(ctx, "default", iceberg.Properties{"type": "mycatalog", ...}) will route to the factory. Built-in types: rest, hive, glue, sql, hadoop.
catalog.GetRegisteredCatalogs() returns the current list; catalog.Unregister(catalogType) removes one.
LocationProvider
table/locations.go defines the LocationProvider interface and ships two implementations: simpleLocationProvider (default) and objectStoreLocationProvider (selected by write.object-storage.enabled = true). The provider is chosen by table properties and is not user-pluggable today.
Table write properties
Property key constants are in table/properties.go and Parquet keys in table/internal/parquet_files.go. The keys below have verified read sites in non-test code.
Format and file sizing
| Key | Default | Description |
|---|---|---|
| write.format.default | parquet | File format used when writing data files. Read in table/writer.go and table/rolling_data_writer.go. |
| write.target-file-size-bytes | (set by writer) | Target size for newly written data files. Read in table/arrow_utils.go and table/equality_delete_writer.go. |
Metrics and metadata lifecycle
| Key | Description |
|---|---|
| write.metadata.metrics.default | Default per-column metrics mode. |
| write.metadata.metrics.column.<name> | Per-column override prefix. |
| write.metadata.delete-after-commit.enabled | When true, expire old metadata files after a successful commit. |
| write.metadata.previous-versions-max | Cap on retained metadata files. Default 100. |
| write.metadata.compression-codec | Compression for metadata JSON. |
Manifest and commit
| Key | Description |
|---|---|
| commit.manifest-merge.enabled | Merge small manifests during commit. |
| commit.manifest.target-size-bytes | Target size for merged manifests. |
| commit.manifest.min-count-to-merge | Minimum manifest count that triggers a merge. |
| commit.retry.num-retries | Retries for ErrCommitFailed. |
| commit.retry.min-wait-ms / commit.retry.max-wait-ms / commit.retry.total-timeout-ms | Backoff bounds. |
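To make the interaction of the four commit.retry.* values concrete, the sketch below computes a wait schedule under one plausible policy: waits start at the minimum, double after each attempt, are capped at the maximum, and stop once the total timeout would be exceeded. The doubling policy and the scheduleMs helper are assumptions for illustration; the schedule iceberg-go actually uses is not specified on this page.

```go
package main

import (
	"fmt"
	"time"
)

// scheduleMs returns the wait before each retry, given the retry count and
// the min wait, max wait, and total timeout in milliseconds.
// Assumed policy: double each time, cap at maxMs, stop at totalMs.
func scheduleMs(retries, minMs, maxMs, totalMs int) []time.Duration {
	var waits []time.Duration
	elapsed := time.Duration(0)
	wait := time.Duration(minMs) * time.Millisecond
	maxWait := time.Duration(maxMs) * time.Millisecond
	total := time.Duration(totalMs) * time.Millisecond

	for i := 0; i < retries; i++ {
		if wait > maxWait {
			wait = maxWait
		}
		if elapsed+wait > total {
			break // the next wait would exceed the total timeout
		}
		waits = append(waits, wait)
		elapsed += wait
		wait *= 2
	}
	return waits
}

func main() {
	// Hypothetical values: 4 retries, 100 ms min, 500 ms max, 1 s total.
	fmt.Println(scheduleMs(4, 100, 500, 1000))
}
```

With these example values the fourth retry is dropped, because a capped 500 ms wait on top of the 700 ms already spent would pass the 1 s budget.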
Snapshot retention
| Key | Description |
|---|---|
| min-snapshots-to-keep | Minimum snapshots to retain when expiring. |
| max-snapshot-age-ms | Maximum age of retained snapshots. |
| max-ref-age-ms | Maximum age of branch/tag refs that are not the main branch. |
| gc.enabled | Gate for orphan-file cleanup. |
Delete mode
| Key | Description |
|---|---|
| write.delete.mode | Delete strategy used by row-level delete writers. |
Object-store data layout
| Key | Description |
|---|---|
| write.data.path | Override data file directory. |
| write.metadata.path | Override metadata file directory. |
| write.object-storage.enabled | Switch the location provider to the hashed object-storage layout. |
| write.object-storage.partitioned-paths | Whether partition values are included in object-storage paths. |
Parquet writer
All defined in table/internal/parquet_files.go and read by the Parquet writer:
| Key | Default |
|---|---|
| write.parquet.row-group-size-bytes | 128 MB |
| write.parquet.row-group-limit | 1,048,576 rows |
| write.parquet.page-size-bytes | 1 MB |
| write.parquet.page-row-limit | 20,000 rows |
| write.parquet.dict-size-bytes | 2 MB |
| write.parquet.page-version | 2 |
| write.parquet.compression-codec | zstd |
| write.parquet.compression-level | -1 (codec default) |
| write.parquet.bloom-filter-max-bytes | 1 MB |
| write.parquet.bloom-filter-enabled.column.<name> | (per-column toggle, prefix-matched) |
Parquet reader
| Key | Description |
|---|---|
| read.parquet.batch-size | Arrow record-batch size used by the Parquet reader. |
CLI
Run go build ./cmd/iceberg from the root of this repository to build the CLI executable. Alternatively, run go install github.com/apache/iceberg-go/cmd/iceberg@latest to install it to the bin directory of your GOPATH.
Usage of the iceberg CLI is very similar to that of the pyiceberg CLI.
You can pass the catalog URI with the --uri argument.
Connecting to a catalog
Start a local REST catalog (the default --catalog type):
docker pull apache/iceberg-rest-fixture:latest
docker run -p 8181:8181 apache/iceberg-rest-fixture:latest
and run the iceberg CLI pointing to the REST API server:
./iceberg --uri http://0.0.0.0:8181 list
┌─────┐
| IDs |
| --- |
└─────┘
Catalog connection flags are global and apply to every subcommand:
| Flag | Description |
|---|---|
| --catalog | Catalog type: rest (default), glue, hive, hadoop |
| --uri | Catalog URI (REST/Hive) |
| --warehouse | Warehouse location |
| --credential | Credentials for the catalog |
| --token | OAuth token (skips OAuth flow) |
| --scope | OAuth scope (default catalog) |
| --catalog-name | Catalog name to load from config file (default default) |
| --config | Path to a config file |
To avoid passing flags every time, define a config file at ~/.iceberg-go.yaml:
default-catalog: default
catalog:
  default:
    type: rest
    uri: http://localhost:8181
    warehouse: s3://my-warehouse
Flags on the command line override values from the config file.
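The precedence rule can be pictured as a two-layer merge in which any flag that was explicitly set replaces the config-file value. The sketch below is a simplified model: mergeSettings and the plain string maps are hypothetical and are not how the CLI represents its options.

```go
package main

import "fmt"

// mergeSettings overlays non-empty flag values on top of config-file values.
// Hypothetical helper modelling "flags override config".
func mergeSettings(config, flags map[string]string) map[string]string {
	out := make(map[string]string, len(config)+len(flags))
	for k, v := range config {
		out[k] = v
	}
	for k, v := range flags {
		if v != "" { // an unset flag leaves the config value in place
			out[k] = v
		}
	}
	return out
}

func main() {
	config := map[string]string{
		"uri":       "http://localhost:8181",
		"warehouse": "s3://my-warehouse",
	}
	flags := map[string]string{"uri": "http://0.0.0.0:8181", "warehouse": ""}

	merged := mergeSettings(config, flags)
	fmt.Println(merged["uri"], merged["warehouse"])
}
```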
Output format
All commands accept --output text (default, human-readable) or --output json (machine-readable, suitable for piping into jq or scripts).
Catalog and namespace commands
Create namespace
./iceberg --uri http://0.0.0.0:8181 create namespace taxitrips
List namespaces
./iceberg --uri http://0.0.0.0:8181 list
┌───────────┐
| IDs |
| --------- |
| taxitrips |
└───────────┘
Create table
Note: only the identity transform is supported for --partition-spec at this moment.
# Create a simple table with REST catalog and Minio
./iceberg create table default.table-1 \
--properties write.format.default=parquet \
--partition-spec foo \
--sort-order foo:desc:nulls-last \
--schema '[{"id":1,"name":"foo","type":"string","required":false},{"id":2,"name":"bar","type":"int","required":true}]' \
--catalog rest \
--uri http://localhost:8181
Table default.table-1 created successfully
# Describe the newly created table
./iceberg describe --catalog rest --uri http://localhost:8181 default.table-1
Table format version | 2
Metadata location | s3://warehouse/default/table-1/metadata/00000-f0ccaadd-d988-482e-99da-3a37870288fe.metadata.json
Table UUID | 33fa3fac-e638-4335-a085-343c6d9e7de5
Last updated | 1753133512562
Sort Order | 1: [
| 1 desc nulls-last
| ]
Partition Spec | [
| 1000: foo: identity(1)
| ]
Current Schema, id=0
├──1: foo: optional string
└──2: bar: required int
Current Snapshot |
Snapshots
Properties
key | value
-----------------------------------------
write.format.default | parquet
write.parquet.compression-codec | zstd
Inspecting tables
info — single-screen summary
iceberg info my_db.events
Reports format version, location, current snapshot, schema, partition spec, sort order, snapshot count, ref count, and property count for a table.
snapshots — snapshot history
iceberg snapshots my_db.events
Lists all snapshots with timestamp, parent snapshot, operation (append, overwrite, delete, replace), and added/deleted data file counts.
refs — branches and tags
iceberg refs my_db.events
iceberg refs --type branch my_db.events
iceberg refs --type tag my_db.events
Lists snapshot refs along with their retention settings (max-ref-age, max-snapshot-age, min-snapshots-to-keep).
partition-stats — partition statistics files
iceberg partition-stats my_db.events # current snapshot
iceberg partition-stats --snapshot-id 7234981023498 my_db.events
iceberg partition-stats --all my_db.events # all snapshots
schema --show-defaults
iceberg schema --show-defaults my_db.events
Prints the schema and surfaces each field's initial-default and write-default — useful when debugging schema-evolution behavior.
Snapshot maintenance
expire-snapshots
Drop old snapshots so their unreferenced data files become eligible for cleanup.
| Flag | Description |
|---|---|
| --older-than DURATION | Expire snapshots older than the given duration (7d, 168h) |
| --retain-last N | Always keep at least N snapshots, regardless of age |
| --dry-run | List what would be expired without committing |
| --yes | Skip the confirmation prompt |
# Preview
iceberg expire-snapshots --older-than 7d --dry-run my_db.events
# Commit, retaining at least 5 snapshots
iceberg expire-snapshots --older-than 7d --retain-last 5 my_db.events
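Go's time.ParseDuration has no day unit, so a value such as 7d needs its own handling if you compute the same cutoffs in Go code. The sketch below shows one way to accept both 7d and 168h; parseAge is a hypothetical helper and is not the CLI's own parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseAge accepts a whole number of days ("7d") or any string understood
// by time.ParseDuration ("168h"). Hypothetical helper for illustration.
func parseAge(s string) (time.Duration, error) {
	if days, ok := strings.CutSuffix(s, "d"); ok {
		n, err := strconv.Atoi(days)
		if err != nil || n < 0 {
			return 0, fmt.Errorf("invalid day count %q", s)
		}
		return time.Duration(n) * 24 * time.Hour, nil
	}
	return time.ParseDuration(s)
}

func main() {
	for _, s := range []string{"7d", "168h", "xd"} {
		d, err := parseAge(s)
		fmt.Println(s, d, err)
	}
}
```

A snapshot would then count as old when its timestamp is before time.Now().Add(-age).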
clean-orphan-files
Remove data files in the table location that are not referenced by any snapshot's manifests (e.g. left behind by failed writes).
| Flag | Description |
|---|---|
| --older-than DURATION | Only consider files older than this (default 72h, gives in-flight writes time to finish) |
| --location PATH | Scan a different directory (e.g. an old warehouse path after migration) |
| --dry-run | List orphan files without deleting |
| --yes | Skip the confirmation prompt |
iceberg clean-orphan-files --dry-run my_db.events
iceberg clean-orphan-files --older-than 5d my_db.events
iceberg clean-orphan-files --location s3://old-warehouse/my_db/events my_db.events
Rollback
Reset the current snapshot pointer to a previous snapshot. The target must be an ancestor of the current snapshot.
| Flag | Description |
|---|---|
| --snapshot-id ID | Snapshot to roll back to (required) |
| --yes | Skip the confirmation prompt |
iceberg rollback --snapshot-id 6891234567890 my_db.events
Format upgrade
Upgrade the table format version (metadata-only operation; no data files are rewritten). Refuses downgrades and same-version "upgrades".
| Flag | Description |
|---|---|
| --dry-run | Show what would change without committing |
| --yes | Skip the confirmation prompt |
iceberg upgrade --dry-run my_db.events 2
iceberg upgrade my_db.events 2
Branches and tags
branch create
iceberg branch create my_db.events ml-experiment-v3
| Flag | Description |
|---|---|
| --snapshot-id ID | Snapshot the branch points at (default: current snapshot) |
| --max-ref-age DURATION | Branch itself expires after this age |
| --max-snapshot-age DURATION | Snapshots on the branch older than this can be expired |
| --min-snapshots-to-keep N | Always retain at least N snapshots on the branch |
| --yes | Skip the confirmation prompt |
iceberg branch create \
--snapshot-id 7234981023498 \
--max-ref-age 30d \
--max-snapshot-age 7d \
--min-snapshots-to-keep 10 \
my_db.events audit-2026-q2
tag create
iceberg tag create my_db.events pre-migration-v4
| Flag | Description |
|---|---|
| --snapshot-id ID | Snapshot the tag points at (default: current snapshot) |
| --max-ref-age DURATION | Tag is auto-cleaned after this age |
| --yes | Skip the confirmation prompt |
iceberg tag create \
--snapshot-id 7234981023498 \
--max-ref-age 90d \
my_db.events monthly-backup-may
Automation
Two flags make these commands safe to run from cron jobs or CI:
- --yes skips the interactive prompt. Without it in a non-interactive environment, the CLI exits with "stdin is not a terminal: use --yes to confirm in non-interactive mode" rather than hanging.
- --output json emits structured output that can be consumed by jq and downstream tooling.
Daily maintenance over every table in a namespace:
#!/bin/bash
TABLES=$(iceberg list my_db --output json | jq -r '.identifiers[].name')
for table in $TABLES; do
iceberg expire-snapshots --older-than 7d --retain-last 3 --yes \
--output json "my_db.$table"
iceberg clean-orphan-files --older-than 3d --yes \
--output json "my_db.$table"
done
Tag tables before a deploy:
iceberg tag create --yes my_db.events "pre-deploy-$VERSION"
iceberg tag create --yes my_db.users "pre-deploy-$VERSION"
Audit-only report (no commits):
iceberg expire-snapshots --older-than 7d --dry-run --output json my_db.events \
| jq '{table, would_expire: .expired_snapshot_count}'
Safety features
Every write command has multiple layers of protection:
- --dry-run - shows the would-be effect without committing. Look for [DRY RUN] in text output or "dry_run": true in JSON.
- --yes - required to skip the prompt; without it, non-interactive shells get an explicit error rather than hanging.
- TTY detection - interactive prompts are only shown when stdout is a terminal.
- Ancestor validation - rollback rejects target snapshots that are not in the current branch's history.
- Version check - upgrade refuses same-version or downgrade requests.
API
The Go API surface for Apache Iceberg Go. New to the project? Walk through the Getting Started tutorial first - the recipes here assume you already have a catalog and a table.
For configuration knobs (catalog options, FileIO credentials, table properties), see Configuration. For predicate construction details, see Row Filter Syntax and Expression DSL.
Catalog
catalog.Catalog is the entry point for everything: namespaces, tables, views.
Constructing a catalog
REST
import (
"context"
"github.com/apache/iceberg-go/catalog/rest"
)
cat, err := rest.NewCatalog(context.Background(), "rest", "http://localhost:8181",
rest.WithOAuthToken("your-token"))
SQL (SQLite, Postgres, MySQL, Oracle, MSSQL)
import (
"database/sql"
"github.com/apache/iceberg-go"
sqlcat "github.com/apache/iceberg-go/catalog/sql"
"github.com/uptrace/bun/driver/sqliteshim"
)
db, err := sql.Open(sqliteshim.ShimName, "file:catalog.db")
// handle err
cat, err := sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
"warehouse": "file:///tmp/warehouse",
})
Glue
import (
"github.com/apache/iceberg-go/catalog/glue"
"github.com/aws/aws-sdk-go-v2/config"
)
awsCfg, err := config.LoadDefaultConfig(context.TODO())
// handle err
cat := glue.NewCatalog(glue.WithAwsConfig(awsCfg))
Hive
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/catalog/hive"
)
cat, err := hive.NewCatalog(iceberg.Properties{},
hive.WithURI("thrift://localhost:9083"),
hive.WithWarehouse("s3://my-bucket/warehouse"))
Hadoop
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/catalog/hadoop"
)
cat, err := hadoop.NewCatalog("default", "file:///tmp/warehouse", iceberg.Properties{})
Via the registry
catalog.Load looks up the right backend via the type property (or uri scheme as a fallback) plus your ~/.iceberg-go.yaml. Useful when you want runtime selection:
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/catalog"
)
cat, err := catalog.Load(ctx, "default", iceberg.Properties{
"type": "rest",
"uri": "http://localhost:8181",
"warehouse": "s3://my-bucket/warehouse",
})
Namespaces
ns := table.Identifier{"sales"}
err := cat.CreateNamespace(ctx, ns, iceberg.Properties{"owner": "data-team"})
namespaces, err := cat.ListNamespaces(ctx, nil) // []table.Identifier
exists, err := cat.CheckNamespaceExists(ctx, ns)
props, err := cat.LoadNamespaceProperties(ctx, ns)
summary, err := cat.UpdateNamespaceProperties(ctx, ns,
[]string{"deprecated"}, // removals
iceberg.Properties{"owner": "platform-team"}, // updates
)
err = cat.DropNamespace(ctx, ns)
table.Identifier is []string; use catalog.ToIdentifier("sales", "orders") (or catalog.ToIdentifier("sales.orders")) to build one from string parts.
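To make the two call shapes concrete, here is a small stdlib-only sketch of a helper that accepts either separate parts or a single dotted string. The names identifier and toIdentifier are hypothetical, and the splitting rule is an assumption inferred from the two examples above rather than a copy of the library code.

```go
package main

import (
	"fmt"
	"strings"
)

// identifier mirrors the documented shape of table.Identifier ([]string).
type identifier []string

// toIdentifier is a hypothetical helper: a single argument is split on ".",
// several arguments are used as the parts directly.
func toIdentifier(parts ...string) identifier {
	if len(parts) == 1 {
		if parts[0] == "" {
			return nil
		}
		return strings.Split(parts[0], ".")
	}
	return parts
}

func main() {
	fmt.Println(toIdentifier("sales", "orders")) // [sales orders]
	fmt.Println(toIdentifier("sales.orders"))    // [sales orders]
}
```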
Tables
Defining a schema
import "github.com/apache/iceberg-go"
schema := iceberg.NewSchema(1,
iceberg.NestedField{ID: 1, Name: "id", Type: iceberg.PrimitiveTypes.Int64, Required: true},
iceberg.NestedField{ID: 2, Name: "name", Type: iceberg.PrimitiveTypes.String, Required: false},
iceberg.NestedField{ID: 3, Name: "active", Type: iceberg.PrimitiveTypes.Bool, Required: false},
)
For nested types use &iceberg.StructType{...}, &iceberg.ListType{...}, or &iceberg.MapType{...}. Use NewSchemaWithIdentifiers(id, identifierIDs, fields...) to mark identifier columns.
Create
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/catalog"
)
ident := catalog.ToIdentifier("sales", "orders")
tbl, err := cat.CreateTable(ctx, ident, schema,
catalog.WithLocation("s3://my-bucket/sales/orders"),
catalog.WithProperties(iceberg.Properties{"owner": "data-team"}),
)
Optional catalog.WithPartitionSpec, catalog.WithSortOrder, and catalog.WithStagedUpdates are also available.
Load, exists, list, drop, rename
tbl, err := cat.LoadTable(ctx, ident)
exists, err := cat.CheckTableExists(ctx, ident)
for ident, err := range cat.ListTables(ctx, table.Identifier{"sales"}) {
if err != nil { /* ... */ }
fmt.Println(ident)
}
err = cat.DropTable(ctx, ident)
renamed, err := cat.RenameTable(ctx,
catalog.ToIdentifier("sales", "orders"),
catalog.ToIdentifier("sales", "orders_v2"))
ListTables returns an iter.Seq2[table.Identifier, error] that streams results.
Inspecting metadata
tbl.Identifier() // table.Identifier
tbl.Location() // string
tbl.MetadataLocation() // string (path of current metadata.json)
tbl.Metadata() // table.Metadata
tbl.Schema() // *iceberg.Schema (current)
tbl.Schemas() // map[int]*iceberg.Schema
tbl.Spec() // iceberg.PartitionSpec
tbl.SortOrder() // table.SortOrder
tbl.Properties() // iceberg.Properties
if snap := tbl.CurrentSnapshot(); snap != nil {
fmt.Println(snap.SnapshotID, snap.TimestampMs, snap.Summary)
}
// All snapshots
for _, snap := range tbl.Metadata().Snapshots() {
fmt.Println(snap.SnapshotID)
}
// Stream all manifest files across all snapshots
for mf, err := range tbl.AllManifests(ctx) {
if err != nil { /* ... */ }
fmt.Println(mf.FilePath())
}
Reading data
(t Table) Scan(opts ...ScanOption) *Scan returns a scan that you can resolve into Arrow data.
Streaming record batches
import "github.com/apache/iceberg-go/table"
scan := tbl.Scan()
arrowSchema, batches, err := scan.ToArrowRecords(ctx)
if err != nil { /* ... */ }
fmt.Println(arrowSchema)
for batch, err := range batches {
if err != nil { /* ... */ }
fmt.Printf("batch with %d rows\n", batch.NumRows())
batch.Release()
}
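ToArrowRecords returns the Arrow schema together with an iter.Seq2[arrow.RecordBatch, error], so batches are streamed as you iterate rather than materialized all at once. Always call batch.Release() once you are done with a batch to free its Arrow buffers.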
Materializing as an Arrow Table
arrowTbl, err := tbl.Scan().ToArrowTable(ctx)
if err != nil { /* ... */ }
defer arrowTbl.Release()
fmt.Printf("%d rows in %d cols\n", arrowTbl.NumRows(), arrowTbl.NumCols())
Projection and filters
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/table"
)
scan := tbl.Scan(
table.WithSelectedFields("id", "name"),
table.WithRowFilter(
iceberg.NewAnd(
iceberg.GreaterThanEqual(iceberg.Reference("id"), int64(100)),
iceberg.IsIn(iceberg.Reference("region"), "us-east", "us-west"),
),
),
table.WithLimit(1000),
table.WithCaseSensitive(true),
)
For the predicate vocabulary, see Row Filter Syntax.
Time travel
// By snapshot ID
scan := tbl.Scan(table.WithSnapshotID(snap.SnapshotID))
// As of a timestamp (milliseconds since epoch)
scan = tbl.Scan(table.WithSnapshotAsOf(time.Now().Add(-24*time.Hour).UnixMilli()))
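Conceptually, an as-of lookup picks the latest snapshot committed at or before the requested time. The sketch below illustrates that rule with hypothetical types; treating the boundary as inclusive is an assumption, and the library's own resolution logic may differ in detail.

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in carrying only what the lookup needs.
type snapshot struct {
	ID          int64
	TimestampMs int64
}

// snapshotAsOf returns the most recent snapshot whose timestamp is at or
// before asOfMs. ok is false when every snapshot is newer than asOfMs.
func snapshotAsOf(snaps []snapshot, asOfMs int64) (best snapshot, ok bool) {
	for _, s := range snaps {
		if s.TimestampMs <= asOfMs && (!ok || s.TimestampMs > best.TimestampMs) {
			best, ok = s, true
		}
	}
	return best, ok
}

func main() {
	snaps := []snapshot{
		{ID: 1, TimestampMs: 1_000},
		{ID: 2, TimestampMs: 2_000},
		{ID: 3, TimestampMs: 3_000},
	}
	s, ok := snapshotAsOf(snaps, 2_500)
	fmt.Println(s.ID, ok) // 2 true
}
```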
Reading from a branch or tag
scan, err := tbl.Scan().UseRef("audit-branch")
if err != nil { /* ... */ }
arrowTbl, err := scan.ToArrowTable(ctx)
Iterating tasks for custom processing
If you need finer control (custom file readers, distributed scan planning):
scan := tbl.Scan(table.WithRowFilter(myFilter))
tasks, err := scan.PlanFiles(ctx)
if err != nil { /* ... */ }
arrowSchema, batches, err := scan.ReadTasks(ctx, tasks)
Writing data
The shortcut methods on Table open a transaction, perform the write, and commit. Use NewTransaction directly when you need to combine multiple operations.
Append
import (
"github.com/apache/arrow-go/v18/arrow/array"
)
// From a streaming RecordReader
newTbl, err := tbl.Append(ctx, recordReader, nil /* snapshot props */)
// From an in-memory Arrow Table; batchSize controls the rolling writer
newTbl, err = tbl.AppendTable(ctx, arrowTbl, 1024, nil)
Overwrite
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/table"
)
// Replace all data
newTbl, err := tbl.Overwrite(ctx, recordReader, nil)
// Replace only rows matching a filter
newTbl, err = tbl.Overwrite(ctx, recordReader, nil,
table.WithOverwriteFilter(
iceberg.EqualTo(iceberg.Reference("date"), "2026-01-01"),
),
)
OverwriteTable is the arrow.Table variant.
Delete
newTbl, err := tbl.Delete(ctx,
iceberg.LessThan(iceberg.Reference("id"), int64(100)),
nil, /* snapshot props */
)
Add existing files
When you already have data files (e.g. produced by another writer), register them without rewriting:
txn := tbl.NewTransaction()
err := txn.AddFiles(ctx, []string{
"s3://my-bucket/sales/orders/data/file-1.parquet",
"s3://my-bucket/sales/orders/data/file-2.parquet",
}, nil /* snapshot props */, false /* ignoreDuplicates */)
if err != nil { /* ... */ }
newTbl, err := txn.Commit(ctx)
ReplaceDataFiles(ctx, filesToDelete, filesToAdd, snapshotProps) and ReplaceDataFilesWithDataFiles(ctx, filesToDelete, dataFilesToAdd, snapshotProps, opts...) are also available on *Transaction for swapping files atomically.
Transactions
Group writes and metadata changes into one atomic snapshot:
txn := tbl.NewTransaction()
if err := txn.Delete(ctx,
iceberg.LessThan(iceberg.Reference("date"), "2026-01-01"), nil); err != nil {
/* ... */
}
if err := txn.Append(ctx, recordReader, nil); err != nil {
/* ... */
}
if err := txn.SetProperties(iceberg.Properties{"commit.user": "data-pipeline"}); err != nil {
/* ... */
}
newTbl, err := txn.Commit(ctx)
To target a specific branch:
txn := tbl.NewTransactionOnBranch("staging")
Commit can retry on conflict (ErrCommitFailed) when the commit.retry.* table properties enable it; retries are off by default (see Concurrent Writes).
Schema and partition evolution
Schema evolution
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/table"
)
txn := tbl.NewTransaction()
err := table.NewUpdateSchema(txn, true /* caseSensitive */, false /* allowIncompatibleChanges */).
AddColumn([]string{"tip"}, iceberg.PrimitiveTypes.Float64, "Tip in dollars", false, nil).
RenameColumn([]string{"name"}, "full_name").
DeleteColumn([]string{"deprecated_field"}).
Commit()
if err != nil { /* ... */ }
newTbl, err := txn.Commit(ctx)
Reorder fields with MoveFirst, MoveBefore, or MoveAfter. Set allowIncompatibleChanges to true to permit type narrowing or making optional columns required.
Partition evolution
us := table.NewUpdateSpec(txn, true /* caseSensitive */)
us.AddField("event_time", iceberg.DayTransform{}, "event_day") // sourceColName, transform, partitionFieldName
us.RemoveField("legacy_partition")
if err := us.Commit(); err != nil { /* ... */ }
AddField chains; AddIdentity(sourceCol) is a shortcut for an identity transform; RenameField(name, newName) renames an existing partition field.
Available transforms (root iceberg package): IdentityTransform{}, YearTransform{}, MonthTransform{}, DayTransform{}, HourTransform{}, BucketTransform{NumBuckets: N}, TruncateTransform{Width: W}.
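As a rough illustration of what two of these transforms compute, the sketch below follows the definitions in the Iceberg table spec for day (days since 1970-01-01) and integer truncate (round down to a multiple of the width). The function names are hypothetical and this is not the library's code.

```go
package main

import "fmt"

const microsPerDay = 24 * 60 * 60 * 1_000_000

// dayFromMicros maps a timestamp in microseconds since the Unix epoch to
// days since 1970-01-01, rounding toward negative infinity.
func dayFromMicros(us int64) int64 {
	d := us / microsPerDay
	if us%microsPerDay < 0 {
		d--
	}
	return d
}

// truncateInt rounds v down to the nearest multiple of width (width > 0).
// The double modulo keeps the result correct for negative values.
func truncateInt(v, width int64) int64 {
	return v - ((v%width)+width)%width
}

func main() {
	fmt.Println(dayFromMicros(1_700_000_000_000_000))     // 19675
	fmt.Println(truncateInt(17, 10), truncateInt(-1, 10)) // 10 -10
}
```

Rows that map to the same transformed value land in the same partition, which is what lets a filter on the source column prune whole partitions.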
Snapshots and refs
Inspecting
if snap := tbl.CurrentSnapshot(); snap != nil {
fmt.Println(snap.SnapshotID, snap.TimestampMs, snap.Summary)
}
snap := tbl.SnapshotByID(snapshotID)
named := tbl.SnapshotByName("audit")
for _, s := range tbl.Metadata().Snapshots() {
fmt.Println(s.SnapshotID, s.Summary)
}
Branches and tags
The CLI's branch create and tag create commands (CLI) are the most ergonomic surface today. Programmatically, ref creation goes through Catalog.CommitTable with a SetSnapshotRef update:
import "github.com/apache/iceberg-go/table"
snap := tbl.CurrentSnapshot()
update := table.NewSetSnapshotRefUpdate(
"audit", // ref name
snap.SnapshotID,
table.BranchRef, // or table.TagRef
0, // maxRefAgeMs (0 = unset)
0, // maxSnapshotAgeMs (0 = unset)
0, // minSnapshotsToKeep (0 = unset)
)
reqs := []table.Requirement{
table.AssertTableUUID(tbl.Metadata().TableUUID()),
table.AssertRefSnapshotID("audit", nil), // ref must not already exist
}
_, _, err := cat.CommitTable(ctx, tbl.Identifier(), reqs, []table.Update{update})
Constants table.MainBranch, table.BranchRef, table.TagRef live in table/refs.go. A higher-level builder is on the roadmap.
Expiration and rollback
// Expire snapshots older than the table's retention properties
err := txn.ExpireSnapshots(/* options */)
// Roll back to a previous snapshot
err = txn.RollbackToSnapshot(targetSnapshotID)
Tune retention with the min-snapshots-to-keep, max-snapshot-age-ms, and max-ref-age-ms table properties (see Configuration).
Maintenance
Orphan file cleanup
import (
"time"
"github.com/apache/iceberg-go/table"
)
result, err := tbl.DeleteOrphanFiles(ctx,
table.WithFilesOlderThan(72*time.Hour),
table.WithDryRun(false),
table.WithMaxConcurrency(8),
)
if err != nil { /* ... */ }
fmt.Printf("removed %d files\n", len(result.DeletedFiles))
Also see table.WithLocation, table.WithDeleteFunc, table.WithPrefixMismatchMode, table.WithEqualSchemes, and table.WithEqualAuthorities in table/orphan_cleanup.go.
Compaction (rewrite data files)
import "github.com/apache/iceberg-go/table"
txn := tbl.NewTransaction()
result, err := txn.RewriteDataFiles(ctx, groups /* []table.CompactionTaskGroup */, table.RewriteDataFilesOptions{})
if err != nil { /* ... */ }
newTbl, err := txn.Commit(ctx)
fmt.Printf("rewrote %d files into %d (%d -> %d bytes)\n",
result.RemovedDataFiles, result.AddedDataFiles, result.BytesBefore, result.BytesAfter)
The table/compaction subpackage provides bin-packing planning. The iceberg compact analyze and compact run CLI commands wrap the same machinery - see CLI.
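To give a feel for what bin-packing planning means, here is a self-contained sketch that groups file sizes into bins no larger than a target size using first-fit decreasing. The heuristic and the planBins name are assumptions chosen for illustration; the planner in table/compaction may use a different strategy and richer inputs.

```go
package main

import (
	"fmt"
	"sort"
)

// planBins groups file sizes into bins of at most target using first-fit
// decreasing. A file larger than target ends up in a bin of its own.
func planBins(sizes []int64, target int64) [][]int64 {
	sorted := append([]int64(nil), sizes...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	var bins [][]int64
	var totals []int64
	for _, s := range sorted {
		placed := false
		for i := range bins {
			if totals[i]+s <= target {
				bins[i] = append(bins[i], s)
				totals[i] += s
				placed = true
				break
			}
		}
		if !placed {
			bins = append(bins, []int64{s})
			totals = append(totals, s)
		}
	}
	return bins
}

func main() {
	// Sizes in MiB for readability; the target bin size is 128.
	fmt.Println(planBins([]int64{100, 20, 60, 60, 8}, 128)) // [[100 20 8] [60 60]]
}
```

Each resulting bin corresponds to one group of small files that would be rewritten into a single larger file.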
Expiring snapshots
import (
"time"
"github.com/apache/iceberg-go/table"
)
txn := tbl.NewTransaction()
err := txn.ExpireSnapshots(
table.WithOlderThan(7*24*time.Hour),
table.WithRetainLast(10),
)
if err != nil { /* ... */ }
newTbl, err := txn.Commit(ctx)
Pass table.WithPostCommit(true) to delete the unreferenced data and metadata files after the commit lands. The iceberg expire-snapshots CLI command wraps the same operation - see CLI.
Views
Views are created and loaded through catalogs that support them (REST, Hive, SQL):
import (
"github.com/apache/iceberg-go"
"github.com/apache/iceberg-go/table"
"github.com/apache/iceberg-go/view"
)
// Create
v, err := view.CreateView(
ctx,
"my-catalog",
table.Identifier{"analytics", "monthly_orders"},
schema,
"SELECT month, sum(amount) FROM orders GROUP BY month",
table.Identifier{"sales"}, // default namespace for unqualified names
"s3://my-bucket/views/monthly_orders",
iceberg.Properties{},
)
// Inspect
v.CurrentVersion() // *view.Version
v.CurrentSchema() // *iceberg.Schema
v.Versions() // []*view.Version
v.Schemas() // map[int]*iceberg.Schema
v.Properties() // iceberg.Properties
view.New(ident, meta, metadataLocation) constructs a view from already-loaded metadata; view.NewFromLocation(ctx, ident, metadataLocation, fsysFactory) loads metadata from disk or object storage.
Iceberg ↔ Arrow types
When iceberg-go converts an Iceberg schema to Arrow (e.g. for the scanner output) or vice versa, the type mapping is:
| Iceberg type | Arrow type |
|---|---|
| boolean | arrow.FixedWidthTypes.Boolean |
| int | arrow.PrimitiveTypes.Int32 |
| long | arrow.PrimitiveTypes.Int64 |
| float | arrow.PrimitiveTypes.Float32 |
| double | arrow.PrimitiveTypes.Float64 |
| decimal(p, s) | arrow.Decimal128Type{Precision: p, Scale: s} |
| date | arrow.FixedWidthTypes.Date32 |
| time | arrow.FixedWidthTypes.Time64us |
| timestamp | &arrow.TimestampType{Unit: arrow.Microsecond} (no zone) |
| timestamptz | arrow.FixedWidthTypes.Timestamp_us (UTC zone) |
| timestamp_ns | &arrow.TimestampType{Unit: arrow.Nanosecond} (no zone) |
| timestamptz_ns | arrow.FixedWidthTypes.Timestamp_ns (UTC zone) |
| string | arrow.BinaryTypes.String |
| binary | arrow.BinaryTypes.Binary |
| fixed[L] | &arrow.FixedSizeBinaryType{ByteWidth: L} |
| uuid | arrow.FixedWidthTypes.UUID (extension type) |
| struct<...> | arrow.StructOf(...) |
| list<E> | arrow.ListOf(E) (or LargeListOf if useLargeTypes) |
| map<K, V> | arrow.MapOf(K, V) |
| variant | arrow.ExtensionType for Variant |
Helpers in table/arrow_utils.go:
- SchemaToArrowSchema(sc *iceberg.Schema, nameMapping NameMapping, useLargeTypes, includeRowLineage bool) (*arrow.Schema, error)
- VisitArrowSchema[T](sc *arrow.Schema, visitor ArrowSchemaVisitor[T]) (T, error)
For a writer-side schema (Arrow → Iceberg), the scanner and writers handle conversion automatically as long as your Arrow schema is compatible with the table schema.
Row Filter Syntax
Row filters drive predicate pushdown during scans, partition pruning, and row-level deletes. The DSL lives in the root iceberg package - all you need is the import:
import "github.com/apache/iceberg-go"
A column is referenced with iceberg.Reference("column_name"). Predicate constructors return a BooleanExpression (or an UnboundPredicate, which satisfies BooleanExpression) that can be combined with the boolean combinators in the Expression DSL and passed to APIs like table.WithRowFilter(...).
Equality
iceberg.EqualTo(iceberg.Reference("status"), "active")
iceberg.NotEqualTo(iceberg.Reference("retries"), int32(0))
EqualTo[T] and NotEqualTo[T] are generic over LiteralType (bool, int32, int64, float32, float64, string, []byte, plus a few iceberg-specific types). The value's type must be in that set - bare int literals are not, so use int32(0) / int64(0) explicitly. They wrap LiteralPredicate(OpEQ, ...) and LiteralPredicate(OpNEQ, ...) (predicates.go:83-91).
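The reason a bare integer literal is rejected follows from how Go infers type parameters. The sketch below reproduces the effect with a hypothetical, shortened constraint (literalType) and a toy equalTo function; neither is the library's definition.

```go
package main

import "fmt"

// literalType is a hypothetical, shortened literal constraint.
// Plain int is deliberately absent, as in the set described above.
type literalType interface {
	bool | int32 | int64 | float32 | float64 | string
}

// equalTo renders a predicate as text so the inferred type is visible.
func equalTo[T literalType](column string, v T) string {
	return fmt.Sprintf("%s == %v (%T)", column, v, v)
}

func main() {
	fmt.Println(equalTo("retries", int32(0))) // retries == 0 (int32)
	fmt.Println(equalTo("status", "active"))  // status == active (string)
	// equalTo("retries", 0) does not compile: the untyped constant 0
	// defaults to int, and int does not satisfy literalType.
}
```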
Comparison
iceberg.LessThan(iceberg.Reference("amount"), 100.0)
iceberg.LessThanEqual(iceberg.Reference("amount"), 100.0)
iceberg.GreaterThan(iceberg.Reference("created_at"), int64(1700000000))
iceberg.GreaterThanEqual(iceberg.Reference("score"), int32(50))
Operators: OpLT, OpLTEQ, OpGT, OpGTEQ (exprs.go:47-50). Constructors at predicates.go:98-124.
Set membership
iceberg.IsIn(iceberg.Reference("region"), "us-east", "us-west", "eu-west")
iceberg.NotIn(iceberg.Reference("status"), "deleted", "archived")
IsIn and NotIn are variadic. They return a BooleanExpression (not UnboundPredicate) because the result can simplify automatically:
- Zero values - reduces to AlwaysFalse{} (for IsIn) or AlwaysTrue{} (for NotIn).
- One value - reduces to EqualTo / NotEqualTo.
See predicates.go:55-78.
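The simplification rules for set membership can be mimicked in a few lines. This sketch renders the simplified predicate as text; the isIn function here is a hypothetical toy, not the iceberg package function of the similar name.

```go
package main

import (
	"fmt"
	"strings"
)

// isIn renders the simplified form of "column IN (values...)" as text.
func isIn(column string, values ...string) string {
	switch len(values) {
	case 0:
		return "AlwaysFalse" // nothing can match an empty set
	case 1:
		return fmt.Sprintf("%s == %q", column, values[0])
	default:
		return fmt.Sprintf("%s IN (%s)", column, strings.Join(values, ", "))
	}
}

func main() {
	fmt.Println(isIn("region"))                       // AlwaysFalse
	fmt.Println(isIn("region", "us-east"))            // region == "us-east"
	fmt.Println(isIn("region", "us-east", "us-west")) // region IN (us-east, us-west)
}
```

Because the result may be a constant, an equality, or a set predicate, a single concrete predicate type cannot describe it, which is why the broader expression type is the natural return type.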
Null checks
iceberg.IsNull(iceberg.Reference("deleted_at"))
iceberg.NotNull(iceberg.Reference("user_id"))
These wrap UnaryPredicate(OpIsNull, ...) and UnaryPredicate(OpNotNull, ...). Both panic if the term is nil (predicates.go:23-32).
NaN checks (float / double columns only)
iceberg.IsNaN(iceberg.Reference("ratio"))
iceberg.NotNaN(iceberg.Reference("ratio"))
Operators OpIsNan and OpNotNan. Use these instead of EqualTo(..., math.NaN()) - NaN is never equal to itself.
String prefix
iceberg.StartsWith(iceberg.Reference("path"), "/var/log/")
iceberg.NotStartsWith(iceberg.Reference("name"), "tmp_")
Operators OpStartsWith and OpNotStartsWith (exprs.go:53-54). The value must be a string.
Constants
iceberg.AlwaysTrue{}
iceberg.AlwaysFalse{}
These satisfy BooleanExpression and short-circuit during expression simplification. Useful as a base case when filters are built dynamically:
filter := iceberg.BooleanExpression(iceberg.AlwaysTrue{})
for _, clause := range userClauses {
filter = iceberg.NewAnd(filter, clause)
}
Operator reference
The full operator set is the Operation enum at exprs.go:34-62:
| Operator | Constant | Convenience builder |
|---|---|---|
| < | OpLT | LessThan |
| <= | OpLTEQ | LessThanEqual |
| > | OpGT | GreaterThan |
| >= | OpGTEQ | GreaterThanEqual |
| == | OpEQ | EqualTo |
| != | OpNEQ | NotEqualTo |
| IS NULL | OpIsNull | IsNull |
| IS NOT NULL | OpNotNull | NotNull |
| IS NaN | OpIsNan | IsNaN |
| IS NOT NaN | OpNotNan | NotNaN |
| IN | OpIn | IsIn |
| NOT IN | OpNotIn | NotIn |
| STARTS WITH | OpStartsWith | StartsWith |
| NOT STARTS WITH | OpNotStartsWith | NotStartsWith |
| AND / OR / NOT | OpAnd / OpOr / OpNot | NewAnd / NewOr / NewNot (see Expression DSL) |
Putting it together
A typical filter passed to a scan:
filter := iceberg.NewAnd(
iceberg.GreaterThanEqual(iceberg.Reference("event_time"), int64(1700000000)),
iceberg.IsIn(iceberg.Reference("region"), "us-east", "us-west"),
iceberg.NotNull(iceberg.Reference("user_id")),
)
scan := tbl.Scan(table.WithRowFilter(filter))
For boolean combination, term details, and the lower-level escape hatches (UnaryPredicate, LiteralPredicate, SetPredicate), see Expression DSL.
Expression DSL
This page covers the building blocks behind the row-filter shortcuts in Row Filter Syntax: boolean combinators, terms, and the lower-level predicate constructors.
The DSL lives entirely in the root iceberg package (see exprs.go and predicates.go).
Boolean combinators
iceberg.NewAnd(a, b) // a AND b
iceberg.NewAnd(a, b, c, d) // a AND b AND c AND d (variadic)
iceberg.NewOr(a, b) // a OR b
iceberg.NewOr(a, b, c, d) // a OR b OR c OR d
iceberg.NewNot(a) // NOT a
NewAnd and NewOr accept two required arguments plus a variadic tail (exprs.go:226, exprs.go:287). They simplify automatically:
- NewAnd(x, AlwaysTrue{}) reduces to x.
- NewAnd(x, AlwaysFalse{}) reduces to AlwaysFalse{}.
- NewOr(x, AlwaysFalse{}) reduces to x.
- NewOr(x, AlwaysTrue{}) reduces to AlwaysTrue{}.
- NewNot(NewNot(x)) reduces to x.
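The AND rules above can be reproduced with a toy expression type. Everything in this sketch (expr, pred, and, newAnd) is hypothetical and exists only to show the folding behaviour; the real combinators operate on BooleanExpression values.

```go
package main

import "fmt"

// expr is a toy boolean expression; the concrete types are hypothetical.
type expr interface{ String() string }

type alwaysTrue struct{}
type alwaysFalse struct{}
type pred struct{ text string }
type and struct{ left, right expr }

func (alwaysTrue) String() string  { return "true" }
func (alwaysFalse) String() string { return "false" }
func (p pred) String() string      { return p.text }
func (a and) String() string {
	return "(" + a.left.String() + " AND " + a.right.String() + ")"
}

func isTrue(e expr) bool  { _, ok := e.(alwaysTrue); return ok }
func isFalse(e expr) bool { _, ok := e.(alwaysFalse); return ok }

// newAnd folds its arguments left to right, dropping alwaysTrue operands and
// collapsing to alwaysFalse as soon as any operand is alwaysFalse.
func newAnd(first, second expr, rest ...expr) expr {
	out := first
	for _, e := range append([]expr{second}, rest...) {
		switch {
		case isFalse(out) || isFalse(e):
			out = alwaysFalse{}
		case isTrue(e):
			// out stays as it is
		case isTrue(out):
			out = e
		default:
			out = and{left: out, right: e}
		}
	}
	return out
}

func main() {
	x, y := pred{"id >= 100"}, pred{"region = 'us-east'"}
	fmt.Println(newAnd(alwaysTrue{}, x, y))  // (id >= 100 AND region = 'us-east')
	fmt.Println(newAnd(x, alwaysFalse{}, y)) // false
}
```

The first call shows why starting a dynamically built filter from an always-true constant costs nothing: the constant disappears as soon as a real clause is added.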
Constants
iceberg.AlwaysTrue{}
iceberg.AlwaysFalse{}
Both satisfy BooleanExpression. Use them as the identity element when composing filters dynamically.
Terms
A term is the left-hand side of a predicate. Iceberg-go has two flavors:
- Reference("column_name") - an unbound term that names a column (exprs.go:373). Typing happens at bind time, when the expression is matched against a schema. This is what you almost always want.
- BoundReference - the resolved form, produced by Reference.Bind(schema, caseSensitive) (exprs.go:389). You only encounter these when writing custom expression visitors.
The interfaces are:
type Term interface { ... } // shared marker
type UnboundTerm interface { Term; ... } // pre-bind
type BoundTerm interface { Term; Ref() BoundReference; ... }
(exprs.go:317-348)
Predicates
A predicate applies an Operation to one or more terms. BooleanExpression is the shared interface (exprs.go:123).
Convenience builders (recommended)
For all the common shapes, the constructors in predicates.go are the right tool. See Row Filter Syntax for the full list (EqualTo, LessThan, IsIn, IsNull, StartsWith, etc.).
Lower-level escape hatches
When the convenience builders are not enough (custom operators, dynamic operation selection, working with already-typed Literal values), use the predicate constructors directly:
// Unary predicates: IS NULL / NOT NULL / IS NaN / NOT NaN
unaryPred := iceberg.UnaryPredicate(iceberg.OpIsNull, iceberg.Reference("col"))
// Literal predicates: <, <=, >, >=, ==, !=, STARTS WITH, NOT STARTS WITH
lit := iceberg.NewLiteral(int64(42))
litPred := iceberg.LiteralPredicate(iceberg.OpEQ, iceberg.Reference("col"), lit)
// Set predicates: IN / NOT IN
lits := []iceberg.Literal{iceberg.NewLiteral("a"), iceberg.NewLiteral("b")}
setPred := iceberg.SetPredicate(iceberg.OpIn, iceberg.Reference("col"), lits)
UnaryPredicate lives at exprs.go:534. LiteralPredicate and SetPredicate live in the same file.
Negation
Every BooleanExpression and every Operation knows how to negate itself.
op := iceberg.OpEQ
op.Negate() // -> OpNEQ
(exprs.go:65-98. OpNot, OpAnd, and OpOr panic on direct negation - negate the wrapping expression instead.)
expr := iceberg.EqualTo(iceberg.Reference("status"), "active")
inverted := expr.Negate() // equivalent to NotEqualTo(...)
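Operator negation is essentially a lookup in a table of inverse pairs. The sketch below models that with a hypothetical op type; unlike the library behaviour described above, it returns an error instead of panicking for operators that have no direct inverse, purely to keep the example easy to run.

```go
package main

import "fmt"

// op is a hypothetical operator type keyed by its textual form.
type op string

// negations pairs each comparison-style operator with its inverse.
var negations = map[op]op{
	"<": ">=", ">=": "<",
	">": "<=", "<=": ">",
	"==": "!=", "!=": "==",
	"IN": "NOT IN", "NOT IN": "IN",
	"IS NULL": "IS NOT NULL", "IS NOT NULL": "IS NULL",
}

// negate returns the inverse operator, or an error for operators such as
// AND, OR, and NOT that must be negated through the wrapping expression.
func negate(o op) (op, error) {
	if n, ok := negations[o]; ok {
		return n, nil
	}
	return "", fmt.Errorf("operator %q has no direct negation", o)
}

func main() {
	n, _ := negate("==")
	fmt.Println(n) // !=
	_, err := negate("AND")
	fmt.Println(err != nil) // true
}
```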
Binding and evaluation
Most user code stops at constructing the unbound expression - the scan pipeline handles binding, projection, and evaluation internally. If you are writing a custom visitor:
- (Reference).Bind(schema, caseSensitive) returns a BoundTerm (exprs.go:389).
- Bound expressions expose Ref(), Type(), and (for terms) the underlying accessor for evaluating against a StructLike row.
For projection, evaluation, and visitor patterns, see visitors.go and the scan internals in table/scanner.go.
When to reach for what
| Goal | Use |
|---|---|
| Filter a scan or row-level delete | Convenience builders + NewAnd/NewOr/NewNot |
| Combine many clauses dynamically | Start from AlwaysTrue{} (for AND) or AlwaysFalse{} (for OR), fold with NewAnd/NewOr |
| Construct a predicate whose operator is chosen at runtime | UnaryPredicate(op, term), LiteralPredicate(op, term, lit), SetPredicate(op, term, lits) |
| Walk an expression tree | A custom BooleanExprVisitor from visitors.go |
For the per-operator cookbook, return to Row Filter Syntax.
Concurrent Writes
When multiple writers commit to the same table, iceberg-go uses optimistic concurrency control: every commit is validated against the table state it was built on. If another writer committed first, the commit is rejected rather than silently clobbering their snapshot.
Conflict detection
Each producer (Append, Overwrite, Delete, RowDelta, RewriteDataFiles)
runs a set of validators before the commit lands. They check that the files the
operation assumed (added, deleted, or filtered) still match the current table
state. A failed validation surfaces as one of:
- table.ErrCommitFailed - the base snapshot moved on; the commit can be retried after refreshing (see below).
- table.ErrCommitDiverged - terminal. The base snapshot is no longer on the branch at all, so a retry cannot reconcile the change.
import (
"context"
"errors"
"github.com/apache/iceberg-go/table"
)
_, err := txn.Commit(context.Background())
switch {
case errors.Is(err, table.ErrCommitDiverged):
// unrecoverable: rebuild the operation from the latest table
case errors.Is(err, table.ErrCommitFailed):
// retriable: refresh and try again (the retry loop does this for you)
}
Isolation levels
Delete and update operations are validated under an isolation level, set per operation through table properties:
| Property | Default | Values |
|---|---|---|
| write.delete.isolation-level | serializable | serializable, snapshot |
| write.update.isolation-level | serializable | serializable, snapshot |
serializable rejects the commit if any concurrent snapshot added data
matching the operation's filter. snapshot is more permissive — it only
rejects when concurrent deletes touch the same files.
Automatic retry
iceberg-go can refresh the table and replay the operation against the latest snapshot between attempts. On each retry the table metadata is reloaded, the producer's validators re-run against the fresh snapshot, and the commit is re-submitted only if it is still valid.
Retry is off by default (commit.retry.num-retries is 0). Opt in by
setting the retry properties — see Manifest and commit
in the configuration reference for the full list:
// Opt in to retries (e.g. 4 attempts after the first) for a contended table.
_, err := cat.CreateTable(ctx, ident, schema,
catalog.WithProperties(iceberg.Properties{
"commit.retry.num-retries": "4",
}),
)
Catalog support: retry currently engages on the REST catalog, which wraps commit conflicts as
ErrCommitFailed. The Glue, SQL, and Hive catalogs do not yet wrap their conflict errors, so the retry loop will not fire on them.
Feature Status
This page tracks what Apache Iceberg Go currently supports. The matrix is kept in sync with the project README.md; if you spot a discrepancy, file an issue.
Spec format version coverage
Apache Iceberg Go reads and writes table format versions 1, 2, and 3. The maximum supported version is enforced at table/metadata.go (supportedTableFormatVersion = 3). For active tracking, see issues #589 (V3) and #829 (V2 completion).
V1
All V1 features are supported. V1 is the format-version baseline.
V2
| Feature | Status |
|---|---|
| Sequence numbers | Supported |
| Manifest entry status (added / existing / deleted) | Supported |
| Positional deletes | Supported (read + write) |
| Equality deletes | Supported (read + write). Write via Transaction.WriteEqualityDeletes; row-level commits via Transaction.NewRowDelta |
| Partition spec evolution | Supported |
| Sort order enforcement on write | Supported (PR #1157, closes #833) |
| ReplaceDataFiles using OpReplace | Pending (#841) |
V3
| Feature | Status |
|---|---|
| Nanosecond timestamps (timestamp_ns, timestamptz_ns) | Supported |
| Default values (initial-default, write-default) | Supported |
| Row lineage (_row_id, _last_updated_sequence_number) | Supported |
| Encryption keys in metadata | Supported |
| Variant type, non-shredded | Supported (PR #932; umbrella #929) |
| Variant type, shredded reader / writer | In progress (#986, #987) |
| Deletion vectors, read | Supported |
| Deletion vectors, write (unpartitioned) | Supported |
| Deletion vectors, write (partitioned) | In progress (#1135, PR #1151) |
| Geometry / Geography types (schema) | Supported |
| Geometry / Geography (transforms, statistics, pruning) | In progress (umbrella #989) |
| Multi-argument transforms | Infrastructure present; no concrete implementations exercised yet |
FileSystem support
| Filesystem Type | Supported |
|---|---|
| S3 | X |
| Google Cloud Storage | X |
| Azure Blob Storage | X |
| Local Filesystem | X |
S3, GCS, and Azure require a blank import: _ "github.com/apache/iceberg-go/io/gocloud". See Configuration.
Metadata operations
| Operation | Supported |
|---|---|
| Get Schema | X |
| Get Snapshots | X |
| Get Sort Orders | X |
| Get Partition Specs | X |
| Get Manifests | X |
| Create New Manifests | X |
| Plan Scan | X |
| Plan Scan for Snapshot | X |
Catalog support
| Operation | REST | Hive | Glue | SQL | Hadoop |
|---|---|---|---|---|---|
| Load Table | X | X | X | X | X |
| List Tables | X | X | X | X | X |
| Create Table | X | X | X | X | X |
| Register Table | X | X | X | ||
| Update Current Snapshot | X | X | X | X | X |
| Create New Snapshot | X | X | X | X | X |
| Rename Table | X | X | X | X | |
| Drop Table | X | X | X | X | X |
| Alter Table | X | X | X | X | X |
| Check Table Exists | X | X | X | X | X |
| Set Table Properties | X | X | X | X | X |
| List Namespaces | X | X | X | X | X |
| Create Namespace | X | X | X | X | X |
| Check Namespace Exists | X | X | X | X | X |
| Drop Namespace | X | X | X | X | X |
| Update Namespace Properties | X | X | X | X | |
| Create View | X | X | X | ||
| Load View | X | X | |||
| List View | X | X | X | ||
| Drop View | X | X | X | ||
| Check View Exists | X | X | X |
A Hadoop catalog is also available - see catalog/hadoop.
Read / write data
Data can be read as an Arrow Table or as a stream of Arrow record batches via iter.Seq2. See API Reference.
Supported write operations
As long as the FileSystem is supported and the Catalog supports altering the table:
| Operation | Supported |
|---|---|
| Append Stream | X |
| Append Data Files | X |
| Rewrite Files | X |
| Rewrite manifests | |
| Overwrite Files | X |
| Copy-On-Write Delete | X |
| Write Pos Delete | X |
| Write Eq Delete | X |
| Row Delta | X |
Contributing
Get in Touch
Picking Up Issues
Before starting work on an issue:
- Check for existing PRs. Search the open pull requests to make sure nobody is already working on it.
- Claim the issue. Leave a comment on the issue (e.g., "I'd like to work on this") and wait for a maintainer to acknowledge before writing code.
- One at a time for new contributors. If you haven't had a PR merged into iceberg-go yet, please work on one issue at a time. Get it reviewed, address feedback, get it merged — then pick up the next one. This helps us give your work the attention it deserves and avoids wasted effort from overlapping contributions.
If two PRs land for the same issue, we will generally keep the one from the contributor who claimed it first.
Submitting a Pull Request
- Reference the issue number in your PR description (e.g., "Fixes #123").
- Keep PRs focused — one issue per PR.
- Run go test ./..., gofmt, and golangci-lint run before pushing. CI runs all of these too, but catching issues locally saves a round-trip.
- All commits must have a Signed-off-by line (DCO).
Code Review
- Maintainers may request changes. This is normal — it doesn't mean the PR is bad, it means we want to get it right.
- Respond to review comments by pushing new commits (don't force-push over reviewed code).
- If your PR has been waiting for review for more than a few days, ping on Slack.
Development Setup
git clone https://github.com/apache/iceberg-go.git
cd iceberg-go
go build ./...
go test ./...
Integration Tests
Integration tests require Docker and are gated behind a build tag:
docker compose -f internal/recipe/docker-compose.yml up -d rest minio mc --wait
go test -tags integration ./...
Community
Apache Iceberg Go is developed in the open as part of the broader Apache Iceberg project. The fastest ways to ask questions, share work, and follow development are below.
Chat
- #iceberg-go on Apache Iceberg Slack - day-to-day discussion, design questions, help requests.
Mailing list
The Apache Iceberg project uses a single dev mailing list across all language implementations.
- Read / post: dev@iceberg.apache.org
- Subscribe: send any email to dev-subscribe@iceberg.apache.org
Issues and pull requests
For contribution guidelines, see Contributing.
Apache Iceberg community at large
- Apache Iceberg community page - cross-implementation links, sync calls, and the project Code of Conduct.
- Apache Iceberg Code of Conduct - applies to all participation in this project.
Glossary
This glossary defines important terms used throughout the Iceberg ecosystem, organized in tables for easy reference.
Core Concepts
| Term | Definition |
|---|---|
| Catalog | A centralized service that manages table metadata and provides a unified interface for accessing Iceberg tables. Catalogs can be implemented as Hive metastore, AWS Glue, REST API, or SQL-based solutions. |
| Table | A collection of data files organized by a schema, with metadata tracking changes over time through snapshots. Tables support ACID transactions and schema evolution. |
| Schema | The structure definition of a table, specifying field names, types, and whether fields are required or optional. Schemas are versioned and can evolve over time. |
| Snapshot | A point-in-time view of a table's data, representing the state after a specific operation (append, overwrite, delete, etc.). Each snapshot contains metadata about the operation and references to data files. |
| Manifest | A metadata file that lists data files and their metadata (location, partition information, record counts, etc.). Manifests are organized into manifest lists for efficient access. |
| Manifest List | A file that contains references to manifest files for a specific snapshot, enabling efficient discovery of data files without reading all manifests. |
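The relationship between these terms can be sketched as nested Go types. This is a minimal illustration only: the type and field names below are hypothetical and are not the iceberg-go API. It shows how a reader would walk from a snapshot to its manifest list, then to manifests, then to data files.

```go
package main

import "fmt"

// Hypothetical types for illustration; not the iceberg-go API.
type DataFile struct {
	Path        string
	RecordCount int64
}

type Manifest struct {
	Path      string
	DataFiles []DataFile
}

type ManifestList struct {
	Path      string
	Manifests []Manifest
}

type Snapshot struct {
	ID           int64
	Operation    string
	ManifestList ManifestList
}

// TotalRecords walks snapshot -> manifest list -> manifests -> data files
// and sums the per-file record counts.
func TotalRecords(s Snapshot) int64 {
	var total int64
	for _, m := range s.ManifestList.Manifests {
		for _, f := range m.DataFiles {
			total += f.RecordCount
		}
	}
	return total
}

// exampleSnapshot builds a small made-up snapshot with two manifests.
func exampleSnapshot() Snapshot {
	return Snapshot{
		ID:        1,
		Operation: "append",
		ManifestList: ManifestList{
			Path: "metadata/snap-1.avro",
			Manifests: []Manifest{
				{Path: "metadata/m0.avro", DataFiles: []DataFile{
					{Path: "data/a.parquet", RecordCount: 100},
					{Path: "data/b.parquet", RecordCount: 200},
				}},
				{Path: "metadata/m1.avro", DataFiles: []DataFile{
					{Path: "data/c.parquet", RecordCount: 50},
				}},
			},
		},
	}
}

func main() {
	fmt.Println(TotalRecords(exampleSnapshot()))
}
```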
Data Types
Primitive Types
| Type | Description |
|---|---|
| boolean | True/false values |
| int (32-bit) | Integer values |
| long (64-bit) | Long integer values |
| float (32-bit) | Single precision floating point |
| double (64-bit) | Double precision floating point |
| date | Date values (days since epoch) |
| time | Time values (microseconds since midnight) |
| timestamp | Timestamp values (microseconds since epoch) |
| timestamptz | Timestamp with timezone |
| string | UTF-8 encoded strings |
| uuid | UUID values |
| binary | Variable length binary data |
| fixed[n] | Fixed length binary data of n bytes |
| decimal(p,s) | Decimal values with precision p and scale s |
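The temporal rows above describe raw integer encodings. As a rough sketch using only the Go standard library, the conversions to time.Time and time.Duration could look like this. The helper names are hypothetical and are not part of iceberg-go.

```go
package main

import (
	"fmt"
	"time"
)

// DateFromDays converts a date stored as days since 1970-01-01 to a UTC time.
func DateFromDays(days int32) time.Time {
	return time.Unix(0, 0).UTC().AddDate(0, 0, int(days))
}

// TimestampFromMicros converts microseconds since the epoch to a UTC time.
func TimestampFromMicros(micros int64) time.Time {
	return time.UnixMicro(micros).UTC()
}

// TimeOfDayFromMicros converts microseconds since midnight to a duration.
func TimeOfDayFromMicros(micros int64) time.Duration {
	return time.Duration(micros) * time.Microsecond
}

func main() {
	fmt.Println(DateFromDays(19723).Format("2006-01-02"))
	fmt.Println(TimestampFromMicros(1704067200000000).Format(time.RFC3339))
	fmt.Println(TimeOfDayFromMicros(3600000000))
}
```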
Nested Types
| Type | Description |
|---|---|
| struct | Collection of named fields |
| list | Ordered collection of elements |
| map | Key-value pairs |
Operations
| Operation | Description |
|---|---|
| Append | An operation that adds new data files to a table without removing existing data. Creates a new snapshot with the additional files. |
| Overwrite | An operation that replaces existing data files with new ones, typically based on a partition predicate. Creates a new snapshot with the replacement files. |
| Delete | An operation that removes data files from a table, either by marking them as deleted or by removing references to them. |
| Replace | An operation that completely replaces all data in a table with new data, typically used for full table refreshes. |
Partitioning
| Term | Definition |
|---|---|
| Partition | A logical division of table data based on column values, used to improve query performance by allowing selective reading of relevant data files. |
| Partition Spec | Defines how table data is partitioned by specifying source columns and transformations (identity, bucket, truncate, year, month, day, hour). |
| Partition Field | A field in the partition spec that defines how a source column is transformed for partitioning. |
| Partition Path | The file system path structure created by partition values, typically in the format partition_name=value/. |
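As an illustration of the partition_name=value/ layout, the sketch below joins partition values into a path. PartitionValue and PartitionPath are hypothetical names, and the escaping via url.PathEscape is an assumption; the escaping rules a real writer applies may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// PartitionValue is a hypothetical pair of partition field name and
// its already-transformed value rendered as a string.
type PartitionValue struct {
	Name  string
	Value string
}

// PartitionPath renders values in spec order as name=value/ segments.
func PartitionPath(values []PartitionValue) string {
	var b strings.Builder
	for _, v := range values {
		b.WriteString(url.PathEscape(v.Name))
		b.WriteByte('=')
		b.WriteString(url.PathEscape(v.Value))
		b.WriteByte('/')
	}
	return b.String()
}

func main() {
	fmt.Println(PartitionPath([]PartitionValue{
		{Name: "event_date", Value: "2024-01-01"},
		{Name: "region", Value: "eu"},
	}))
}
```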
Partition Transforms
| Transform | Description |
|---|---|
| identity | Use the column value directly |
| bucket[n] | Hash the value into n buckets |
| truncate[n] | Truncate the value to width n (strings to their first n characters, integers and decimals down to a multiple of n) |
| year | Extract year from date/timestamp |
| month | Extract month from date/timestamp |
| day | Extract day from date/timestamp |
| hour | Extract hour from timestamp |
| void | Always produces null |
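A few of these transforms can be sketched with the standard library alone. The function names are hypothetical, and the sketch assumes that the temporal transforms produce offsets from 1970-01-01 (years, months, days, or hours since the epoch) and that timestamps arrive as microseconds since the epoch. The bucket transform is omitted because it depends on a specific hash function.

```go
package main

import (
	"fmt"
	"time"
)

const (
	microsPerHour = int64(3600) * 1_000_000
	microsPerDay  = 24 * microsPerHour
)

// floorDiv divides rounding toward negative infinity, so pre-epoch
// timestamps land in the correct (negative) day or hour.
func floorDiv(a, b int64) int64 {
	q := a / b
	if a%b != 0 && (a < 0) != (b < 0) {
		q--
	}
	return q
}

// TruncateInt rounds v down to a multiple of width (width must be positive).
func TruncateInt(v, width int64) int64 {
	return v - (((v % width) + width) % width)
}

// TruncateString keeps at most the first n characters (runes) of s.
func TruncateString(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

// Year returns whole years since 1970 for a microsecond timestamp.
func Year(micros int64) int {
	return time.UnixMicro(micros).UTC().Year() - 1970
}

// Month returns whole months since 1970-01 for a microsecond timestamp.
func Month(micros int64) int {
	t := time.UnixMicro(micros).UTC()
	return (t.Year()-1970)*12 + int(t.Month()) - 1
}

// Day returns whole days since 1970-01-01.
func Day(micros int64) int64 { return floorDiv(micros, microsPerDay) }

// Hour returns whole hours since 1970-01-01T00:00.
func Hour(micros int64) int64 { return floorDiv(micros, microsPerHour) }

func main() {
	ts := int64(1704067200000000) // 2024-01-01T00:00:00Z
	fmt.Println(Year(ts), Month(ts), Day(ts), Hour(ts))
	fmt.Println(TruncateInt(-1, 10), TruncateString("iceberg", 3))
}
```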
Expressions and Predicates
| Term | Definition |
|---|---|
| Expression | A computation or comparison that can be evaluated against table data, used for filtering and transformations. |
| Predicate | A boolean expression used to filter data, such as column comparisons, null checks, or set membership tests. |
| Bound Predicate | A predicate that has been resolved against a specific schema, with field references bound to actual columns. |
| Unbound Predicate | A predicate that contains unresolved field references, typically in string form before binding to a schema. |
| Literal | A constant value used in expressions and predicates, such as numbers, strings, dates, etc. |
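The difference between unbound and bound predicates can be sketched as a lookup against a schema. All types here are hypothetical and simplified (iceberg-go has its own expression types); the sketch only shows the idea that binding resolves a column name to a concrete field.

```go
package main

import "fmt"

// Hypothetical, simplified types; not the iceberg-go expression API.
type Field struct {
	ID   int
	Name string
}

type Schema struct {
	Fields []Field
}

// UnboundPredicate refers to a column by name only.
type UnboundPredicate struct {
	Column  string
	Op      string
	Literal any
}

// BoundPredicate refers to a resolved field ID in a specific schema.
type BoundPredicate struct {
	FieldID int
	Op      string
	Literal any
}

// Bind resolves the predicate's column name against the schema.
func Bind(schema Schema, p UnboundPredicate) (BoundPredicate, error) {
	for _, f := range schema.Fields {
		if f.Name == p.Column {
			return BoundPredicate{FieldID: f.ID, Op: p.Op, Literal: p.Literal}, nil
		}
	}
	return BoundPredicate{}, fmt.Errorf("column %q not found in schema", p.Column)
}

func exampleSchema() Schema {
	return Schema{Fields: []Field{{ID: 1, Name: "id"}, {ID: 2, Name: "amount"}}}
}

func main() {
	bound, err := Bind(exampleSchema(), UnboundPredicate{Column: "amount", Op: ">", Literal: int64(100)})
	fmt.Println(bound.FieldID, bound.Op, bound.Literal, err)
}
```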
File Formats
| Format | Usage | Description |
|---|---|---|
| Parquet | Data files | The primary data file format used by Iceberg, providing columnar storage with compression and encoding optimizations. |
| Avro | Metadata files | Used for manifests and manifest lists due to its schema evolution capabilities and compact binary format. |
| ORC | Data files | An alternative columnar format supported by some Iceberg implementations. |
Metadata
| Term | Definition |
|---|---|
| Metadata File | A JSON file containing table metadata including schema, partition spec, properties, and snapshot information. |
| Metadata Location | The URI pointing to the current metadata file for a table, stored in the catalog. |
| Properties | Key-value pairs that configure table behavior, such as compression settings, write options, and custom metadata. |
| Statistics | Metadata about data files including record counts, file sizes, and value ranges for optimization. |
Transactions
| Term | Definition |
|---|---|
| Transaction | A sequence of operations that are committed atomically, ensuring data consistency and ACID properties. |
| Commit | The process of finalizing a transaction by creating a new snapshot and updating the metadata file. |
| Rollback | The process of undoing changes in a transaction, typically by reverting to a previous snapshot. |
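One common way to make a commit atomic is an optimistic compare-and-swap on the table's current metadata location: the commit succeeds only if nobody else has committed since the transaction started. The sketch below illustrates that general pattern under that assumption. TableEntry is a hypothetical in-memory stand-in, not how any particular iceberg-go catalog is implemented.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrCommitConflict signals that another writer committed first.
var ErrCommitConflict = errors.New("metadata location changed since the transaction started")

// TableEntry is a hypothetical in-memory catalog record for one table.
type TableEntry struct {
	mu       sync.Mutex
	location string
}

func NewTableEntry(location string) *TableEntry {
	return &TableEntry{location: location}
}

// Location returns the current metadata location.
func (t *TableEntry) Location() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.location
}

// Commit swaps the metadata location only if it still equals expected.
func (t *TableEntry) Commit(expected, next string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.location != expected {
		return ErrCommitConflict
	}
	t.location = next
	return nil
}

func main() {
	entry := NewTableEntry("metadata/v1.json")
	fmt.Println(entry.Commit("metadata/v1.json", "metadata/v2.json")) // first writer wins
	fmt.Println(entry.Commit("metadata/v1.json", "metadata/v3.json")) // stale writer must retry
	fmt.Println(entry.Location())
}
```

A writer that receives the conflict error would typically reload the table, reapply its changes on top of the new state, and try again.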
References
| Term | Definition |
|---|---|
| Branch | A named reference to a specific snapshot, allowing multiple concurrent views of table data. |
| Tag | An immutable reference to a specific snapshot, typically used for versioning and releases. |
Storage
| Term | Definition |
|---|---|
| Warehouse | The root directory or bucket where table data and metadata are stored. |
| Location Provider | A component that generates file paths for table data and metadata based on table location and naming conventions. |
| FileIO | An abstraction layer for reading and writing files across different storage systems (local filesystem, S3, GCS, Azure Blob, etc.). |
Query Optimization
| Technique | Description |
|---|---|
| Column Pruning | A technique that reads only the columns needed for a query, reducing I/O and improving performance. |
| Partition Pruning | A technique that skips reading data files from irrelevant partitions based on query predicates. |
| Predicate Pushdown | A technique that applies filtering predicates at the storage layer, reducing data transfer and processing. |
| Statistics-based Optimization | Using table and file statistics to optimize query execution plans and file selection. |
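Statistics-based file selection can be illustrated with per-file value ranges: for a filter such as value > 150, any file whose maximum is at or below 150 cannot contain matching rows and can be skipped. The types and sample data below are hypothetical and only sketch the idea.

```go
package main

import "fmt"

// FileStats is a hypothetical summary of one data file's column range.
type FileStats struct {
	Path string
	Min  int64
	Max  int64
}

// FilesForGreaterThan keeps only files that might hold rows with value > v.
func FilesForGreaterThan(files []FileStats, v int64) []string {
	var keep []string
	for _, f := range files {
		if f.Max > v {
			keep = append(keep, f.Path)
		}
	}
	return keep
}

func exampleFiles() []FileStats {
	return []FileStats{
		{Path: "data/a.parquet", Min: 1, Max: 100},
		{Path: "data/b.parquet", Min: 101, Max: 200},
		{Path: "data/c.parquet", Min: 201, Max: 300},
	}
}

func main() {
	// data/a.parquet is skipped without being opened.
	fmt.Println(FilesForGreaterThan(exampleFiles(), 150))
}
```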
Schema Evolution
| Term | Definition |
|---|---|
| Schema Evolution | The process of modifying a table's schema over time while maintaining backward compatibility. |
| Column Addition | Adding new columns to a table schema, which are typically optional to maintain compatibility. |
| Column Deletion | Removing columns from a table schema, which may be logical (marking as deleted) or physical. |
| Column Renaming | Changing column names while preserving data and type information. |
| Type Evolution | Changing column types in ways that maintain data compatibility (e.g., int to long). |
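A compatibility check for type evolution can be sketched as a small lookup of allowed widenings. The sketch assumes only two promotions, the 32-bit to 64-bit integer case and float to double; decimal precision widening and any other rules are deliberately left out, and CanPromote is a hypothetical helper.

```go
package main

import "fmt"

// promotions lists the widenings this sketch allows. It is an assumption
// for illustration and intentionally incomplete.
var promotions = map[string][]string{
	"int":   {"long"},
	"float": {"double"},
}

// CanPromote reports whether a column of type from may become type to.
func CanPromote(from, to string) bool {
	if from == to {
		return true // no change is always compatible
	}
	for _, target := range promotions[from] {
		if target == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanPromote("int", "long"))
	fmt.Println(CanPromote("long", "int"))
	fmt.Println(CanPromote("float", "double"))
}
```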
Time Travel
| Term | Definition |
|---|---|
| Time Travel | The ability to query a table as it existed at a specific point in time using snapshot timestamps. |
| Snapshot Isolation | A property that ensures queries see a consistent view of data as it existed at a specific snapshot. |
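Time travel by timestamp amounts to picking the most recent snapshot committed at or before the requested time. A minimal sketch, with a hypothetical SnapshotEntry type and millisecond timestamps assumed:

```go
package main

import "fmt"

// SnapshotEntry is a hypothetical record of when a snapshot was committed.
type SnapshotEntry struct {
	SnapshotID  int64
	TimestampMs int64
}

// SnapshotAsOf returns the latest snapshot committed at or before tsMs.
// The boolean is false when the table had no snapshot yet at that time.
func SnapshotAsOf(log []SnapshotEntry, tsMs int64) (SnapshotEntry, bool) {
	var best SnapshotEntry
	found := false
	for _, e := range log {
		if e.TimestampMs <= tsMs && (!found || e.TimestampMs > best.TimestampMs) {
			best, found = e, true
		}
	}
	return best, found
}

func exampleLog() []SnapshotEntry {
	return []SnapshotEntry{
		{SnapshotID: 101, TimestampMs: 1000},
		{SnapshotID: 102, TimestampMs: 2000},
		{SnapshotID: 103, TimestampMs: 3000},
	}
}

func main() {
	snap, ok := SnapshotAsOf(exampleLog(), 2500)
	fmt.Println(snap.SnapshotID, ok)
}
```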
ACID Properties
| Property | Description |
|---|---|
| Atomicity | Ensures that all operations in a transaction either succeed completely or fail completely. |
| Consistency | Ensures that the table remains in a valid state after each transaction. |
| Isolation | Ensures that concurrent transactions do not interfere with each other. |
| Durability | Ensures that committed changes are permanently stored and survive system failures. |
Releases
Apache Iceberg Go follows the standard Apache release process. Releases are cut from main, voted on by the Apache Iceberg PMC, and published as signed source tarballs to https://downloads.apache.org/iceberg/ and as Go module versions tagged vX.Y.Z.
Latest release
Using a release
Pin a tagged version directly with Go modules:
go get github.com/apache/iceberg-go@vX.Y.Z
To track main, use @main instead of a version tag.
Release notes
Per-release notes (highlights, breaking changes, contributors) are published on the GitHub Releases page. Iceberg Go does not maintain a curated CHANGELOG.md in the repository; the GitHub Releases page is the canonical source.
Verifying and producing releases
If you are validating an RC or cutting a new release, see:
- Verify a release - what to do when a [VOTE] thread is posted on dev@iceberg.apache.org.
- How to release - PMC/committer process for cutting an RC and publishing.
Verify a release
When a release candidate is announced on dev@iceberg.apache.org, anyone (committer or not) can help by verifying the artifacts. Verification is required from at least three Apache Iceberg PMC members for the vote to pass.
The repository ships a script that performs the full verification end-to-end: dev/release/verify_rc.sh.
What an RC announcement contains
A [VOTE] iceberg-go X.Y.Z RC<N> thread on dev@iceberg.apache.org will reference:
- A signed source tarball under https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/ (.tar.gz, .tar.gz.asc, .tar.gz.sha512)
- The Apache Iceberg KEYS file at https://downloads.apache.org/iceberg/KEYS
- A GitHub compare URL between the previous release tag and the new RC tag
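The artifact locations follow a fixed naming pattern, so they can be derived from the version and RC number. The helper below is a hypothetical convenience, not part of the release scripts; it only reproduces the URL pattern described above.

```go
package main

import "fmt"

// distDevBase is the staging area named in the RC announcement.
const distDevBase = "https://dist.apache.org/repos/dist/dev/iceberg"

// RCArtifactURLs returns the tarball, signature, and checksum URLs
// for a given version and release candidate number.
func RCArtifactURLs(version string, rc int) []string {
	dir := fmt.Sprintf("%s/apache-iceberg-go-%s-rc%d", distDevBase, version, rc)
	tarball := fmt.Sprintf("%s/apache-iceberg-go-%s.tar.gz", dir, version)
	return []string{tarball, tarball + ".asc", tarball + ".sha512"}
}

func main() {
	for _, u := range RCArtifactURLs("0.6.0", 1) {
		fmt.Println(u)
	}
}
```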
Prerequisites
The script requires:
- curl
- gpg
- shasum or sha512sum
- tar
You do not need Go installed - if Go is not on the system, the latest Go is downloaded automatically and used only for verification.
Import the Apache Iceberg KEYS
curl https://downloads.apache.org/iceberg/KEYS -o KEYS
gpg --import KEYS
Run the verification script
The script takes the version and RC number as positional arguments:
dev/release/verify_rc.sh ${VERSION} ${RC}
For example, to verify 0.6.0 RC1:
dev/release/verify_rc.sh 0.6.0 1
If the verification succeeds, the script prints:
RC looks good!
Optional environment variables
verify_rc.sh honors these environment variables (all optional):
| Variable | Default | Effect |
|---|---|---|
| VERIFY_DEFAULT | 1 | Master switch propagated to VERIFY_DOWNLOAD and VERIFY_SIGN if they are unset. |
| VERIFY_DOWNLOAD | ${VERIFY_DEFAULT} | Re-download artifacts when 1; reuse the local copy when 0. |
| VERIFY_SIGN | ${VERIFY_DEFAULT} | Re-run signature and checksum verification when 1. |
| VERIFY_FORCE_USE_GO_BINARY | 0 | When 1, ignore any system Go and use the script's auto-downloaded Go. |
| GITHUB_TOKEN | unset | Optional - supplies authenticated requests when fetching the latest Go release, avoiding rate limits. |
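The defaulting chain in the table (VERIFY_DOWNLOAD and VERIFY_SIGN fall back to VERIFY_DEFAULT, which itself defaults to 1) can be modeled as follows. This is a sketch of the documented behavior only. It is not a translation of verify_rc.sh, and it assumes that only the value 1 enables a switch.

```go
package main

import "fmt"

// VerifyFlags is a hypothetical view of the resolved switches.
type VerifyFlags struct {
	Download      bool
	Sign          bool
	ForceGoBinary bool
}

// lookup returns env[key] when it is set, otherwise the fallback.
func lookup(env map[string]string, key, fallback string) string {
	if v, ok := env[key]; ok {
		return v
	}
	return fallback
}

// ResolveVerifyFlags applies the defaults described in the table.
func ResolveVerifyFlags(env map[string]string) VerifyFlags {
	def := lookup(env, "VERIFY_DEFAULT", "1")
	return VerifyFlags{
		Download:      lookup(env, "VERIFY_DOWNLOAD", def) == "1",
		Sign:          lookup(env, "VERIFY_SIGN", def) == "1",
		ForceGoBinary: lookup(env, "VERIFY_FORCE_USE_GO_BINARY", "0") == "1",
	}
}

func main() {
	// Reuse local artifacts but still re-check signatures and checksums.
	fmt.Printf("%+v\n", ResolveVerifyFlags(map[string]string{
		"VERIFY_DEFAULT": "0",
		"VERIFY_SIGN":    "1",
	}))
}
```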
Manual verification fallback
If you would rather verify by hand, the underlying steps are:
# Download the artifacts
curl -O https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/apache-iceberg-go-${VERSION}.tar.gz
curl -O https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/apache-iceberg-go-${VERSION}.tar.gz.asc
curl -O https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/apache-iceberg-go-${VERSION}.tar.gz.sha512
# Verify the signature
gpg --verify apache-iceberg-go-${VERSION}.tar.gz.asc apache-iceberg-go-${VERSION}.tar.gz
# Verify the checksum (or sha512sum -c on systems without shasum)
shasum -a 512 --check apache-iceberg-go-${VERSION}.tar.gz.sha512
# Inspect the contents
tar xf apache-iceberg-go-${VERSION}.tar.gz
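If neither shasum nor sha512sum is available, the checksum comparison can also be done in Go with the standard library. This sketch assumes the .sha512 file holds a line of the form "<hex digest>  <file name>", and the function names are hypothetical. It covers only the checksum step and is no substitute for the gpg signature check.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"strings"
)

// SHA512Hex returns the lowercase hex SHA-512 digest of data.
func SHA512Hex(data []byte) string {
	sum := sha512.Sum512(data)
	return hex.EncodeToString(sum[:])
}

// ChecksumLine formats a "<hex digest>  <file name>" line for data.
func ChecksumLine(data []byte, name string) string {
	return SHA512Hex(data) + "  " + name
}

// MatchesChecksumLine compares data against the digest in the first
// whitespace-separated field of line, ignoring hex letter case.
func MatchesChecksumLine(data []byte, line string) bool {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return false
	}
	return strings.EqualFold(fields[0], SHA512Hex(data))
}

func main() {
	// In practice, read the tarball and the .sha512 file with os.ReadFile.
	tarball := []byte("stand-in for the tarball bytes")
	line := ChecksumLine(tarball, "apache-iceberg-go-X.Y.Z.tar.gz")
	fmt.Println(MatchesChecksumLine(tarball, line))
}
```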
You should reply to the [VOTE] thread with +1, +0, or -1 and a one-line description of what you verified (for example, "verified signatures, checksums, and verify_rc.sh passed on macOS arm64 with system Go 1.25.5").
How to release
This page is for Apache Iceberg PMC members and committers who are cutting a new Apache Iceberg Go release. The canonical scripts live in dev/release/ and dev/release/README.md is the source of truth - this page mirrors it and adds context.
Requirements
- You must be an Apache Iceberg committer or PMC member.
- You must prepare a PGP key for signing. See https://infra.apache.org/release-signing.html#generate.
- Your PGP key must be registered in the Apache Iceberg KEYS file at https://downloads.apache.org/iceberg/KEYS. To add a key:
  svn co https://dist.apache.org/repos/dist/release/iceberg
  cd iceberg
  $EDITOR KEYS
  svn ci KEYS
- You must run the release scripts from a working copy whose origin remote is git@github.com:apache/iceberg-go.git (not your fork). release_rc.sh enforces this.
Overview
- Test the revision to be released.
- Prepare an RC and start a vote.
- On a passing vote, publish.
Prepare an RC and vote
Run dev/release/release_rc.sh against the canonical clone:
git clone git@github.com:apache/iceberg-go.git
cd iceberg-go
GH_TOKEN=${YOUR_GITHUB_TOKEN} dev/release/release_rc.sh ${VERSION} ${RC}
Example for 0.6.0 RC1:
GH_TOKEN=${YOUR_GITHUB_TOKEN} dev/release/release_rc.sh 0.6.0 1
The arguments are the version and the RC number. If RC1 has a problem, increment the RC number to RC2, RC3, and so on.
release_rc.sh will:
- Tag vX.Y.Z-rc<N> and push the tag.
- Create a signed source tarball.
- Upload the artifacts to https://dist.apache.org/repos/dist/dev/iceberg/.
- Print a draft [VOTE] email you can use on dev@iceberg.apache.org.
Send the [VOTE] email. The vote runs for at least 72 hours and requires three +1 votes from PMC members with no -1 votes to pass.
Publish
When the vote passes, run:
GH_TOKEN=${YOUR_GITHUB_TOKEN} dev/release/release.sh ${VERSION} ${RC}
release.sh moves the artifacts from https://dist.apache.org/repos/dist/dev/iceberg/ to https://dist.apache.org/repos/dist/release/iceberg/ (which feeds https://downloads.apache.org/iceberg/) and creates a GitHub Release with auto-generated notes.
Post-release tasks
After publishing, complete these steps:
- Add the release to ASF's report database at the Apache Committee Report Helper.
- Verify the GitHub Release at https://github.com/apache/iceberg-go/releases is correctly tagged, has generated release notes against the prior tag, and is marked as latest. release.sh runs gh release create ... --generate-notes --verify-tag so the release should already exist; double-check the notes and the "Latest" badge.
- Send the [ANNOUNCE] email to dev@iceberg.apache.org and announce@apache.org.
- File a release blog post in apache/iceberg under site/docs/blog/posts/. See the prior 0.5.0 post (2026-03-05-iceberg-go-0.5.0-release.md) for the frontmatter and structure.
Patch releases
dev/release/README.md does not document a patch-branch convention (e.g. iceberg-go-0.X.x). Confirm with the PMC on dev@iceberg.apache.org before cutting a patch release.
Other Iceberg implementations
Apache Iceberg Go is one of several official Iceberg implementations. Pick the one that matches your runtime; the table format and catalog protocols are the same across all of them.
| Project | Language | Repository | Documentation |
|---|---|---|---|
| Apache Iceberg | Java (reference) | apache/iceberg | iceberg.apache.org |
| PyIceberg | Python | apache/iceberg-python | py.iceberg.apache.org |
| iceberg-rust | Rust | apache/iceberg-rust | rust.iceberg.apache.org |
| iceberg-cpp | C++ | apache/iceberg-cpp | cpp.iceberg.apache.org (early stage) |
When to use which
- Java is the reference implementation and is what every query engine integrates against (Spark, Flink, Trino, Hive, Presto, Dremio, etc.). If you are running a JVM workload, this is the canonical choice.
- PyIceberg is for Python and the dataframe ecosystem (PyArrow, Pandas, Polars, DuckDB, Daft, Ray). Most data-science and ML workflows live here.
- iceberg-rust is the Rust implementation, used by pyiceberg-core, DataFusion-based engines, and other Rust-native systems.
- iceberg-cpp is early-stage (0.2.0 released 2026-01-26). Track the project for native C++ integration once it stabilizes.
- iceberg-go (this project) is for Go services and tooling. Tight Apache Arrow Go integration makes it a good fit for streaming Arrow record batches into and out of Iceberg tables.
Specifications and shared concepts
The Iceberg spec, terminology, partitioning semantics, evolution semantics, REST Catalog OpenAPI, and multi-engine support policy live with the main project at iceberg.apache.org. All the implementations above target the same spec.
For Apache Iceberg Go-specific guidance, continue with the API Reference, CLI, or Configuration.