Iceberg Go

Apache Iceberg Go is a Go-native implementation of Apache Iceberg, the open table format for analytic datasets. Read and write Iceberg tables from Go services and tooling without a JVM.

Where to start

  • Install - get the library and CLI on your machine.
  • CLI - inspect tables, run maintenance, manage refs.
  • API - construct catalogs, scan tables, write Arrow data, evolve schemas.
  • Configuration - YAML config file, catalog options, FileIO and credentials, table properties.

What works today

A capability matrix - filesystem, metadata operations, catalog support, and write operations - lives on the dedicated Feature Status page.

Beyond Go

Apache Iceberg is multi-language. PyIceberg, iceberg-rust, iceberg-cpp, and the Java reference implementation all target the same spec - see Other Iceberg implementations for cross-links.

Help and contribution

The canonical Iceberg specification, terminology, and multi-engine policy live with the main project at iceberg.apache.org. This site covers what is specific to the Go implementation.

Install

This page lists the requirements for building iceberg-go and shows how to add it to a Go module.

Requirements

Go 1.25 or later is required to build.

Installation

To install the iceberg-go package, first install Go and set up a Go module. If you don't have a go.mod file, create one with go mod init followed by your module path (for example, go mod init iceberg-go-tutorial).

  1. Download and install it:
go get -u github.com/apache/iceberg-go
  2. Import it in your code:
import "github.com/apache/iceberg-go"

Getting Started

This walkthrough takes you from go get to a fully written-and-read Iceberg table in a few minutes, using a local SQLite-backed catalog so you do not need any external services. By the end you will have:

  • An Iceberg catalog stored in a SQLite file
  • A table with an Arrow schema
  • Data written from Apache Arrow Go and read back
  • A schema evolution committed
  • A branch created on top of the table

If you would rather see all the operations as a reference, jump straight to the API.

1. Install

go mod init iceberg-go-tutorial
go get github.com/apache/iceberg-go@latest
go get github.com/apache/arrow-go/v18@latest
go get github.com/uptrace/bun/driver/sqliteshim@latest

iceberg-go itself only registers the local file system. For S3, GCS, or Azure Blob you would also blank-import github.com/apache/iceberg-go/io/gocloud. We are staying on local disk for this tutorial.

2. Open a local catalog

The SQL catalog stores its metadata in a database, and the actual data files live under a "warehouse" directory. We will use SQLite for the catalog DB and a local folder for the warehouse.

package main

import (
    "context"
    "database/sql"
    "log"

    "github.com/apache/iceberg-go"
    sqlcat "github.com/apache/iceberg-go/catalog/sql"
    "github.com/uptrace/bun/driver/sqliteshim"
)

func openCatalog(ctx context.Context) (*sqlcat.Catalog, error) {
    db, err := sql.Open(sqliteshim.ShimName, "file:./tutorial-catalog.db?cache=shared")
    if err != nil {
        return nil, err
    }
    return sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
        "warehouse":           "file:///tmp/iceberg-tutorial",
        "init_catalog_tables": "true",
    })
}

init_catalog_tables: "true" lets the catalog create its bookkeeping tables (iceberg_tables, iceberg_namespace_properties) on first use. Make sure /tmp/iceberg-tutorial is writable.

3. Create a namespace and a table

Iceberg organizes tables under namespaces (the analog of databases). We will create one and put a table inside it.

import (
    "github.com/apache/iceberg-go/catalog"
    "github.com/apache/iceberg-go/table"
)

func createTrips(ctx context.Context, cat *sqlcat.Catalog) (*table.Table, error) {
    ns := table.Identifier{"taxi"}
    if err := cat.CreateNamespace(ctx, ns, nil); err != nil {
        return nil, err
    }

    schema := iceberg.NewSchema(1,
        iceberg.NestedField{ID: 1, Name: "trip_id", Type: iceberg.PrimitiveTypes.Int64, Required: true},
        iceberg.NestedField{ID: 2, Name: "fare", Type: iceberg.PrimitiveTypes.Float64, Required: false},
        iceberg.NestedField{ID: 3, Name: "borough", Type: iceberg.PrimitiveTypes.String, Required: false},
    )

    ident := catalog.ToIdentifier("taxi", "trips")
    return cat.CreateTable(ctx, ident, schema)
}

If the namespace already exists, CreateNamespace returns an error, and the example simply propagates it, so a second run fails at this step. Production code should check for iceberg.ErrAlreadyExists and continue.

4. Write some Arrow data

The write path takes either a streaming array.RecordReader or a fully materialized arrow.Table. We will build a small Arrow table inline and append it.

import (
    "github.com/apache/arrow-go/v18/arrow"
    "github.com/apache/arrow-go/v18/arrow/array"
    "github.com/apache/arrow-go/v18/arrow/memory"
)

func writeSomeTrips(ctx context.Context, tbl *table.Table) (*table.Table, error) {
    mem := memory.NewGoAllocator()

    arrowSchema := arrow.NewSchema([]arrow.Field{
        {Name: "trip_id", Type: arrow.PrimitiveTypes.Int64, Nullable: false},
        {Name: "fare", Type: arrow.PrimitiveTypes.Float64, Nullable: true},
        {Name: "borough", Type: arrow.BinaryTypes.String, Nullable: true},
    }, nil)

    b := array.NewRecordBuilder(mem, arrowSchema)
    defer b.Release()

    b.Field(0).(*array.Int64Builder).AppendValues([]int64{1, 2, 3}, nil)
    b.Field(1).(*array.Float64Builder).AppendValues([]float64{12.50, 8.75, 22.10}, nil)
    b.Field(2).(*array.StringBuilder).AppendValues([]string{"Manhattan", "Brooklyn", "Queens"}, nil)

    rec := b.NewRecord()
    defer rec.Release()

    arrowTbl := array.NewTableFromRecords(arrowSchema, []arrow.Record{rec})
    defer arrowTbl.Release()

    return tbl.AppendTable(ctx, arrowTbl, 1024 /* batchSize */, nil)
}

AppendTable returns a refreshed *table.Table reflecting the new snapshot.

5. Read the data back

func readAll(ctx context.Context, tbl *table.Table) error {
    arrowTbl, err := tbl.Scan().ToArrowTable(ctx)
    if err != nil {
        return err
    }
    defer arrowTbl.Release()

    log.Printf("read %d rows in %d columns", arrowTbl.NumRows(), arrowTbl.NumCols())
    return nil
}

For larger tables, use streaming so only one batch is in memory at a time:

arrowSchema, batches, err := tbl.Scan().ToArrowRecords(ctx)
if err != nil {
    return err
}
log.Printf("scan schema: %s", arrowSchema)

for batch, err := range batches {
    if err != nil {
        return err
    }
    log.Printf("batch with %d rows", batch.NumRows())
    batch.Release()
}

6. Filter and project

Use the predicate DSL to push filters down into the scan, and WithSelectedFields to project columns:

filter := iceberg.GreaterThan(iceberg.Reference("fare"), float64(10.0))

arrowTbl, err := tbl.Scan(
    table.WithSelectedFields("trip_id", "fare"),
    table.WithRowFilter(filter),
).ToArrowTable(ctx)

7. Evolve the schema

Most schema changes go through a transaction. Add a tip column:

import (
    "context"

    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/table"
)

func addTipColumn(ctx context.Context, tbl *table.Table) (*table.Table, error) {
    txn := tbl.NewTransaction()

    err := table.NewUpdateSchema(txn, true /* caseSensitive */, false /* allowIncompatible */).
        AddColumn([]string{"tip"}, iceberg.PrimitiveTypes.Float64, "Tip in dollars", false, nil).
        Commit()
    if err != nil {
        return nil, err
    }

    return txn.Commit(ctx)
}

The returned table has the new schema and a fresh metadata file.

8. Branch the table

A branch is a named ref that points at a snapshot. Creating one requires a metadata commit that adds the ref - NewTransactionOnBranch only writes to a branch that already exists. Two steps:

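import (
    "context"
    "errors"

    "github.com/apache/arrow-go/v18/arrow"
    sqlcat "github.com/apache/iceberg-go/catalog/sql"
    "github.com/apache/iceberg-go/table"
)

// Step 1: create the branch via Catalog.CommitTable.
func createExperimentBranch(ctx context.Context, cat *sqlcat.Catalog, tbl *table.Table) (*table.Table, error) {
    snap := tbl.CurrentSnapshot()
    if snap == nil {
        // A table with no snapshots has nothing for the branch to point at.
        return nil, errors.New("table has no current snapshot")
    }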
    update := table.NewSetSnapshotRefUpdate(
        "experiment", snap.SnapshotID, table.BranchRef,
        0 /* maxRefAgeMs */, 0 /* maxSnapshotAgeMs */, 0 /* minSnapshotsToKeep */,
    )
    reqs := []table.Requirement{
        table.AssertTableUUID(tbl.Metadata().TableUUID()),
        table.AssertRefSnapshotID("experiment", nil), // branch must not yet exist
    }
    if _, _, err := cat.CommitTable(ctx, tbl.Identifier(), reqs, []table.Update{update}); err != nil {
        return nil, err
    }
    // Reload so subsequent transactions see the new ref.
    return cat.LoadTable(ctx, tbl.Identifier())
}

// Step 2: open a transaction on the branch and write.
func writeOnBranch(ctx context.Context, tbl *table.Table, arrowTbl arrow.Table) (*table.Table, error) {
    txn := tbl.NewTransactionOnBranch("experiment")
    if err := txn.AppendTable(ctx, arrowTbl, 1024, nil); err != nil {
        return nil, err
    }
    return txn.Commit(ctx)
}

Reads can target the branch with tbl.Scan().UseRef("experiment").

Where to go next

  • API - the full Go surface for catalogs, tables, scans, writes, transactions, schema and partition evolution, snapshot management, maintenance, and views.
  • Configuration - cloud credentials, table properties, concurrency, custom catalog and IO registration.
  • CLI - inspect, expire, compact, branch, and tag from the command line.
  • Row Filter Syntax and Expression DSL - the predicate DSL in detail.

The full Iceberg specification, terminology, and engine integration policy live with the main project at iceberg.apache.org.

Configuration

This page documents how to configure Apache Iceberg Go: the CLI's YAML config file, per-catalog Option surfaces, file-system credentials, table write properties, concurrency, and how to plug in custom catalogs and IO backends.

Only properties and options that the iceberg-go code actually reads are listed. Properties defined in the Apache Iceberg spec but not yet wired into iceberg-go are intentionally omitted - check pkg.go.dev/github.com/apache/iceberg-go for the latest read sites.

CLI configuration file

The iceberg CLI loads catalog defaults from ~/.iceberg-go.yaml (override the directory with GOICEBERG_HOME). The schema, defined in config/config.go, is:

default-catalog: default
max-workers: 5
catalog:
  default:
    type: rest
    uri: https://example.com/iceberg
    warehouse: s3://my-bucket/warehouse
    credential: <client-id>:<client-secret>
    output: text
    rest:
      sigv4-enabled: false
      signing-name: ""
      signing-region: ""

Key | Purpose
default-catalog | Name used when --catalog-name is not passed on the CLI.
max-workers | Worker pool size for concurrent operations. Default 5.
catalog.<name>.type | One of rest, hive, glue, sql, hadoop.
catalog.<name>.uri | Catalog endpoint or DSN.
catalog.<name>.warehouse | Warehouse identifier (REST/Glue) or location (Hive/SQL).
catalog.<name>.credential | Credential string passed through to the catalog's auth handler.
catalog.<name>.output | CLI output format (e.g. text, json).
catalog.<name>.rest.sigv4-enabled | Enable AWS SigV4 signing for REST.
catalog.<name>.rest.signing-name | SigV4 service name.
catalog.<name>.rest.signing-region | SigV4 region.

Catalog options

Each catalog package exposes its own functional Option set. The lists below reflect the public option surface; pkg.go.dev is authoritative for the current set.

REST (catalog/rest)

The most option-rich surface. Source: catalog/rest/options.go.

Group | Options
Authentication | WithCredential, WithOAuthToken, WithAuthManager, WithAuthURI, WithScope, WithAudience, WithResource
AWS SigV4 | WithSigV4, WithSigV4RegionSvc, WithAwsConfig
HTTP | WithHeaders, WithTLSConfig, WithOAuthTLSConfig, WithCustomTransport
Catalog routing | WithPrefix, WithWarehouseLocation, WithMetadataLocation
Pass-through | WithAdditionalProps

Hive (catalog/hive)

Source: catalog/hive/options.go.

  • WithURI(uri string) - Thrift URI for the Hive Metastore (e.g. thrift://127.0.0.1:9083).
  • WithWarehouse(warehouse string)
  • WithProperties(props iceberg.Properties)

Glue (catalog/glue)

Source: catalog/glue/options.go.

  • WithAwsConfig(cfg aws.Config) - AWS SDK v2 config; respects the AWS default credential chain.
  • WithAwsProperties(props AwsProperties) - explicit overrides for region/endpoint/access keys.

SQL (catalog/sql)

The SQL catalog has no functional-option surface. Construct it with NewCatalog:

db, _ := sql.Open(sqliteshim.ShimName, "file:catalog.db")
cat, err := sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
    "warehouse": "file:///tmp/warehouse",
})

Supported dialects: sqlcat.Postgres, sqlcat.MySQL, sqlcat.SQLite, sqlcat.MSSQL, sqlcat.Oracle (catalog/sql/sql.go:50).

Shared options on the base catalog package

Operations that create or update tables/views accept these (catalog/catalog.go):

  • WithLocation, WithPartitionSpec, WithSortOrder, WithProperties, WithStagedUpdates
  • View-specific: WithViewLocation, WithViewProperties

File-system credentials

iceberg-go registers the local file system (file://) automatically. Cloud schemes are not registered until you add a blank import:

import _ "github.com/apache/iceberg-go/io/gocloud"

The init() function in io/gocloud/register.go registers s3, s3a, s3n, oss, gs, abfs, abfss, wasb, and wasbs. Without the blank import, these schemes return ErrIOSchemeNotFound with a hint to add the import.

All credential and tuning property keys are constants in io/config.go. They can be supplied through table properties, catalog properties, or per-call iceberg.Properties arguments depending on context.

S3

Authentication is resolved in this order (io/gocloud/s3.go):

  1. Static credentials in properties: s3.access-key-id + s3.secret-access-key (+ optional s3.session-token).
  2. The standard AWS SDK v2 default credential chain - environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), ~/.aws/credentials, container/IAM role.

Tuning properties:

Key (constant) | Purpose
s3.region (io.S3Region) / client.region (io.S3ClientRegion) | AWS region.
s3.endpoint (io.S3EndpointURL) | Override S3 endpoint URL (custom or compatible storage). Falls back to AWS_S3_ENDPOINT env.
s3.access-key-id (io.S3AccessKeyID) | Static access key.
s3.secret-access-key (io.S3SecretAccessKey) | Static secret key.
s3.session-token (io.S3SessionToken) | Static session token.
s3.proxy-uri (io.S3ProxyURI) | HTTP proxy URL.
s3.connect-timeout (io.S3ConnectTimeout) | Either a number of seconds ("60", "60.0") or a Go duration ("5s").
s3.force-virtual-addressing (io.S3ForceVirtualAddressing) | Force virtual-host-style addressing.
s3.signer.uri (io.S3SignerURI) | Reserved for remote-signing endpoint (not yet implemented).

Google Cloud Storage

Authentication resolution (io/gocloud/gcs.go):

  1. Explicit JSON key bytes via gcs.jsonkey or path via gcs.keypath.
  2. Optional gcs.credtype selecting one of service_account, authorized_user, impersonated_service_account, external_account.
  3. The GCP default credentials chain (gcp.DefaultCredentials) - falls back to anonymous if no creds are found.

Tuning properties:

Key (constant) | Purpose
gcs.endpoint (io.GCSEndpoint) | Custom GCS endpoint URL.
gcs.keypath (io.GCSKeyPath) | Path to a JSON service-account key file.
gcs.jsonkey (io.GCSJSONKey) | JSON key as a string.
gcs.credtype (io.GCSCredType) | Credential type override.
gcs.usejsonapi (io.GCSUseJSONAPI) | Set to any value to enable the GCS JSON API for reads.

Azure Data Lake Storage / Blob

Authentication is selected based on the property keys present (io/gocloud/azure.go):

  1. Shared key: both adls.auth.shared-key.account.name and adls.auth.shared-key.account.key set.
  2. Per-host SAS token: adls.sas-token.<hostname> (prefix-matched against the storage account host).
  3. Per-host connection string: adls.connection-string.<hostname>.
  4. Managed identity: adls.auth.managed-identity.enabled set to a truthy value.
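
The selection order above can be sketched as a plain function over a property map. This is an illustration only, not the code in io/gocloud/azure.go. It assumes that "prefix-matched" means the key suffix is a prefix of the storage host, that "truthy" means a value strconv.ParseBool accepts as true, and that nothing matching yields "none"; pickAzureAuth and its return strings are hypothetical.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

const (
    sharedKeyName   = "adls.auth.shared-key.account.name"
    sharedKeyKey    = "adls.auth.shared-key.account.key"
    sasPrefix       = "adls.sas-token."
    connStrPrefix   = "adls.connection-string."
    managedIdentity = "adls.auth.managed-identity.enabled"
)

// hasHostKey reports whether props holds a non-empty value under a key of
// the form prefix+<name>, where <name> is a prefix of host (assumption).
func hasHostKey(props map[string]string, prefix, host string) bool {
    for k, v := range props {
        name, ok := strings.CutPrefix(k, prefix)
        if ok && name != "" && v != "" && strings.HasPrefix(host, name) {
            return true
        }
    }
    return false
}

// isTruthy treats values accepted by strconv.ParseBool as true (assumption).
func isTruthy(v string) bool {
    b, err := strconv.ParseBool(v)
    return err == nil && b
}

// pickAzureAuth walks the four documented options in order and returns the
// first one whose keys are present.
func pickAzureAuth(props map[string]string, host string) string {
    switch {
    case props[sharedKeyName] != "" && props[sharedKeyKey] != "":
        return "shared-key"
    case hasHostKey(props, sasPrefix, host):
        return "sas-token"
    case hasHostKey(props, connStrPrefix, host):
        return "connection-string"
    case isTruthy(props[managedIdentity]):
        return "managed-identity"
    default:
        return "none"
    }
}

func main() {
    props := map[string]string{
        "adls.sas-token.myaccount":           "example-token",
        "adls.auth.managed-identity.enabled": "true",
    }
    fmt.Println(pickAzureAuth(props, "myaccount.dfs.core.windows.net"))
}
```

In this sketch the SAS token wins over managed identity because it appears earlier in the list, which is the point of ordering the checks.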

Tuning properties:

Key (constant) | Purpose
adls.auth.shared-key.account.name (io.ADLSSharedKeyAccountName) | Account name.
adls.auth.shared-key.account.key (io.ADLSSharedKeyAccountKey) | Account key.
adls.sas-token.<host> (prefix io.ADLSSasTokenPrefix) | Per-host SAS token.
adls.connection-string.<host> (prefix io.ADLSConnectionStringPrefix) | Per-host connection string.
adls.client-id (io.ADLSClientID) | Client/application ID for AAD auth.
adls.endpoint (io.ADLSEndpoint) | Storage domain (e.g. blob.core.windows.net).
adls.protocol (io.ADLSProtocol) | http or https.
adls.auth.managed-identity.enabled (io.ADLSManagedIdentityEnabled) | Enable Azure Managed Identity auth.

Environment variables

iceberg-go reads only a small set of environment variables directly. AWS / GCP / Azure credentials flow through the respective SDKs, not through iceberg-go-defined env vars.

Variable | Purpose | Read at
GOICEBERG_HOME | Directory containing .iceberg-go.yaml. Defaults to the user's home directory. | config/config.go:87
ICEBERG_SQL_DEBUG | SQL catalog query logging - 1 (failed queries), 2 (all queries). | catalog/sql/sql.go:206
AWS_S3_ENDPOINT | Fallback S3 endpoint when s3.endpoint is unset. | io/gocloud/s3.go:193

There is no PYICEBERG_*-style env var convention. Use the YAML config file or pass iceberg.Properties to override settings programmatically.

Concurrency

Setting | Source | Effect
max-workers in ~/.iceberg-go.yaml (config.EnvConfig.MaxWorkers) | YAML config | Worker pool size used by parallel column writes, snapshot producers, scan plan, equality-delete writers. Default 5.
WitMaxConcurrency(n int) ScanOption | Code (table.WitMaxConcurrency) | Per-scan override. Note: function name is Wit... (not With...) - this is a pre-existing typo in the public API.
WithMaxWriteWorkers(n int) | Code (per-write API on WriteRecords) | Per-write override of the worker count.
WithClusteredWrite() | Code (per-write API on WriteRecords) | Forces single-threaded writes. Mutually exclusive with WithMaxWriteWorkers.

Pluggability

Two registries are user-extensible. The third (LocationProvider) is currently informational.

IO scheme registry

Register a custom URL scheme with io.Register:

import (
    "context"
    "net/url"

    "github.com/apache/iceberg-go/io"
)

func init() {
    io.Register("myfs", func(ctx context.Context, parsed *url.URL, props map[string]string) (io.IO, error) {
        return newMyFS(parsed, props)
    })
}

io.Register panics on nil factory or duplicate scheme. Built-in schemes: file, "" (the empty scheme). Cloud schemes (s3, gs, abfs, etc.) are registered by io/gocloud only when its package is blank-imported.

io.GetRegisteredSchemes() returns the current scheme list; io.Unregister(scheme) removes one.
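
The registry contract described above (panic on a nil factory or a duplicate scheme, list, unregister) can be sketched in a few lines. This is a standard-library illustration of the pattern, not the io package's implementation; registry, factory, and the method names are hypothetical.

```go
package main

import (
    "fmt"
    "sort"
    "sync"
)

// factory is a hypothetical constructor type; the real registry stores
// functions that build an IO implementation.
type factory func(location string) (string, error)

type registry struct {
    mu        sync.Mutex
    factories map[string]factory
}

func newRegistry() *registry {
    return &registry{factories: make(map[string]factory)}
}

// Register panics on a nil factory or a duplicate scheme, so wiring
// mistakes surface at program start rather than at first use.
func (r *registry) Register(scheme string, f factory) {
    if f == nil {
        panic("nil factory for scheme " + scheme)
    }
    r.mu.Lock()
    defer r.mu.Unlock()
    if _, dup := r.factories[scheme]; dup {
        panic("scheme already registered: " + scheme)
    }
    r.factories[scheme] = f
}

// Unregister removes a scheme; removing an unknown scheme is a no-op.
func (r *registry) Unregister(scheme string) {
    r.mu.Lock()
    defer r.mu.Unlock()
    delete(r.factories, scheme)
}

// Schemes returns the registered schemes in sorted order.
func (r *registry) Schemes() []string {
    r.mu.Lock()
    defer r.mu.Unlock()
    out := make([]string, 0, len(r.factories))
    for s := range r.factories {
        out = append(out, s)
    }
    sort.Strings(out)
    return out
}

func main() {
    r := newRegistry()
    r.Register("myfs", func(loc string) (string, error) { return "myfs:" + loc, nil })
    fmt.Println(r.Schemes())
    r.Unregister("myfs")
    fmt.Println(len(r.Schemes()))
}
```

Panicking at registration time fits the init()-based usage shown earlier: a duplicate scheme is a programming error that is best caught when the process starts.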

Catalog type registry

Register a custom catalog type with catalog.Register:

import (
    "context"

    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog"
)

func init() {
    catalog.Register("mycatalog", catalog.RegistrarFunc(
        func(ctx context.Context, name string, props iceberg.Properties) (catalog.Catalog, error) {
            return newMyCatalog(name, props)
        },
    ))
}

After registration, catalog.Load(ctx, "default", iceberg.Properties{"type": "mycatalog", ...}) will route to the factory. Built-in types: rest, hive, glue, sql, hadoop.

catalog.GetRegisteredCatalogs() returns the current list; catalog.Unregister(catalogType) removes one.

LocationProvider

table/locations.go defines the LocationProvider interface and ships two implementations: simpleLocationProvider (default) and objectStoreLocationProvider (selected by write.object-storage.enabled = true). The provider is chosen by table properties and is not user-pluggable today.

Table write properties

Property key constants are in table/properties.go and Parquet keys in table/internal/parquet_files.go. The keys below have verified read sites in non-test code.

Format and file sizing

Key | Default | Description
write.format.default | parquet | File format used when writing data files. Read in table/writer.go and table/rolling_data_writer.go.
write.target-file-size-bytes | (set by writer) | Target size for newly written data files. Read in table/arrow_utils.go and table/equality_delete_writer.go.

Metrics and metadata lifecycle

Key | Description
write.metadata.metrics.default | Default per-column metrics mode.
write.metadata.metrics.column.<name> | Per-column override prefix.
write.metadata.delete-after-commit.enabled | When true, expire old metadata files after a successful commit.
write.metadata.previous-versions-max | Cap on retained metadata files. Default 100.
write.metadata.compression-codec | Compression for metadata JSON.

Manifest and commit

Key | Description
commit.manifest-merge.enabled | Merge small manifests during commit.
commit.manifest.target-size-bytes | Target size for merged manifests.
commit.manifest.min-count-to-merge | Minimum manifest count that triggers a merge.
commit.retry.num-retries | Retries for ErrCommitFailed.
commit.retry.min-wait-ms / commit.retry.max-wait-ms / commit.retry.total-timeout-ms | Backoff bounds.

Snapshot retention

Key | Description
min-snapshots-to-keep | Minimum snapshots to retain when expiring.
max-snapshot-age-ms | Maximum age of retained snapshots.
max-ref-age-ms | Maximum age of branch/tag refs that are not the main branch.
gc.enabled | Gate for orphan-file cleanup.

Delete mode

Key | Description
write.delete.mode | Delete strategy used by row-level delete writers.

Object-store data layout

Key | Description
write.data.path | Override data file directory.
write.metadata.path | Override metadata file directory.
write.object-storage.enabled | Switch the location provider to the hashed object-storage layout.
write.object-storage.partitioned-paths | Whether partition values are included in object-storage paths.

Parquet writer

All defined in table/internal/parquet_files.go and read by the Parquet writer:

Key | Default
write.parquet.row-group-size-bytes | 128 MB
write.parquet.row-group-limit | 1,048,576 rows
write.parquet.page-size-bytes | 1 MB
write.parquet.page-row-limit | 20,000 rows
write.parquet.dict-size-bytes | 2 MB
write.parquet.page-version | 2
write.parquet.compression-codec | zstd
write.parquet.compression-level | -1 (codec default)
write.parquet.bloom-filter-max-bytes | 1 MB
write.parquet.bloom-filter-enabled.column.<name> | (per-column toggle, prefix-matched)

Parquet reader

Key | Description
read.parquet.batch-size | Arrow record-batch size used by the Parquet reader.

CLI

Run go build ./cmd/iceberg from the root of this repository to build the CLI executable. Alternatively, run go install github.com/apache/iceberg-go/cmd/iceberg@latest to install it to the bin directory of your GOPATH.

Usage of the iceberg CLI is very similar to that of the pyiceberg CLI.
You can pass the catalog URI with the --uri argument.

Connecting to a catalog

Start a local REST catalog (the default --catalog type):

docker pull apache/iceberg-rest-fixture:latest
docker run -p 8181:8181 apache/iceberg-rest-fixture:latest

and run the iceberg CLI pointing to the REST API server:

 ./iceberg --uri http://0.0.0.0:8181 list
┌─────┐
| IDs |
| --- |
└─────┘

Catalog connection flags are global and apply to every subcommand:

Flag | Description
--catalog | Catalog type: rest (default), glue, hive, hadoop
--uri | Catalog URI (REST/Hive)
--warehouse | Warehouse location
--credential | Credentials for the catalog
--token | OAuth token (skips OAuth flow)
--scope | OAuth scope (default: catalog)
--catalog-name | Catalog name to load from config file (default: default)
--config | Path to a config file

To avoid passing flags every time, define a config file at ~/.iceberg-go.yaml:

default-catalog: default
catalog:
  default:
    type: rest
    uri: http://localhost:8181
    warehouse: s3://my-warehouse

Flags on the command line override values from the config file.
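
The override rule can be pictured as a two-layer merge in which flag values win. The sketch below is illustrative only: mergeSettings is a hypothetical helper, and treating an empty flag value as "not passed" is an assumption of this sketch rather than documented CLI behavior.

```go
package main

import "fmt"

// mergeSettings overlays flag values on config-file values. An empty flag
// value is treated as "not passed" (an assumption of this sketch).
func mergeSettings(fromFile, fromFlags map[string]string) map[string]string {
    merged := make(map[string]string, len(fromFile))
    for k, v := range fromFile {
        merged[k] = v
    }
    for k, v := range fromFlags {
        if v != "" {
            merged[k] = v
        }
    }
    return merged
}

func main() {
    file := map[string]string{"uri": "http://localhost:8181", "warehouse": "s3://my-warehouse"}
    flags := map[string]string{"uri": "http://0.0.0.0:8181", "warehouse": ""}
    merged := mergeSettings(file, flags)
    fmt.Println(merged["uri"], merged["warehouse"])
}
```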

Output format

All commands accept --output text (default, human-readable) or --output json (machine-readable, suitable for piping into jq or scripts).

Catalog and namespace commands

Create namespace

./iceberg --uri http://0.0.0.0:8181 create namespace taxitrips

List namespaces

 ./iceberg --uri http://0.0.0.0:8181 list
┌───────────┐
| IDs       |
| --------- |
| taxitrips |
└───────────┘

Create table

Note: only the identity transform is supported for --partition-spec at this moment.

# Create a simple table with REST catalog and Minio
./iceberg create table default.table-1 \
        --properties write.format.default=parquet \
        --partition-spec foo \
        --sort-order foo:desc:nulls-last \
        --schema '[{"id":1,"name":"foo","type":"string","required":false},{"id":2,"name":"bar","type":"int","required":true}]' \
        --catalog rest \
        --uri http://localhost:8181
Table default.table-1 created successfully

# Describe the newly created table
./iceberg describe --catalog rest --uri http://localhost:8181 default.table-1
Table format version | 2
Metadata location    | s3://warehouse/default/table-1/metadata/00000-f0ccaadd-d988-482e-99da-3a37870288fe.metadata.json
Table UUID           | 33fa3fac-e638-4335-a085-343c6d9e7de5
Last updated         | 1753133512562
Sort Order           | 1: [
                     | 1 desc nulls-last
                     | ]
Partition Spec       | [
                     |  1000: foo: identity(1)
                     | ]

Current Schema, id=0
├──1: foo: optional string
└──2: bar: required int

Current Snapshot |

Snapshots

Properties
key                             | value
-----------------------------------------
write.format.default            | parquet
write.parquet.compression-codec | zstd

Inspecting tables

info — single-screen summary

iceberg info my_db.events

Reports format version, location, current snapshot, schema, partition spec, sort order, snapshot count, ref count, and property count for a table.

snapshots — snapshot history

iceberg snapshots my_db.events

Lists all snapshots with timestamp, parent snapshot, operation (append, overwrite, delete, replace), and added/deleted data file counts.

refs — branches and tags

iceberg refs my_db.events
iceberg refs --type branch my_db.events
iceberg refs --type tag    my_db.events

Lists snapshot refs along with their retention settings (max-ref-age, max-snapshot-age, min-snapshots-to-keep).

partition-stats — partition statistics files

iceberg partition-stats my_db.events                          # current snapshot
iceberg partition-stats --snapshot-id 7234981023498 my_db.events
iceberg partition-stats --all my_db.events                    # all snapshots

schema --show-defaults

iceberg schema --show-defaults my_db.events

Prints the schema and surfaces each field's initial-default and write-default — useful when debugging schema-evolution behavior.

Snapshot maintenance

expire-snapshots

Drop old snapshots so their unreferenced data files become eligible for cleanup.

Flag | Description
--older-than DURATION | Expire snapshots older than the given duration (7d, 168h)
--retain-last N | Always keep at least N snapshots, regardless of age
--dry-run | List what would be expired without committing
--yes | Skip the confirmation prompt

# Preview
iceberg expire-snapshots --older-than 7d --dry-run my_db.events

# Commit, retaining at least 5 snapshots
iceberg expire-snapshots --older-than 7d --retain-last 5 my_db.events

clean-orphan-files

Remove data files in the table location that are not referenced by any snapshot's manifests (e.g. left behind by failed writes).

Flag | Description
--older-than DURATION | Only consider files older than this (default 72h, gives in-flight writes time to finish)
--location PATH | Scan a different directory (e.g. an old warehouse path after migration)
--dry-run | List orphan files without deleting
--yes | Skip the confirmation prompt

iceberg clean-orphan-files --dry-run my_db.events
iceberg clean-orphan-files --older-than 5d my_db.events
iceberg clean-orphan-files --location s3://old-warehouse/my_db/events my_db.events

Rollback

Reset the current snapshot pointer to a previous snapshot. The target must be an ancestor of the current snapshot.

Flag | Description
--snapshot-id ID | Snapshot to roll back to (required)
--yes | Skip the confirmation prompt

iceberg rollback --snapshot-id 6891234567890 my_db.events
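
The ancestor requirement amounts to walking parent pointers from the current snapshot until the target is found or the history ends. The sketch below shows that walk with the standard library only. isAncestor and the parents map are hypothetical, and for simplicity the current snapshot counts as its own ancestor here.

```go
package main

import "fmt"

// isAncestor reports whether target is reachable from current by following
// parent links. parents maps a snapshot ID to its parent ID; a snapshot
// with no entry is a root. The seen set guards against malformed cycles.
func isAncestor(parents map[int64]int64, current, target int64) bool {
    seen := make(map[int64]bool)
    id := current
    for !seen[id] {
        if id == target {
            return true
        }
        seen[id] = true
        parent, ok := parents[id]
        if !ok {
            return false // reached the root without finding target
        }
        id = parent
    }
    return false
}

func main() {
    // History: 1 <- 2 <- 3 (current); snapshot 5 forked from 1.
    parents := map[int64]int64{2: 1, 3: 2, 5: 1}
    fmt.Println(isAncestor(parents, 3, 1), isAncestor(parents, 3, 5))
}
```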

Format upgrade

Upgrade the table format version (metadata-only operation; no data files are rewritten). Refuses downgrades and same-version "upgrades".

Flag | Description
--dry-run | Show what would change without committing
--yes | Skip the confirmation prompt

iceberg upgrade --dry-run my_db.events 2
iceberg upgrade my_db.events 2

Branches and tags

branch create

iceberg branch create my_db.events ml-experiment-v3

Flag | Description
--snapshot-id ID | Snapshot the branch points at (default: current snapshot)
--max-ref-age DURATION | Branch itself expires after this age
--max-snapshot-age DURATION | Snapshots on the branch older than this can be expired
--min-snapshots-to-keep N | Always retain at least N snapshots on the branch
--yes | Skip the confirmation prompt

iceberg branch create \
  --snapshot-id 7234981023498 \
  --max-ref-age 30d \
  --max-snapshot-age 7d \
  --min-snapshots-to-keep 10 \
  my_db.events audit-2026-q2

tag create

iceberg tag create my_db.events pre-migration-v4

Flag | Description
--snapshot-id ID | Snapshot the tag points at (default: current snapshot)
--max-ref-age DURATION | Tag is auto-cleaned after this age
--yes | Skip the confirmation prompt

iceberg tag create \
  --snapshot-id 7234981023498 \
  --max-ref-age 90d \
  my_db.events monthly-backup-may

Automation

Two flags make these commands safe to run from cron jobs or CI:

  • --yes skips the interactive prompt. Without it in a non-interactive environment, the CLI exits with the error "stdin is not a terminal: use --yes to confirm in non-interactive mode" rather than hanging.
  • --output json emits structured output that can be consumed by jq and downstream tooling.

Daily maintenance over every table in a namespace:

#!/bin/bash
TABLES=$(iceberg list my_db --output json | jq -r '.identifiers[].name')
for table in $TABLES; do
  iceberg expire-snapshots --older-than 7d --retain-last 3 --yes \
    --output json "my_db.$table"
  iceberg clean-orphan-files --older-than 3d --yes \
    --output json "my_db.$table"
done

Tag tables before a deploy:

iceberg tag create --yes my_db.events "pre-deploy-$VERSION"
iceberg tag create --yes my_db.users  "pre-deploy-$VERSION"

Audit-only report (no commits):

iceberg expire-snapshots --older-than 7d --dry-run --output json my_db.events \
  | jq '{table, would_expire: .expired_snapshot_count}'

Safety features

Every write command has multiple layers of protection:

  • --dry-run - shows the would-be effect without committing. Look for [DRY RUN] in text output or "dry_run": true in JSON.
  • --yes - required to skip the prompt; without it, non-interactive shells get an explicit error rather than hanging.
  • TTY detection - interactive prompts are only shown when stdout is a terminal.
  • Ancestor validation - rollback rejects target snapshots that are not in the current branch's history.
  • Version check - upgrade refuses same-version or downgrade requests.

API

The Go API surface for Apache Iceberg Go. New to the project? Walk through the Getting Started tutorial first - the recipes here assume you already have a catalog and a table.

For configuration knobs (catalog options, FileIO credentials, table properties), see Configuration. For predicate construction details, see Row Filter Syntax and Expression DSL.

Catalog

catalog.Catalog is the entry point for everything: namespaces, tables, views.

Constructing a catalog

REST

import (
    "context"

    "github.com/apache/iceberg-go/catalog/rest"
)

cat, err := rest.NewCatalog(context.Background(), "rest", "http://localhost:8181",
    rest.WithOAuthToken("your-token"))

SQL (SQLite, Postgres, MySQL, Oracle, MSSQL)

import (
    "database/sql"

    "github.com/apache/iceberg-go"
    sqlcat "github.com/apache/iceberg-go/catalog/sql"
    "github.com/uptrace/bun/driver/sqliteshim"
)

db, err := sql.Open(sqliteshim.ShimName, "file:catalog.db")
// handle err
cat, err := sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
    "warehouse": "file:///tmp/warehouse",
})

Glue

import (
    "context"

    "github.com/apache/iceberg-go/catalog/glue"
    "github.com/aws/aws-sdk-go-v2/config"
)

awsCfg, err := config.LoadDefaultConfig(context.TODO())
// handle err
cat := glue.NewCatalog(glue.WithAwsConfig(awsCfg))

Hive

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog/hive"
)

cat, err := hive.NewCatalog(iceberg.Properties{},
    hive.WithURI("thrift://localhost:9083"),
    hive.WithWarehouse("s3://my-bucket/warehouse"))

Hadoop

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog/hadoop"
)

cat, err := hadoop.NewCatalog("default", "file:///tmp/warehouse", iceberg.Properties{})

Via the registry

catalog.Load looks up the right backend via the type property (or uri scheme as a fallback) plus your ~/.iceberg-go.yaml. Useful when you want runtime selection:

import (
    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog"
)

cat, err := catalog.Load(ctx, "default", iceberg.Properties{
    "type":      "rest",
    "uri":       "http://localhost:8181",
    "warehouse": "s3://my-bucket/warehouse",
})

Namespaces

ns := table.Identifier{"sales"}

err := cat.CreateNamespace(ctx, ns, iceberg.Properties{"owner": "data-team"})

namespaces, err := cat.ListNamespaces(ctx, nil) // []table.Identifier
exists, err := cat.CheckNamespaceExists(ctx, ns)
props, err := cat.LoadNamespaceProperties(ctx, ns)
summary, err := cat.UpdateNamespaceProperties(ctx, ns,
    []string{"deprecated"}, // removals
    iceberg.Properties{"owner": "platform-team"}, // updates
)
err = cat.DropNamespace(ctx, ns)

table.Identifier is []string; use catalog.ToIdentifier("sales", "orders") (or catalog.ToIdentifier("sales.orders")) to build one from string parts.
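
The two call forms can be pictured with a small standalone sketch. The identifier type and toIdentifier function below are hypothetical re-implementations for illustration only, assuming that a single argument is split on dots and multiple arguments are used as given; consult catalog.ToIdentifier itself for the authoritative behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// identifier mirrors the shape of table.Identifier, which is a []string.
type identifier []string

// toIdentifier is a hypothetical sketch: one argument is treated as a dotted
// path and split; several arguments are taken as the parts themselves.
func toIdentifier(parts ...string) identifier {
	if len(parts) == 1 {
		return strings.Split(parts[0], ".")
	}
	return parts
}

func main() {
	fmt.Println(toIdentifier("sales", "orders")) // [sales orders]
	fmt.Println(toIdentifier("sales.orders"))    // [sales orders]
}
```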

Tables

Defining a schema

import "github.com/apache/iceberg-go"

schema := iceberg.NewSchema(1,
    iceberg.NestedField{ID: 1, Name: "id", Type: iceberg.PrimitiveTypes.Int64, Required: true},
    iceberg.NestedField{ID: 2, Name: "name", Type: iceberg.PrimitiveTypes.String, Required: false},
    iceberg.NestedField{ID: 3, Name: "active", Type: iceberg.PrimitiveTypes.Bool, Required: false},
)

For nested types use &iceberg.StructType{...}, &iceberg.ListType{...}, or &iceberg.MapType{...}. Use NewSchemaWithIdentifiers(id, identifierIDs, fields...) to mark identifier columns.

Create

import "github.com/apache/iceberg-go/catalog"

ident := catalog.ToIdentifier("sales", "orders")

tbl, err := cat.CreateTable(ctx, ident, schema,
    catalog.WithLocation("s3://my-bucket/sales/orders"),
    catalog.WithProperties(iceberg.Properties{"owner": "data-team"}),
)

Optional catalog.WithPartitionSpec, catalog.WithSortOrder, and catalog.WithStagedUpdates are also available.

Load, exists, list, drop, rename

tbl, err := cat.LoadTable(ctx, ident)

exists, err := cat.CheckTableExists(ctx, ident)

for ident, err := range cat.ListTables(ctx, table.Identifier{"sales"}) {
    if err != nil { /* ... */ }
    fmt.Println(ident)
}

err = cat.DropTable(ctx, ident)

renamed, err := cat.RenameTable(ctx,
    catalog.ToIdentifier("sales", "orders"),
    catalog.ToIdentifier("sales", "orders_v2"))

ListTables returns an iter.Seq2[table.Identifier, error] that streams results.

Inspecting metadata

tbl.Identifier()         // table.Identifier
tbl.Location()           // string
tbl.MetadataLocation()   // string (path of current metadata.json)
tbl.Metadata()           // table.Metadata
tbl.Schema()             // *iceberg.Schema (current)
tbl.Schemas()            // map[int]*iceberg.Schema
tbl.Spec()               // iceberg.PartitionSpec
tbl.SortOrder()          // table.SortOrder
tbl.Properties()         // iceberg.Properties

if snap := tbl.CurrentSnapshot(); snap != nil {
    fmt.Println(snap.SnapshotID, snap.TimestampMs, snap.Summary)
}

// All snapshots
for _, snap := range tbl.Metadata().Snapshots() {
    fmt.Println(snap.SnapshotID)
}

// Stream all manifest files across all snapshots
for mf, err := range tbl.AllManifests(ctx) {
    if err != nil { /* ... */ }
    fmt.Println(mf.FilePath())
}

Reading data

(t Table) Scan(opts ...ScanOption) *Scan returns a scan that you can resolve into Arrow data.

Streaming record batches

import "github.com/apache/iceberg-go/table"

scan := tbl.Scan()
arrowSchema, batches, err := scan.ToArrowRecords(ctx)
if err != nil { /* ... */ }

fmt.Println(arrowSchema)
for batch, err := range batches {
    if err != nil { /* ... */ }
    fmt.Printf("batch with %d rows\n", batch.NumRows())
    batch.Release()
}

ToArrowRecords returns the Arrow schema together with an iter.Seq2[arrow.RecordBatch, error], so only one batch needs to be in memory at a time. Always call batch.Release() to free Arrow buffers.

Materializing as an Arrow Table

arrowTbl, err := tbl.Scan().ToArrowTable(ctx)
if err != nil { /* ... */ }
defer arrowTbl.Release()
fmt.Printf("%d rows in %d cols\n", arrowTbl.NumRows(), arrowTbl.NumCols())

Projection and filters

import "github.com/apache/iceberg-go"

scan := tbl.Scan(
    table.WithSelectedFields("id", "name"),
    table.WithRowFilter(
        iceberg.NewAnd(
            iceberg.GreaterThanEqual(iceberg.Reference("id"), int64(100)),
            iceberg.IsIn(iceberg.Reference("region"), "us-east", "us-west"),
        ),
    ),
    table.WithLimit(1000),
    table.WithCaseSensitive(true),
)

For the predicate vocabulary, see Row Filter Syntax.

Time travel

// By snapshot ID
scan := tbl.Scan(table.WithSnapshotID(snap.SnapshotID))

// As of a timestamp (milliseconds since epoch)
scan = tbl.Scan(table.WithSnapshotAsOf(time.Now().Add(-24*time.Hour).UnixMilli()))
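
As a rough mental model of as-of resolution, the sketch below picks the newest snapshot whose timestamp is at or before the requested instant. The logEntry type and snapshotAsOf function are hypothetical, and the "at or before" rule is an assumption about the usual as-of semantics rather than a statement about the library's internals.

```go
package main

import (
	"fmt"
	"time"
)

// logEntry is a hypothetical snapshot-log record.
type logEntry struct {
	SnapshotID  int64
	TimestampMs int64
}

// snapshotAsOf returns the ID of the newest entry whose timestamp is at or
// before asOfMs. The log must be ordered oldest first.
func snapshotAsOf(log []logEntry, asOfMs int64) (int64, bool) {
	for i := len(log) - 1; i >= 0; i-- {
		if log[i].TimestampMs <= asOfMs {
			return log[i].SnapshotID, true
		}
	}
	return 0, false // the table had no snapshot yet at that time
}

func main() {
	now := time.Now()
	log := []logEntry{
		{SnapshotID: 101, TimestampMs: now.Add(-72 * time.Hour).UnixMilli()},
		{SnapshotID: 102, TimestampMs: now.Add(-36 * time.Hour).UnixMilli()},
		{SnapshotID: 103, TimestampMs: now.Add(-1 * time.Hour).UnixMilli()},
	}
	id, ok := snapshotAsOf(log, now.Add(-24*time.Hour).UnixMilli())
	fmt.Println(id, ok) // 102 true
}
```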

Reading from a branch or tag

scan, err := tbl.Scan().UseRef("audit-branch")
if err != nil { /* ... */ }

arrowTbl, err := scan.ToArrowTable(ctx)

Iterating tasks for custom processing

If you need finer control (custom file readers, distributed scan planning):

scan := tbl.Scan(table.WithRowFilter(myFilter))
tasks, err := scan.PlanFiles(ctx)
if err != nil { /* ... */ }

arrowSchema, batches, err := scan.ReadTasks(ctx, tasks)

Writing data

The shortcut methods on Table open a transaction, perform the write, and commit. Use NewTransaction directly when you need to combine multiple operations.

Append

import (
    "github.com/apache/arrow-go/v18/arrow/array"
)

// From a streaming RecordReader
newTbl, err := tbl.Append(ctx, recordReader, nil /* snapshot props */)

// From an in-memory Arrow Table; batchSize controls the rolling writer
newTbl, err = tbl.AppendTable(ctx, arrowTbl, 1024, nil)

Overwrite

import "github.com/apache/iceberg-go/table"

// Replace all data
newTbl, err := tbl.Overwrite(ctx, recordReader, nil)

// Replace only rows matching a filter
newTbl, err = tbl.Overwrite(ctx, recordReader, nil,
    table.WithOverwriteFilter(
        iceberg.EqualTo(iceberg.Reference("date"), "2026-01-01"),
    ),
)

OverwriteTable is the arrow.Table variant.

Delete

newTbl, err := tbl.Delete(ctx,
    iceberg.LessThan(iceberg.Reference("id"), int64(100)),
    nil, /* snapshot props */
)

Add existing files

When you already have data files (e.g. produced by another writer), register them without rewriting:

txn := tbl.NewTransaction()
err := txn.AddFiles(ctx, []string{
    "s3://my-bucket/sales/orders/data/file-1.parquet",
    "s3://my-bucket/sales/orders/data/file-2.parquet",
}, nil /* snapshot props */, false /* ignoreDuplicates */)
if err != nil { /* ... */ }

newTbl, err := txn.Commit(ctx)

ReplaceDataFiles(ctx, filesToDelete, filesToAdd, snapshotProps) and ReplaceDataFilesWithDataFiles(ctx, filesToDelete, dataFilesToAdd, snapshotProps, opts...) are also available on *Transaction for swapping files atomically.

Transactions

Group writes and metadata changes into one atomic snapshot:

txn := tbl.NewTransaction()

if err := txn.Delete(ctx,
    iceberg.LessThan(iceberg.Reference("date"), "2026-01-01"), nil); err != nil {
    /* ... */
}
if err := txn.Append(ctx, recordReader, nil); err != nil {
    /* ... */
}
if err := txn.SetProperties(iceberg.Properties{"commit.user": "data-pipeline"}); err != nil {
    /* ... */
}

newTbl, err := txn.Commit(ctx)

To target a specific branch:

txn := tbl.NewTransactionOnBranch("staging")

Commit can retry on conflict (ErrCommitFailed); retry behavior is controlled by the commit.retry.* table properties (see Concurrent Writes).

Schema and partition evolution

Schema evolution

import "github.com/apache/iceberg-go/table"

txn := tbl.NewTransaction()

err := table.NewUpdateSchema(txn, true /* caseSensitive */, false /* allowIncompatibleChanges */).
    AddColumn([]string{"tip"}, iceberg.PrimitiveTypes.Float64, "Tip in dollars", false, nil).
    RenameColumn([]string{"name"}, "full_name").
    DeleteColumn([]string{"deprecated_field"}).
    Commit()
if err != nil { /* ... */ }

newTbl, err := txn.Commit(ctx)

Reorder fields with MoveFirst, MoveBefore, or MoveAfter. Set allowIncompatibleChanges to true to permit type narrowing or making optional columns required.

Partition evolution

us := table.NewUpdateSpec(txn, true /* caseSensitive */)
us.AddField("event_time", iceberg.DayTransform{}, "event_day") // sourceColName, transform, partitionFieldName
us.RemoveField("legacy_partition")
if err := us.Commit(); err != nil { /* ... */ }

AddField chains; AddIdentity(sourceCol) is a shortcut for an identity transform; RenameField(name, newName) renames an existing partition field.

Available transforms (root iceberg package): IdentityTransform{}, YearTransform{}, MonthTransform{}, DayTransform{}, HourTransform{}, BucketTransform{NumBuckets: N}, TruncateTransform{Width: W}.
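
To show what a transform computes, here is a small sketch of the truncate transform as the Iceberg table spec describes it: integers are rounded down to a multiple of the width, and strings are cut to at most that many Unicode code points. The function names are hypothetical and the code is an illustration of the spec's rule, not the library's TruncateTransform implementation.

```go
package main

import "fmt"

// truncateInt rounds v down to the nearest multiple of w (w > 0).
// The double modulo keeps the remainder non-negative for negative inputs.
func truncateInt(v, w int64) int64 {
	return v - ((v%w)+w)%w
}

// truncateString keeps at most w Unicode code points of s.
func truncateString(s string, w int) string {
	r := []rune(s)
	if len(r) <= w {
		return s
	}
	return string(r[:w])
}

func main() {
	fmt.Println(truncateInt(7, 10))           // 0
	fmt.Println(truncateInt(-1, 10))          // -10
	fmt.Println(truncateString("iceberg", 3)) // ice
}
```

Rounding toward negative infinity rather than toward zero keeps every partition bucket the same width, including the one that straddles zero.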

Snapshots and refs

Inspecting

if snap := tbl.CurrentSnapshot(); snap != nil {
    fmt.Println(snap.SnapshotID, snap.TimestampMs, snap.Summary)
}
snap := tbl.SnapshotByID(snapshotID)
named := tbl.SnapshotByName("audit")

for _, s := range tbl.Metadata().Snapshots() {
    fmt.Println(s.SnapshotID, s.Summary)
}

Branches and tags

The CLI's branch create and tag create commands (CLI) are the most ergonomic surface today. Programmatically, ref creation goes through Catalog.CommitTable with a SetSnapshotRef update:

import "github.com/apache/iceberg-go/table"

snap := tbl.CurrentSnapshot()
update := table.NewSetSnapshotRefUpdate(
    "audit",                  // ref name
    snap.SnapshotID,
    table.BranchRef,          // or table.TagRef
    0,                        // maxRefAgeMs (0 = unset)
    0,                        // maxSnapshotAgeMs (0 = unset)
    0,                        // minSnapshotsToKeep (0 = unset)
)
reqs := []table.Requirement{
    table.AssertTableUUID(tbl.Metadata().TableUUID()),
    table.AssertRefSnapshotID("audit", nil), // ref must not already exist
}

_, _, err := cat.CommitTable(ctx, tbl.Identifier(), reqs, []table.Update{update})

Constants table.MainBranch, table.BranchRef, table.TagRef live in table/refs.go. A higher-level builder is on the roadmap.

Expiration and rollback

// Expire snapshots older than the table's retention properties
err := txn.ExpireSnapshots(/* options */)

// Roll back to a previous snapshot
err = txn.RollbackToSnapshot(targetSnapshotID)

Tune retention with the min-snapshots-to-keep, max-snapshot-age-ms, and max-ref-age-ms table properties (see Configuration).

Maintenance

Orphan file cleanup

import (
    "time"

    "github.com/apache/iceberg-go/table"
)

result, err := tbl.DeleteOrphanFiles(ctx,
    table.WithFilesOlderThan(72*time.Hour),
    table.WithDryRun(false),
    table.WithMaxConcurrency(8),
)
if err != nil { /* ... */ }
fmt.Printf("removed %d files\n", len(result.DeletedFiles))

Also see table.WithLocation, table.WithDeleteFunc, table.WithPrefixMismatchMode, table.WithEqualSchemes, and table.WithEqualAuthorities in table/orphan_cleanup.go.

Compaction (rewrite data files)

import "github.com/apache/iceberg-go/table"

txn := tbl.NewTransaction()
result, err := txn.RewriteDataFiles(ctx, groups /* []table.CompactionTaskGroup */, table.RewriteDataFilesOptions{})
if err != nil { /* ... */ }

newTbl, err := txn.Commit(ctx)
fmt.Printf("rewrote %d files into %d (%d -> %d bytes)\n",
    result.RemovedDataFiles, result.AddedDataFiles, result.BytesBefore, result.BytesAfter)

The table/compaction subpackage provides bin-packing planning. The iceberg compact analyze and compact run CLI commands wrap the same machinery - see CLI.
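
For readers unfamiliar with bin-packing, the sketch below groups file sizes into bins of a target size using a first-fit-decreasing heuristic. This is a generic textbook technique chosen for illustration; the binPack function is hypothetical and is not claimed to match the planner in table/compaction.

```go
package main

import (
	"fmt"
	"sort"
)

// binPack groups sizes into bins whose totals stay at or below target where
// possible, using first-fit-decreasing. A file larger than target gets a bin
// of its own.
func binPack(sizes []int64, target int64) [][]int64 {
	sorted := append([]int64(nil), sizes...) // do not mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	var bins [][]int64
	var free []int64 // remaining capacity per bin
	for _, s := range sorted {
		placed := false
		for i := range bins {
			if s <= free[i] {
				bins[i] = append(bins[i], s)
				free[i] -= s
				placed = true
				break
			}
		}
		if !placed {
			bins = append(bins, []int64{s})
			free = append(free, target-s)
		}
	}
	return bins
}

func main() {
	fmt.Println(binPack([]int64{60, 50, 40, 30, 20}, 100)) // [[60 40] [50 30 20]]
}
```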

Expiring snapshots

import (
    "time"

    "github.com/apache/iceberg-go/table"
)

txn := tbl.NewTransaction()
err := txn.ExpireSnapshots(
    table.WithOlderThan(7*24*time.Hour),
    table.WithRetainLast(10),
)
if err != nil { /* ... */ }

newTbl, err := txn.Commit(ctx)

Pass table.WithPostCommit(true) to delete the unreferenced data and metadata files after the commit lands. The iceberg expire-snapshots CLI command wraps the same operation - see CLI.

Views

Views are created and loaded through catalogs that support them (REST, Hive, SQL):

import "github.com/apache/iceberg-go/view"

// Create
v, err := view.CreateView(
    ctx,
    "my-catalog",
    table.Identifier{"analytics", "monthly_orders"},
    schema,
    "SELECT month, sum(amount) FROM orders GROUP BY month",
    table.Identifier{"sales"},                      // default namespace for unqualified names
    "s3://my-bucket/views/monthly_orders",
    iceberg.Properties{},
)

// Inspect
v.CurrentVersion()      // *view.Version
v.CurrentSchema()       // *iceberg.Schema
v.Versions()            // []*view.Version
v.Schemas()             // map[int]*iceberg.Schema
v.Properties()          // iceberg.Properties

view.New(ident, meta, metadataLocation) constructs a view from already-loaded metadata; view.NewFromLocation(ctx, ident, metadataLocation, fsysFactory) loads metadata from disk or object storage.

Iceberg ↔ Arrow types

When iceberg-go converts an Iceberg schema to Arrow (e.g. for the scanner output) or vice versa, the type mapping is:

| Iceberg type | Arrow type |
| --- | --- |
| boolean | arrow.FixedWidthTypes.Boolean |
| int | arrow.PrimitiveTypes.Int32 |
| long | arrow.PrimitiveTypes.Int64 |
| float | arrow.PrimitiveTypes.Float32 |
| double | arrow.PrimitiveTypes.Float64 |
| decimal(p, s) | arrow.Decimal128Type{Precision: p, Scale: s} |
| date | arrow.FixedWidthTypes.Date32 |
| time | arrow.FixedWidthTypes.Time64us |
| timestamp | &arrow.TimestampType{Unit: arrow.Microsecond} (no zone) |
| timestamptz | arrow.FixedWidthTypes.Timestamp_us (UTC zone) |
| timestamp_ns | &arrow.TimestampType{Unit: arrow.Nanosecond} (no zone) |
| timestamptz_ns | arrow.FixedWidthTypes.Timestamp_ns (UTC zone) |
| string | arrow.BinaryTypes.String |
| binary | arrow.BinaryTypes.Binary |
| fixed[L] | &arrow.FixedSizeBinaryType{ByteWidth: L} |
| uuid | arrow.FixedWidthTypes.UUID (extension type) |
| struct<...> | arrow.StructOf(...) |
| list<E> | arrow.ListOf(E) (or LargeListOf if useLargeTypes) |
| map<K, V> | arrow.MapOf(K, V) |
| variant | arrow.ExtensionType for Variant |

Helpers in table/arrow_utils.go:

  • SchemaToArrowSchema(sc *iceberg.Schema, nameMapping NameMapping, useLargeTypes, includeRowLineage bool) (*arrow.Schema, error)
  • VisitArrowSchema[T](sc *arrow.Schema, visitor ArrowSchemaVisitor[T]) (T, error)

For a writer-side schema (Arrow → Iceberg), the scanner and writers handle conversion automatically as long as your Arrow schema is compatible with the table schema.

Row Filter Syntax

Row filters drive predicate pushdown during scans, partition pruning, and row-level deletes. The DSL lives in the root iceberg package - all you need is the import:

import "github.com/apache/iceberg-go"

A column is referenced with iceberg.Reference("column_name"). Predicate constructors return a BooleanExpression (or an UnboundPredicate, which satisfies BooleanExpression) that can be combined with the boolean combinators in the Expression DSL and passed to APIs like table.WithRowFilter(...).

Equality

iceberg.EqualTo(iceberg.Reference("status"), "active")
iceberg.NotEqualTo(iceberg.Reference("retries"), int32(0))

EqualTo[T] and NotEqualTo[T] are generic over LiteralType (bool, int32, int64, float32, float64, string, []byte, plus a few iceberg-specific types). The value's type must be in that set - bare int literals are not, so use int32(0) / int64(0) explicitly. They wrap LiteralPredicate(OpEQ, ...) and LiteralPredicate(OpNEQ, ...) (predicates.go:83-91).

Comparison

iceberg.LessThan(iceberg.Reference("amount"), 100.0)
iceberg.LessThanEqual(iceberg.Reference("amount"), 100.0)
iceberg.GreaterThan(iceberg.Reference("created_at"), int64(1700000000))
iceberg.GreaterThanEqual(iceberg.Reference("score"), int32(50))

Operators: OpLT, OpLTEQ, OpGT, OpGTEQ (exprs.go:47-50). Constructors at predicates.go:98-124.

Set membership

iceberg.IsIn(iceberg.Reference("region"), "us-east", "us-west", "eu-west")
iceberg.NotIn(iceberg.Reference("status"), "deleted", "archived")

IsIn and NotIn are variadic. They return a BooleanExpression (not UnboundPredicate) because the result can simplify automatically:

  • Zero values - reduces to AlwaysFalse{} (for IsIn) or AlwaysTrue{} (for NotIn).
  • One value - reduces to EqualTo / NotEqualTo.

See predicates.go:55-78.

Null checks

iceberg.IsNull(iceberg.Reference("deleted_at"))
iceberg.NotNull(iceberg.Reference("user_id"))

These wrap UnaryPredicate(OpIsNull, ...) and UnaryPredicate(OpNotNull, ...). Both panic if the term is nil (predicates.go:23-32).

NaN checks (float / double columns only)

iceberg.IsNaN(iceberg.Reference("ratio"))
iceberg.NotNaN(iceberg.Reference("ratio"))

Operators OpIsNan and OpNotNan. Use these instead of EqualTo(..., math.NaN()) - NaN is never equal to itself.
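
The reason is plain IEEE 754 behavior, which a few lines of standard-library Go can demonstrate. The equalTo and isNaN helpers below are hypothetical models of "equality comparison" and "NaN check" on a single value; they are not the library's predicate evaluation.

```go
package main

import (
	"fmt"
	"math"
)

// equalTo models a plain equality comparison on a float column value.
func equalTo(value, literal float64) bool { return value == literal }

// isNaN models a dedicated NaN check.
func isNaN(value float64) bool { return math.IsNaN(value) }

func main() {
	ratio := math.NaN()
	fmt.Println(equalTo(ratio, math.NaN())) // false: NaN never compares equal
	fmt.Println(isNaN(ratio))               // true
}
```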

String prefix

iceberg.StartsWith(iceberg.Reference("path"), "/var/log/")
iceberg.NotStartsWith(iceberg.Reference("name"), "tmp_")

Operators OpStartsWith and OpNotStartsWith (exprs.go:53-54). The value must be a string.

Constants

iceberg.AlwaysTrue{}
iceberg.AlwaysFalse{}

These satisfy BooleanExpression and short-circuit during expression simplification. Useful as a base case when filters are built dynamically:

filter := iceberg.BooleanExpression(iceberg.AlwaysTrue{})
for _, clause := range userClauses {
    filter = iceberg.NewAnd(filter, clause)
}

Operator reference

The full operator set is the Operation enum at exprs.go:34-62:

| Operator | Constant | Convenience builder |
| --- | --- | --- |
| < | OpLT | LessThan |
| <= | OpLTEQ | LessThanEqual |
| > | OpGT | GreaterThan |
| >= | OpGTEQ | GreaterThanEqual |
| == | OpEQ | EqualTo |
| != | OpNEQ | NotEqualTo |
| IS NULL | OpIsNull | IsNull |
| IS NOT NULL | OpNotNull | NotNull |
| IS NaN | OpIsNan | IsNaN |
| IS NOT NaN | OpNotNan | NotNaN |
| IN | OpIn | IsIn |
| NOT IN | OpNotIn | NotIn |
| STARTS WITH | OpStartsWith | StartsWith |
| NOT STARTS WITH | OpNotStartsWith | NotStartsWith |
| AND / OR / NOT | OpAnd / OpOr / OpNot | NewAnd / NewOr / NewNot (see Expression DSL) |

Putting it together

A typical filter passed to a scan:

filter := iceberg.NewAnd(
    iceberg.GreaterThanEqual(iceberg.Reference("event_time"), int64(1700000000)),
    iceberg.IsIn(iceberg.Reference("region"), "us-east", "us-west"),
    iceberg.NotNull(iceberg.Reference("user_id")),
)

scan := tbl.Scan(table.WithRowFilter(filter))

For boolean combination, term details, and the lower-level escape hatches (UnaryPredicate, LiteralPredicate, SetPredicate), see Expression DSL.

Expression DSL

This page covers the building blocks behind the row-filter shortcuts in Row Filter Syntax: boolean combinators, terms, and the lower-level predicate constructors.

The DSL lives entirely in the root iceberg package (see exprs.go and predicates.go).

Boolean combinators

iceberg.NewAnd(a, b)              // a AND b
iceberg.NewAnd(a, b, c, d)        // a AND b AND c AND d (variadic)
iceberg.NewOr(a, b)               // a OR b
iceberg.NewOr(a, b, c, d)         // a OR b OR c OR d
iceberg.NewNot(a)                 // NOT a

NewAnd and NewOr accept two required arguments plus a variadic tail (exprs.go:226, exprs.go:287). They simplify automatically:

  • NewAnd(x, AlwaysTrue{}) reduces to x.
  • NewAnd(x, AlwaysFalse{}) reduces to AlwaysFalse{}.
  • NewOr(x, AlwaysFalse{}) reduces to x.
  • NewOr(x, AlwaysTrue{}) reduces to AlwaysTrue{}.
  • NewNot(NewNot(x)) reduces to x.
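
The reduction rules listed above can be modeled in a few lines. The following is a toy expression tree with hypothetical types (pred, andExpr, newAnd, and so on), binary only, written to mirror those five rules; it is not the library's implementation and omits the variadic tail.

```go
package main

import "fmt"

type expr interface{ String() string }

type (
	alwaysTrue  struct{}
	alwaysFalse struct{}
	pred        string
	andExpr     struct{ l, r expr }
	orExpr      struct{ l, r expr }
	notExpr     struct{ x expr }
)

func (alwaysTrue) String() string  { return "true" }
func (alwaysFalse) String() string { return "false" }
func (p pred) String() string      { return string(p) }
func (a andExpr) String() string   { return "(" + a.l.String() + " AND " + a.r.String() + ")" }
func (o orExpr) String() string    { return "(" + o.l.String() + " OR " + o.r.String() + ")" }
func (n notExpr) String() string   { return "NOT " + n.x.String() }

// newAnd drops a true operand and collapses to false if either side is false.
func newAnd(l, r expr) expr {
	switch {
	case l == expr(alwaysFalse{}) || r == expr(alwaysFalse{}):
		return alwaysFalse{}
	case l == expr(alwaysTrue{}):
		return r
	case r == expr(alwaysTrue{}):
		return l
	}
	return andExpr{l, r}
}

// newOr drops a false operand and collapses to true if either side is true.
func newOr(l, r expr) expr {
	switch {
	case l == expr(alwaysTrue{}) || r == expr(alwaysTrue{}):
		return alwaysTrue{}
	case l == expr(alwaysFalse{}):
		return r
	case r == expr(alwaysFalse{}):
		return l
	}
	return orExpr{l, r}
}

// newNot removes a double negation.
func newNot(x expr) expr {
	if n, ok := x.(notExpr); ok {
		return n.x
	}
	return notExpr{x}
}

func main() {
	x := pred("x")
	fmt.Println(newAnd(x, alwaysTrue{}))  // x
	fmt.Println(newAnd(x, alwaysFalse{})) // false
	fmt.Println(newOr(x, alwaysFalse{}))  // x
	fmt.Println(newOr(x, alwaysTrue{}))   // true
	fmt.Println(newNot(newNot(x)))        // x
}
```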

Constants

iceberg.AlwaysTrue{}
iceberg.AlwaysFalse{}

Both satisfy BooleanExpression. Use them as the identity element when composing filters dynamically.

Terms

A term is the left-hand side of a predicate. Iceberg-go has two flavors:

  • Reference("column_name") - an unbound term that names a column (exprs.go:373). Typing happens at bind time, when the expression is matched against a schema. This is what you almost always want.
  • BoundReference - the resolved form, produced by Reference.Bind(schema, caseSensitive) (exprs.go:389). You only encounter these when writing custom expression visitors.

The interfaces are:

type Term interface { ... }                        // shared marker
type UnboundTerm interface { Term; ... }           // pre-bind
type BoundTerm interface { Term; Ref() BoundReference; ... }

(exprs.go:317-348)

Predicates

A predicate applies an Operation to one or more terms. BooleanExpression is the shared interface (exprs.go:123).

For all the common shapes, the constructors in predicates.go are the right tool. See Row Filter Syntax for the full list (EqualTo, LessThan, IsIn, IsNull, StartsWith, etc.).

Lower-level escape hatches

When the convenience builders are not enough (custom operators, dynamic operation selection, working with already-typed Literal values), use the predicate constructors directly:

// Unary predicates: IS NULL / NOT NULL / IS NaN / NOT NaN
isNull := iceberg.UnaryPredicate(iceberg.OpIsNull, iceberg.Reference("col"))

// Literal predicates: <, <=, >, >=, ==, !=, STARTS WITH, NOT STARTS WITH
lit := iceberg.NewLiteral(int64(42))
eq := iceberg.LiteralPredicate(iceberg.OpEQ, iceberg.Reference("col"), lit)

// Set predicates: IN / NOT IN
lits := []iceberg.Literal{iceberg.NewLiteral("a"), iceberg.NewLiteral("b")}
in := iceberg.SetPredicate(iceberg.OpIn, iceberg.Reference("col"), lits)

UnaryPredicate lives at exprs.go:534. LiteralPredicate and SetPredicate live in the same file.

Negation

Every BooleanExpression and every Operation knows how to negate itself.

op := iceberg.OpEQ
op.Negate()  // -> OpNEQ

(exprs.go:65-98. OpNot, OpAnd, and OpOr panic on direct negation - negate the wrapping expression instead.)

expr := iceberg.EqualTo(iceberg.Reference("status"), "active")
inverted := expr.Negate()  // equivalent to NotEqualTo(...)

Binding and evaluation

Most user code stops at constructing the unbound expression - the scan pipeline handles binding, projection, and evaluation internally. If you are writing a custom visitor:

  • (Reference).Bind(schema, caseSensitive) returns a BoundTerm (exprs.go:389).
  • BoundExpressions expose Ref(), Type(), and (for terms) the underlying accessor for evaluating against a StructLike row.

For projection, evaluation, and visitor patterns, see visitors.go and the scan internals in table/scanner.go.

When to reach for what

| Goal | Use |
| --- | --- |
| Filter a scan or row-level delete | Convenience builders + NewAnd/NewOr/NewNot |
| Combine many clauses dynamically | Start from AlwaysTrue{} (for AND) or AlwaysFalse{} (for OR), fold with NewAnd/NewOr |
| Construct a predicate whose operator is chosen at runtime | UnaryPredicate(op, term), LiteralPredicate(op, term, lit), SetPredicate(op, term, lits) |
| Walk an expression tree | A custom BooleanExprVisitor from visitors.go |

For the per-operator cookbook, return to Row Filter Syntax.

Concurrent Writes

When multiple writers commit to the same table, iceberg-go uses optimistic concurrency control: every commit is validated against the table state it was built on. If another writer committed first, the commit is rejected rather than silently clobbering their snapshot.

Conflict detection

Each producer (Append, Overwrite, Delete, RowDelta, RewriteDataFiles) runs a set of validators before the commit lands. They check that the files the operation assumed (added, deleted, or filtered) still match the current table state. A failed validation surfaces as one of:

  • table.ErrCommitFailed: the base snapshot moved on; the commit can be retried after refreshing (see below).
  • table.ErrCommitDiverged: terminal. The base snapshot is no longer on the branch at all, so a retry cannot reconcile the change.

Distinguish the two with errors.Is, checking the terminal error first:

import (
    "context"
    "errors"

    "github.com/apache/iceberg-go/table"
)

_, err := txn.Commit(context.Background())
switch {
case errors.Is(err, table.ErrCommitDiverged):
    // unrecoverable: rebuild the operation from the latest table
case errors.Is(err, table.ErrCommitFailed):
    // retriable: refresh and try again (the retry loop does this for you)
}

Isolation levels

Delete and update operations are validated under an isolation level, set per operation through table properties:

| Property | Default | Values |
| --- | --- | --- |
| write.delete.isolation-level | serializable | serializable, snapshot |
| write.update.isolation-level | serializable | serializable, snapshot |

serializable rejects the commit if any concurrent snapshot added data matching the operation's filter. snapshot is more permissive — it only rejects when concurrent deletes touch the same files.
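
The difference between the two levels can be sketched as a small decision function. The concurrentChange type and conflicts function are hypothetical, and the sketch assumes that serializable also rejects the delete-on-same-files case that snapshot rejects; it is a reading of the paragraph above, not the validators' actual code.

```go
package main

import "fmt"

// concurrentChange is a hypothetical summary of what other writers committed
// since this operation's base snapshot.
type concurrentChange struct {
	AddedMatchingData bool // new data files match this operation's filter
	DeletedSameFiles  bool // concurrent deletes touched files this operation touches
}

// conflicts reports whether the commit would be rejected at the given level.
// Unknown levels are treated as serializable, the stricter choice.
func conflicts(level string, c concurrentChange) bool {
	if level == "snapshot" {
		return c.DeletedSameFiles
	}
	return c.AddedMatchingData || c.DeletedSameFiles
}

func main() {
	appendOnly := concurrentChange{AddedMatchingData: true}
	fmt.Println(conflicts("serializable", appendOnly)) // true
	fmt.Println(conflicts("snapshot", appendOnly))     // false
}
```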

Automatic retry

iceberg-go can refresh the table and replay the operation against the latest snapshot between attempts. On each retry the table metadata is reloaded, the producer's validators re-run against the fresh snapshot, and the commit is re-submitted only if it is still valid.

Retry is off by default (commit.retry.num-retries is 0). Opt in by setting the retry properties — see Manifest and commit in the configuration reference for the full list:

// Opt in to retries (e.g. 4 attempts after the first) for a contended table.
_, err := cat.CreateTable(ctx, ident, schema,
    catalog.WithProperties(iceberg.Properties{
        "commit.retry.num-retries": "4",
    }),
)

Catalog support: retry currently engages on the REST catalog, which wraps commit conflicts as ErrCommitFailed. The Glue, SQL, and Hive catalogs do not yet wrap their conflict errors, so the retry loop will not fire on them.
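
On catalogs where the built-in loop does not fire, or simply to see the control flow, the retry policy can be pictured as below. The sentinel errors are local stand-ins for table.ErrCommitFailed and table.ErrCommitDiverged, and commitWithRetry with its attempt callback is a hypothetical helper; a real attempt would reload the table, rebuild the transaction, and call Commit.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for table.ErrCommitFailed and table.ErrCommitDiverged.
var (
	errCommitFailed   = errors.New("commit failed")
	errCommitDiverged = errors.New("commit diverged")
)

// commitWithRetry runs attempt once plus up to numRetries more times.
// It retries only on errCommitFailed; a diverged commit or any other error
// is returned immediately. It reports how many attempts were made.
func commitWithRetry(numRetries int, attempt func() error) (int, error) {
	if numRetries < 0 {
		numRetries = 0
	}
	var err error
	for try := 0; try <= numRetries; try++ {
		err = attempt()
		if err == nil {
			return try + 1, nil
		}
		// Check the terminal error first in case it wraps the retriable one.
		if errors.Is(err, errCommitDiverged) || !errors.Is(err, errCommitFailed) {
			return try + 1, err
		}
	}
	return numRetries + 1, err
}

func main() {
	calls := 0
	attempts, err := commitWithRetry(4, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("attempt %d: %w", calls, errCommitFailed)
		}
		return nil
	})
	fmt.Println(attempts, err) // 3 <nil>
}
```

A production loop would normally add a backoff between attempts; that is left out here to keep the sketch short.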

Feature Status

This page tracks what Apache Iceberg Go currently supports. The matrix is kept in sync with the project README.md; if you spot a discrepancy, file an issue.

Spec format version coverage

Apache Iceberg Go reads and writes table format versions 1, 2, and 3. The maximum supported version is enforced at table/metadata.go (supportedTableFormatVersion = 3). For active tracking, see issues #589 (V3) and #829 (V2 completion).

V1

All V1 features are supported. V1 is the format-version baseline.

V2

| Feature | Status |
| --- | --- |
| Sequence numbers | Supported |
| Manifest entry status (added / existing / deleted) | Supported |
| Positional deletes | Supported (read + write) |
| Equality deletes | Supported (read + write). Write via Transaction.WriteEqualityDeletes; row-level commits via Transaction.NewRowDelta |
| Partition spec evolution | Supported |
| Sort order enforcement on write | Supported (PR #1157, closes #833) |
| ReplaceDataFiles using OpReplace | Pending (#841) |

V3

| Feature | Status |
| --- | --- |
| Nanosecond timestamps (timestamp_ns, timestamptz_ns) | Supported |
| Default values (initial-default, write-default) | Supported |
| Row lineage (_row_id, _last_updated_sequence_number) | Supported |
| Encryption keys in metadata | Supported |
| Variant type, non-shredded | Supported (PR #932; umbrella #929) |
| Variant type, shredded reader / writer | In progress (#986, #987) |
| Deletion vectors, read | Supported |
| Deletion vectors, write (unpartitioned) | Supported |
| Deletion vectors, write (partitioned) | In progress (#1135, PR #1151) |
| Geometry / Geography types (schema) | Supported |
| Geometry / Geography (transforms, statistics, pruning) | In progress (umbrella #989) |
| Multi-argument transforms | Infrastructure present; no concrete implementations exercised yet |

FileSystem support

| Filesystem Type | Supported |
| --- | --- |
| S3 | X |
| Google Cloud Storage | X |
| Azure Blob Storage | X |
| Local Filesystem | X |

S3, GCS, and Azure require a blank import: _ "github.com/apache/iceberg-go/io/gocloud". See Configuration.

Metadata operations

| Operation | Supported |
| --- | --- |
| Get Schema | X |
| Get Snapshots | X |
| Get Sort Orders | X |
| Get Partition Specs | X |
| Get Manifests | X |
| Create New Manifests | X |
| Plan Scan | X |
| Plan Scan for Snapshot | X |

Catalog support

OperationRESTHiveGlueSQLHadoop
Load TableXXXXX
List TablesXXXXX
Create TableXXXXX
Register TableXXX
Update Current SnapshotXXXXX
Create New SnapshotXXXXX
Rename TableXXXX
Drop TableXXXXX
Alter TableXXXXX
Check Table ExistsXXXXX
Set Table PropertiesXXXXX
List NamespacesXXXXX
Create NamespaceXXXXX
Check Namespace ExistsXXXXX
Drop NamespaceXXXXX
Update Namespace PropertiesXXXX
Create ViewXXX
Load ViewXX
List ViewXXX
Drop ViewXXX
Check View ExistsXXX

A Hadoop catalog is also available - see catalog/hadoop.

Read / write data

Data can be read as an Arrow Table or as a stream of Arrow record batches via iter.Seq2. See API Reference.

Supported write operations

The following write operations are available as long as the FileSystem is supported and the Catalog supports altering the table:

| Operation | Supported |
| --- | --- |
| Append Stream | X |
| Append Data Files | X |
| Rewrite Files | X |
| Rewrite manifests | |
| Overwrite Files | X |
| Copy-On-Write Delete | X |
| Write Pos Delete | X |
| Write Eq Delete | X |
| Row Delta | X |

Contributing

Get in Touch

Picking Up Issues

Before starting work on an issue:

  1. Check for existing PRs. Search the open pull requests to make sure nobody is already working on it.
  2. Claim the issue. Leave a comment on the issue (e.g., "I'd like to work on this") and wait for a maintainer to acknowledge before writing code.
  3. One at a time for new contributors. If you haven't had a PR merged into iceberg-go yet, please work on one issue at a time. Get it reviewed, address feedback, get it merged — then pick up the next one. This helps us give your work the attention it deserves and avoids wasted effort from overlapping contributions.

If two PRs land for the same issue, we will generally keep the one from the contributor who claimed it first.

Submitting a Pull Request

  • Reference the issue number in your PR description (e.g., "Fixes #123").
  • Keep PRs focused — one issue per PR.
  • Run go test ./..., gofmt, and golangci-lint run before pushing. CI runs all of these too, but catching issues locally saves a round-trip.
  • All commits must have a Signed-off-by line (DCO).

Code Review

  • Maintainers may request changes. This is normal — it doesn't mean the PR is bad, it means we want to get it right.
  • Respond to review comments by pushing new commits (don't force-push over reviewed code).
  • If your PR has been waiting for review for more than a few days, ping on Slack.

Development Setup

git clone https://github.com/apache/iceberg-go.git
cd iceberg-go
go build ./...
go test ./...

Integration Tests

Integration tests require Docker and are gated behind a build tag:

docker compose -f internal/recipe/docker-compose.yml up -d rest minio mc --wait
go test -tags integration ./...

Community

Apache Iceberg Go is developed in the open as part of the broader Apache Iceberg project. The fastest ways to ask questions, share work, and follow development are below.

Chat

Mailing list

The Apache Iceberg project uses a single dev mailing list across all language implementations.

  • Read / post: dev@iceberg.apache.org
  • Subscribe: send any email to dev-subscribe@iceberg.apache.org

Issues and pull requests

For contribution guidelines, see Contributing.

Apache Iceberg community at large

Glossary

This glossary defines important terms used throughout the Iceberg ecosystem, organized in tables for easy reference.

Core Concepts

| Term | Definition |
| --- | --- |
| Catalog | A centralized service that manages table metadata and provides a unified interface for accessing Iceberg tables. Catalogs can be implemented as Hive metastore, AWS Glue, REST API, or SQL-based solutions. |
| Table | A collection of data files organized by a schema, with metadata tracking changes over time through snapshots. Tables support ACID transactions and schema evolution. |
| Schema | The structure definition of a table, specifying field names, types, and whether fields are required or optional. Schemas are versioned and can evolve over time. |
| Snapshot | A point-in-time view of a table's data, representing the state after a specific operation (append, overwrite, delete, etc.). Each snapshot contains metadata about the operation and references to data files. |
| Manifest | A metadata file that lists data files and their metadata (location, partition information, record counts, etc.). Manifests are organized into manifest lists for efficient access. |
| Manifest List | A file that contains references to manifest files for a specific snapshot, enabling efficient discovery of data files without reading all manifests. |

Data Types

Primitive Types

Type | Description
boolean | True/false values
int (32-bit) | Integer values
long (64-bit) | Long integer values
float (32-bit) | Single precision floating point
double (64-bit) | Double precision floating point
date | Date values (days since epoch)
time | Time values (microseconds since midnight)
timestamp | Timestamp values (microseconds since epoch)
timestamptz | Timestamp with timezone
string | UTF-8 encoded strings
uuid | UUID values
binary | Variable length binary data
fixed[n] | Fixed length binary data of n bytes
decimal(p,s) | Decimal values with precision p and scale s

Nested Types

Type | Description
struct | Collection of named fields
list | Ordered collection of elements
map | Key-value pairs

Operations

Operation | Description
Append | An operation that adds new data files to a table without removing existing data. Creates a new snapshot with the additional files.
Overwrite | An operation that replaces existing data files with new ones, typically based on a partition predicate. Creates a new snapshot with the replacement files.
Delete | An operation that removes data from a table, either by dropping data files from the new snapshot or by adding delete files that mark rows as deleted.
Replace | An operation that adds and removes data files without changing the table's logical contents, such as compaction or rewriting files in a different format.

Partitioning

Term | Definition
Partition | A logical division of table data based on column values, used to improve query performance by allowing selective reading of relevant data files.
Partition Spec | Defines how table data is partitioned by specifying source columns and transformations (identity, bucket, truncate, year, month, day, hour).
Partition Field | A field in the partition spec that defines how a source column is transformed for partitioning.
Partition Path | The file system path structure created by partition values, typically in the format partition_name=value/.

Partition Transforms

Transform | Description
identity | Use the column value directly
bucket[n] | Hash the value into n buckets
truncate[n] | Truncate values to width n (strings to their first n characters, numbers down to a multiple of n)
year | Extract year from date/timestamp
month | Extract month from date/timestamp
day | Extract day from date/timestamp
hour | Extract hour from timestamp
void | Always produces null (used in place of a dropped partition field in format v1 tables)
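
To make the transforms concrete, here is a minimal Go sketch of the temporal and truncate transforms as the Iceberg spec describes them: temporal transforms count whole years, months, days, or hours since the Unix epoch, and integer truncation rounds down to a multiple of the width. The function names are hypothetical and are not the iceberg-go API. bucket is left out because it depends on the spec's 32-bit Murmur3 hash.

```go
package main

import (
	"fmt"
	"time"
)

const microsPerHour = int64(time.Hour / time.Microsecond)

// floorDiv rounds toward negative infinity so pre-1970 values land in the
// correct (negative) partition instead of being rounded toward zero.
func floorDiv(a, b int64) int64 {
	q := a / b
	if a%b != 0 && (a < 0) != (b < 0) {
		q--
	}
	return q
}

// HourTransform returns whole hours since the epoch for a microsecond timestamp.
func HourTransform(micros int64) int64 { return floorDiv(micros, microsPerHour) }

// DayTransform returns whole days since 1970-01-01.
func DayTransform(micros int64) int64 { return floorDiv(micros, 24*microsPerHour) }

// MonthTransform returns whole months since 1970-01.
func MonthTransform(micros int64) int64 {
	t := time.UnixMicro(micros).UTC()
	return int64(t.Year()-1970)*12 + int64(t.Month()-1)
}

// YearTransform returns whole years since 1970.
func YearTransform(micros int64) int64 {
	return int64(time.UnixMicro(micros).UTC().Year() - 1970)
}

// TruncateInt rounds v down to a multiple of width w (w must be positive).
func TruncateInt(v, w int64) int64 { return v - (((v % w) + w) % w) }

// TruncateString keeps at most the first n Unicode code points of s.
func TruncateString(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

func main() {
	ts := time.Date(2024, time.March, 15, 10, 30, 0, 0, time.UTC).UnixMicro()
	fmt.Println("year:", YearTransform(ts))
	fmt.Println("month:", MonthTransform(ts))
	fmt.Println("day:", DayTransform(ts))
	fmt.Println("hour:", HourTransform(ts))
	fmt.Println("truncate[10](17):", TruncateInt(17, 10))
	fmt.Println("truncate[3](iceberg):", TruncateString("iceberg", 3))
}
```

Because the transformed values are derived from the column value alone, a reader can apply the same function to a query predicate and compare the result with the partition values stored in manifests.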

Expressions and Predicates

Term | Definition
Expression | A computation or comparison that can be evaluated against table data, used for filtering and transformations.
Predicate | A boolean expression used to filter data, such as column comparisons, null checks, or set membership tests.
Bound Predicate | A predicate that has been resolved against a specific schema, with field references bound to actual columns.
Unbound Predicate | A predicate whose field references are still unresolved, typically column names given as strings, before it is bound to a schema.
Literal | A constant value used in expressions and predicates, such as numbers, strings, dates, etc.
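
The difference between unbound and bound predicates is easiest to see in code. The sketch below uses hypothetical, heavily simplified types (integer columns only; not the iceberg-go expression API): binding resolves a column name to a field ID from the schema, and evaluation then works purely on field IDs.

```go
package main

import "fmt"

// Field and Schema are simplified stand-ins for an Iceberg schema.
type Field struct {
	ID   int
	Name string
}

type Schema struct {
	Fields []Field
}

// UnboundPredicate refers to a column by name only.
type UnboundPredicate struct {
	Column  string
	Op      string // "<", ">", or "=="
	Literal int64
}

// BoundPredicate refers to a column by its schema field ID.
type BoundPredicate struct {
	FieldID int
	Op      string
	Literal int64
}

// Bind resolves the predicate's column name against the schema.
func Bind(s Schema, p UnboundPredicate) (BoundPredicate, error) {
	for _, f := range s.Fields {
		if f.Name == p.Column {
			return BoundPredicate{FieldID: f.ID, Op: p.Op, Literal: p.Literal}, nil
		}
	}
	return BoundPredicate{}, fmt.Errorf("cannot bind %q: no such column", p.Column)
}

// Eval tests a row, represented here as field ID -> value.
func (b BoundPredicate) Eval(row map[int]int64) bool {
	v, ok := row[b.FieldID]
	if !ok {
		return false
	}
	switch b.Op {
	case "<":
		return v < b.Literal
	case ">":
		return v > b.Literal
	case "==":
		return v == b.Literal
	}
	return false
}

func main() {
	schema := Schema{Fields: []Field{{ID: 1, Name: "id"}, {ID: 2, Name: "qty"}}}
	bound, err := Bind(schema, UnboundPredicate{Column: "qty", Op: ">", Literal: 10})
	if err != nil {
		fmt.Println("bind failed:", err)
		return
	}
	fmt.Println(bound.FieldID, bound.Eval(map[int]int64{1: 7, 2: 11}))
}
```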

File Formats

Format | Usage | Description
Parquet | Data files | The primary data file format used by Iceberg, providing columnar storage with compression and encoding optimizations.
Avro | Metadata files | Used for manifests and manifest lists due to its schema evolution capabilities and compact binary format.
ORC | Data files | An alternative columnar format supported by some Iceberg implementations.

Metadata

Term | Definition
Metadata File | A JSON file containing table metadata including schema, partition spec, properties, and snapshot information.
Metadata Location | The URI pointing to the current metadata file for a table, stored in the catalog.
Properties | Key-value pairs that configure table behavior, such as compression settings, write options, and custom metadata.
Statistics | Metadata about data files including record counts, file sizes, and value ranges for optimization.

Transactions

Term | Definition
Transaction | A sequence of operations that are committed atomically, ensuring data consistency and ACID properties.
Commit | The process of finalizing a transaction by creating a new snapshot and updating the metadata file.
Rollback | The process of undoing changes in a transaction, typically by reverting to a previous snapshot.
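
A commit is atomic because the catalog swaps the table's metadata location only if it still points at the version the writer started from. The sketch below models that compare-and-swap with a hypothetical in-memory catalog; the type and method names are assumptions for illustration, not the iceberg-go catalog interface.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrCommitConflict signals that another writer committed first.
var ErrCommitConflict = errors.New("commit conflict: table metadata changed")

// Catalog is a hypothetical in-memory catalog mapping table name to the
// current metadata file location.
type Catalog struct {
	mu      sync.Mutex
	current map[string]string
}

func NewCatalog() *Catalog {
	return &Catalog{current: make(map[string]string)}
}

// Current returns the metadata location the catalog points at ("" if none).
func (c *Catalog) Current(table string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.current[table]
}

// Commit swaps the metadata location from expected to next, and fails
// without changing anything if the table has moved on since expected was read.
func (c *Catalog) Commit(table, expected, next string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.current[table] != expected {
		return ErrCommitConflict
	}
	c.current[table] = next
	return nil
}

func main() {
	cat := NewCatalog()
	fmt.Println(cat.Commit("db.events", "", "v1.metadata.json"))

	// Two writers both start from v1; only the first commit wins.
	fmt.Println(cat.Commit("db.events", "v1.metadata.json", "v2-a.metadata.json"))
	err := cat.Commit("db.events", "v1.metadata.json", "v2-b.metadata.json")
	fmt.Println(errors.Is(err, ErrCommitConflict), cat.Current("db.events"))
}
```

A writer that hits the conflict would typically reload the table, reapply its changes on top of the new metadata, and retry.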

References

Term | Definition
Branch | A named reference to a specific snapshot, allowing multiple concurrent views of table data.
Tag | An immutable reference to a specific snapshot, typically used for versioning and releases.

Storage

Term | Definition
Warehouse | The root directory or bucket where table data and metadata are stored.
Location Provider | A component that generates file paths for table data and metadata based on table location and naming conventions.
FileIO | An abstraction layer for reading and writing files across different storage systems (local filesystem, S3, GCS, Azure Blob, etc.).

Query Optimization

Technique | Description
Column Pruning | A technique that reads only the columns needed for a query, reducing I/O and improving performance.
Partition Pruning | A technique that skips reading data files from irrelevant partitions based on query predicates.
Predicate Pushdown | A technique that applies filtering predicates at the storage layer, reducing data transfer and processing.
Statistics-based Optimization | Using table and file statistics to optimize query execution plans and file selection.
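
As an illustration of statistics-based file selection, the sketch below keeps only the files whose recorded value range could satisfy a predicate of the form column > x. FileStats and PruneGreaterThan are hypothetical names, and a single integer column stands in for the per-column bounds that manifests record.

```go
package main

import "fmt"

// FileStats holds the lower and upper bound of one column in one data file.
type FileStats struct {
	Path     string
	MinValue int64
	MaxValue int64
}

// PruneGreaterThan returns the files that may contain a row with value > x.
// A file whose maximum is <= x cannot match and is skipped without being read.
func PruneGreaterThan(files []FileStats, x int64) []FileStats {
	var keep []FileStats
	for _, f := range files {
		if f.MaxValue > x {
			keep = append(keep, f)
		}
	}
	return keep
}

func main() {
	files := []FileStats{
		{Path: "data/a.parquet", MinValue: 0, MaxValue: 99},
		{Path: "data/b.parquet", MinValue: 100, MaxValue: 199},
		{Path: "data/c.parquet", MinValue: 150, MaxValue: 300},
	}
	for _, f := range PruneGreaterThan(files, 120) {
		fmt.Println(f.Path)
	}
}
```

The surviving files still have to be scanned and filtered row by row; the bounds only prove which files cannot match.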

Schema Evolution

Term | Definition
Schema Evolution | The process of modifying a table's schema over time while maintaining backward compatibility.
Column Addition | Adding new columns to a table schema, which are typically optional to maintain compatibility.
Column Deletion | Removing columns from a table schema, which may be logical (marking as deleted) or physical.
Column Renaming | Changing column names while preserving data and type information.
Type Evolution | Changing column types in ways that maintain data compatibility (e.g., int to long).
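
The sketch below encodes the widening promotions the Iceberg spec has long allowed: int to long, float to double, and a decimal whose precision grows while its scale stays fixed. The Type struct and CanPromote are hypothetical names rather than the iceberg-go API, and newer format versions may permit additional promotions that this sketch does not cover.

```go
package main

import "fmt"

// Type is a simplified primitive type; Precision and Scale apply to decimals only.
type Type struct {
	Name      string
	Precision int
	Scale     int
}

// CanPromote reports whether a column of type from can be widened to type to
// without rewriting data. Identical types are not a promotion and return false.
func CanPromote(from, to Type) bool {
	switch {
	case from.Name == "int" && to.Name == "long":
		return true
	case from.Name == "float" && to.Name == "double":
		return true
	case from.Name == "decimal" && to.Name == "decimal":
		return to.Scale == from.Scale && to.Precision > from.Precision
	}
	return false
}

func main() {
	fmt.Println(CanPromote(Type{Name: "int"}, Type{Name: "long"}))
	fmt.Println(CanPromote(Type{Name: "long"}, Type{Name: "int"}))
	fmt.Println(CanPromote(
		Type{Name: "decimal", Precision: 9, Scale: 2},
		Type{Name: "decimal", Precision: 18, Scale: 2},
	))
}
```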

Time Travel

Term | Definition
Time Travel | The ability to query a table as it existed at an earlier point, selected by snapshot ID or by timestamp.
Snapshot Isolation | A property that ensures queries see a consistent view of data as it existed at a specific snapshot.
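
Timestamp-based time travel comes down to picking the most recent snapshot committed at or before the requested time. A minimal sketch, assuming a hypothetical Snapshot type that carries its commit time in milliseconds since the epoch:

```go
package main

import "fmt"

// Snapshot is a simplified stand-in carrying only what the lookup needs.
type Snapshot struct {
	ID          int64
	TimestampMs int64 // commit time, milliseconds since the Unix epoch
}

// SnapshotAsOf returns the latest snapshot committed at or before tsMs.
// The boolean is false when the table had no snapshot yet at that time.
func SnapshotAsOf(history []Snapshot, tsMs int64) (Snapshot, bool) {
	var best Snapshot
	found := false
	for _, s := range history {
		if s.TimestampMs <= tsMs && (!found || s.TimestampMs > best.TimestampMs) {
			best, found = s, true
		}
	}
	return best, found
}

func main() {
	history := []Snapshot{
		{ID: 101, TimestampMs: 1000},
		{ID: 102, TimestampMs: 2000},
		{ID: 103, TimestampMs: 3000},
	}
	if s, ok := SnapshotAsOf(history, 2500); ok {
		fmt.Println("reading snapshot", s.ID)
	}
}
```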

ACID Properties

Property | Description
Atomicity | Ensures that all operations in a transaction either succeed completely or fail completely.
Consistency | Ensures that the table remains in a valid state after each transaction.
Isolation | Ensures that concurrent transactions do not interfere with each other.
Durability | Ensures that committed changes are permanently stored and survive system failures.

Releases

Apache Iceberg Go follows the standard Apache release process. Releases are cut from main, voted on by the Apache Iceberg PMC, and published as signed source tarballs to https://downloads.apache.org/iceberg/ and as Go module versions tagged vX.Y.Z.

Latest release

Using a release

Pin a tagged version directly with Go modules:

go get github.com/apache/iceberg-go@vX.Y.Z

To track main, use @main instead of a version tag.

Release notes

Per-release notes (highlights, breaking changes, contributors) are published on the GitHub Releases page. Iceberg Go does not maintain a curated CHANGELOG.md in the repository; the GitHub Releases page is the canonical source.

Verifying and producing releases

If you are validating an RC or cutting a new release, see:

  • Verify a release - what to do when a [VOTE] thread is posted on dev@iceberg.apache.org.
  • How to release - PMC/committer process for cutting an RC and publishing.

Verify a release

When a release candidate is announced on dev@iceberg.apache.org, anyone (committer or not) can help by verifying the artifacts. For the vote to pass, at least three Apache Iceberg PMC members must verify the release and vote +1.

The repository ships a script that performs the full verification end-to-end: dev/release/verify_rc.sh.

What an RC announcement contains

A [VOTE] iceberg-go X.Y.Z RC<N> thread on dev@iceberg.apache.org will reference:

  • A signed source tarball under https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/ (.tar.gz, .tar.gz.asc, .tar.gz.sha512)
  • The Apache Iceberg KEYS file at https://downloads.apache.org/iceberg/KEYS
  • A GitHub compare URL between the previous release tag and the new RC tag
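
The artifact locations follow a fixed pattern, so they can be derived from the version and RC number. The Go helper below is a hypothetical convenience (it is not part of the repository) that builds the three URLs from the directory pattern above and the tarball file name used in the manual verification steps on this page.

```go
package main

import "fmt"

// RCArtifactURLs returns the tarball, signature, and checksum URLs for a
// release candidate, e.g. version "0.6.0" and rc 1.
func RCArtifactURLs(version string, rc int) []string {
	base := fmt.Sprintf(
		"https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-%s-rc%d/",
		version, rc,
	)
	tarball := fmt.Sprintf("apache-iceberg-go-%s.tar.gz", version)
	return []string{
		base + tarball,
		base + tarball + ".asc",
		base + tarball + ".sha512",
	}
}

func main() {
	for _, u := range RCArtifactURLs("0.6.0", 1) {
		fmt.Println(u)
	}
}
```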

Prerequisites

The script requires:

  • curl
  • gpg
  • shasum or sha512sum
  • tar

You do not need Go installed - if Go is not on the system, the latest Go is downloaded automatically and used only for verification.

Import the Apache Iceberg KEYS

curl https://downloads.apache.org/iceberg/KEYS -o KEYS
gpg --import KEYS

Run the verification script

The script takes the version and RC number as positional arguments:

dev/release/verify_rc.sh ${VERSION} ${RC}

For example, to verify 0.6.0 RC1:

dev/release/verify_rc.sh 0.6.0 1

If the verification succeeds, the script prints:

RC looks good!

Optional environment variables

verify_rc.sh honors these environment variables (all optional):

Variable | Default | Effect
VERIFY_DEFAULT | 1 | Master switch propagated to VERIFY_DOWNLOAD and VERIFY_SIGN if they are unset.
VERIFY_DOWNLOAD | ${VERIFY_DEFAULT} | Re-download artifacts when 1; reuse the local copy when 0.
VERIFY_SIGN | ${VERIFY_DEFAULT} | Re-run signature and checksum verification when 1.
VERIFY_FORCE_USE_GO_BINARY | 0 | When 1, ignore any system Go and use the script's auto-downloaded Go.
GITHUB_TOKEN | unset | Optional - supplies authenticated requests when fetching the latest Go release, avoiding rate limits.
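
The defaulting rules in the table can be read as a small resolution function. The Go sketch below mirrors them for illustration only: verify_rc.sh itself is a shell script, the names here are hypothetical, and treating an empty value the same as an unset one is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// VerifyOptions is the resolved view of the verify_rc.sh switches.
type VerifyOptions struct {
	Download      bool
	Sign          bool
	ForceGoBinary bool
}

// ResolveVerifyOptions applies the defaults from the table: VERIFY_DOWNLOAD
// and VERIFY_SIGN fall back to VERIFY_DEFAULT, which itself defaults to "1".
// lookup has the same shape as os.LookupEnv so it can be swapped in tests.
func ResolveVerifyOptions(lookup func(string) (string, bool)) VerifyOptions {
	get := func(key, def string) string {
		// Assumption: an empty value is treated like an unset variable.
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	def := get("VERIFY_DEFAULT", "1")
	return VerifyOptions{
		Download:      get("VERIFY_DOWNLOAD", def) == "1",
		Sign:          get("VERIFY_SIGN", def) == "1",
		ForceGoBinary: get("VERIFY_FORCE_USE_GO_BINARY", "0") == "1",
	}
}

func main() {
	fmt.Printf("%+v\n", ResolveVerifyOptions(os.LookupEnv))
}
```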

Manual verification fallback

If you would rather verify by hand, the underlying steps are:

# Download the artifacts
curl -O https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/apache-iceberg-go-${VERSION}.tar.gz
curl -O https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/apache-iceberg-go-${VERSION}.tar.gz.asc
curl -O https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-go-${VERSION}-rc${RC}/apache-iceberg-go-${VERSION}.tar.gz.sha512

# Verify the signature
gpg --verify apache-iceberg-go-${VERSION}.tar.gz.asc apache-iceberg-go-${VERSION}.tar.gz

# Verify the checksum (or sha512sum -c on systems without shasum)
shasum -a 512 --check apache-iceberg-go-${VERSION}.tar.gz.sha512

# Inspect the contents
tar xf apache-iceberg-go-${VERSION}.tar.gz

You should reply to the [VOTE] thread with +1, +0, or -1 and a one-line description of what you verified (for example, "verified signatures, checksums, and verify_rc.sh passed on macOS arm64 with system Go 1.25.5").
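
If neither shasum nor sha512sum is available, the checksum step of the manual fallback can be reproduced with the Go standard library. This is a minimal sketch rather than part of the release tooling: it assumes the .sha512 file starts with the hex digest (the usual "digest  filename" layout), and it reads the whole tarball into memory for simplicity.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"strings"
)

// SHA512Hex returns the lowercase hex SHA-512 digest of data.
func SHA512Hex(data []byte) string {
	sum := sha512.Sum512(data)
	return hex.EncodeToString(sum[:])
}

// VerifyChecksum compares data against the digest at the start of a
// checksum line such as "<hex digest>  apache-iceberg-go-X.Y.Z.tar.gz".
func VerifyChecksum(data []byte, checksumLine string) error {
	fields := strings.Fields(checksumLine)
	if len(fields) == 0 {
		return errors.New("checksum file is empty")
	}
	if got := SHA512Hex(data); !strings.EqualFold(got, fields[0]) {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, fields[0])
	}
	return nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: verify <artifact.tar.gz> <artifact.tar.gz.sha512>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read artifact:", err)
		return
	}
	sum, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Println("read checksum:", err)
		return
	}
	if err := VerifyChecksum(data, string(sum)); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("checksum OK")
}
```

This covers only the checksum; the GPG signature check still has to be done with gpg as shown in the manual steps.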

How to release

This page is for Apache Iceberg PMC members and committers who are cutting a new Apache Iceberg Go release. The canonical scripts live in dev/release/ and dev/release/README.md is the source of truth - this page mirrors it and adds context.

Requirements

  • You must be an Apache Iceberg committer or PMC member.

  • You must prepare a PGP key for signing. See https://infra.apache.org/release-signing.html#generate.

  • Your PGP key must be registered in the Apache Iceberg KEYS file at https://downloads.apache.org/iceberg/KEYS. To add a key:

    svn co https://dist.apache.org/repos/dist/release/iceberg
    cd iceberg
    $EDITOR KEYS
    svn ci KEYS
    
  • You must run the release scripts from a working copy whose origin remote is git@github.com:apache/iceberg-go.git (not your fork). release_rc.sh enforces this.

Overview

  1. Test the revision to be released.
  2. Prepare an RC and start a vote.
  3. On a passing vote, publish.

Prepare an RC and vote

Run dev/release/release_rc.sh against the canonical clone:

git clone git@github.com:apache/iceberg-go.git
cd iceberg-go
GH_TOKEN=${YOUR_GITHUB_TOKEN} dev/release/release_rc.sh ${VERSION} ${RC}

Example for 0.6.0 RC1:

GH_TOKEN=${YOUR_GITHUB_TOKEN} dev/release/release_rc.sh 0.6.0 1

The arguments are the version and the RC number. If RC1 has a problem, increment the RC number to RC2, RC3, and so on.

release_rc.sh will:

  • Tag vX.Y.Z-rc<N> and push the tag.
  • Create a signed source tarball.
  • Upload the artifacts to https://dist.apache.org/repos/dist/dev/iceberg/.
  • Print a draft [VOTE] email you can use on dev@iceberg.apache.org.

Send the [VOTE] email. The vote runs for at least 72 hours and requires three +1 votes from PMC members with no -1 votes to pass.
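
The passing rule can be written down as a small predicate. The Go sketch below is purely illustrative (the vote is tallied by people on the mailing list, not by code); the Vote type is hypothetical, and treating any -1, binding or not, as blocking is an assumption based on the wording above.

```go
package main

import "fmt"

// Vote records one reply on the [VOTE] thread.
type Vote struct {
	Voter   string
	Value   int  // +1, 0, or -1
	Binding bool // true when cast by a PMC member
}

// VotePasses applies the rule stated above: open for at least 72 hours,
// at least three +1 votes from PMC members, and no -1 votes.
func VotePasses(votes []Vote, hoursOpen int) bool {
	if hoursOpen < 72 {
		return false
	}
	bindingPlus := 0
	for _, v := range votes {
		switch {
		case v.Value < 0:
			// Assumption: any -1 blocks the release candidate.
			return false
		case v.Value > 0 && v.Binding:
			bindingPlus++
		}
	}
	return bindingPlus >= 3
}

func main() {
	votes := []Vote{
		{Voter: "pmc-a", Value: 1, Binding: true},
		{Voter: "pmc-b", Value: 1, Binding: true},
		{Voter: "pmc-c", Value: 1, Binding: true},
		{Voter: "contributor", Value: 1, Binding: false},
	}
	fmt.Println(VotePasses(votes, 72))
}
```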

Publish

When the vote passes, run:

GH_TOKEN=${YOUR_GITHUB_TOKEN} dev/release/release.sh ${VERSION} ${RC}

release.sh moves the artifacts from https://dist.apache.org/repos/dist/dev/iceberg/ to https://dist.apache.org/repos/dist/release/iceberg/ (which feeds https://downloads.apache.org/iceberg/) and creates a GitHub Release with auto-generated notes.

Post-release tasks

After publishing, complete these steps:

  1. Add the release to ASF's report database at the Apache Committee Report Helper.
  2. Verify the GitHub Release at https://github.com/apache/iceberg-go/releases is correctly tagged, has generated release notes against the prior tag, and is marked as latest. release.sh runs gh release create ... --generate-notes --verify-tag so the release should already exist; double-check the notes and the "Latest" badge.
  3. Send the [ANNOUNCE] email to dev@iceberg.apache.org and announce@apache.org.
  4. File a release blog post in apache/iceberg under site/docs/blog/posts/. See the prior 0.5.0 post (2026-03-05-iceberg-go-0.5.0-release.md) for the frontmatter and structure.

Patch releases

dev/release/README.md does not document a patch-branch convention (e.g. iceberg-go-0.X.x). Confirm with the PMC on dev@iceberg.apache.org before cutting a patch release.

Other Iceberg implementations

Apache Iceberg Go is one of several official Iceberg implementations. Pick the one that matches your runtime; the table format and catalog protocols are the same across all of them.

Project | Language | Repository | Documentation
Apache Iceberg | Java (reference) | apache/iceberg | iceberg.apache.org
PyIceberg | Python | apache/iceberg-python | py.iceberg.apache.org
iceberg-rust | Rust | apache/iceberg-rust | rust.iceberg.apache.org
iceberg-cpp | C++ | apache/iceberg-cpp | cpp.iceberg.apache.org (early stage)

When to use which

  • Java is the reference implementation and is what every query engine integrates against (Spark, Flink, Trino, Hive, Presto, Dremio, etc.). If you are running a JVM workload, this is the canonical choice.
  • PyIceberg is for Python and the dataframe ecosystem (PyArrow, Pandas, Polars, DuckDB, Daft, Ray). Most data-science and ML workflows live here.
  • iceberg-rust is the Rust implementation, used by pyiceberg-core, DataFusion-based engines, and other Rust-native systems.
  • iceberg-cpp is early-stage (0.2.0 released 2026-01-26). Track the project for native C++ integration once it stabilizes.
  • iceberg-go (this project) is for Go services and tooling. Tight Apache Arrow Go integration makes it a good fit for streaming Arrow record batches into and out of Iceberg tables.

Specifications and shared concepts

The Iceberg spec, terminology, partitioning semantics, evolution semantics, REST Catalog OpenAPI, and multi-engine support policy live with the main project at iceberg.apache.org. All the implementations above target the same spec.

For Apache Iceberg Go-specific guidance, continue with the API Reference, CLI, or Configuration.