Configuration

This page documents how to configure Apache Iceberg Go: the CLI's YAML config file, per-catalog Option surfaces, file-system credentials, table write properties, concurrency, and how to plug in custom catalogs and IO backends.

Only properties and options that the iceberg-go code actually reads are listed. Properties defined in the Apache Iceberg spec but not yet wired into iceberg-go are intentionally omitted - check pkg.go.dev/github.com/apache/iceberg-go for the latest read sites.

CLI configuration file

The iceberg CLI loads catalog defaults from ~/.iceberg-go.yaml (override the directory with GOICEBERG_HOME). The schema, defined in config/config.go, is:

default-catalog: default
max-workers: 5
catalog:
  default:
    type: rest
    uri: https://example.com/iceberg
    warehouse: s3://my-bucket/warehouse
    credential: <client-id>:<client-secret>
    output: text
    rest:
      sigv4-enabled: false
      signing-name: ""
      signing-region: ""
| Key | Purpose |
| --- | --- |
| default-catalog | Name used when --catalog-name is not passed on the CLI. |
| max-workers | Worker pool size for concurrent operations. Default 5. |
| catalog.<name>.type | One of rest, hive, glue, sql, hadoop. |
| catalog.<name>.uri | Catalog endpoint or DSN. |
| catalog.<name>.warehouse | Warehouse identifier (REST/Glue) or location (Hive/SQL). |
| catalog.<name>.credential | Credential string passed through to the catalog's auth handler. |
| catalog.<name>.output | CLI output format (e.g. text, json). |
| catalog.<name>.rest.sigv4-enabled | Enable AWS SigV4 signing for REST. |
| catalog.<name>.rest.signing-name | SigV4 service name. |
| catalog.<name>.rest.signing-region | SigV4 region. |
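
The lookup rule for the config file (GOICEBERG_HOME first, then the user's home directory) can be sketched with the standard library alone. The helper name resolveConfigPath and its injected lookup functions are hypothetical and are not part of iceberg-go; the actual logic lives in config/config.go and may differ in detail.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// configFileName is the file name given in the text above.
const configFileName = ".iceberg-go.yaml"

// resolveConfigPath is a hypothetical helper, not an iceberg-go function.
// It prefers the directory named by GOICEBERG_HOME and otherwise falls back
// to the user's home directory. The lookups are injected so the rule can be
// exercised without touching the real environment.
func resolveConfigPath(getenv func(string) string, homeDir func() (string, error)) (string, error) {
	if dir := getenv("GOICEBERG_HOME"); dir != "" {
		return filepath.Join(dir, configFileName), nil
	}
	home, err := homeDir()
	if err != nil {
		return "", fmt.Errorf("locate home directory: %w", err)
	}
	return filepath.Join(home, configFileName), nil
}

func main() {
	path, err := resolveConfigPath(os.Getenv, os.UserHomeDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("config file:", path)
}
```

Printing the resolved path this way is a quick check when the CLI appears to ignore a config file.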

Catalog options

Each catalog package exposes its own functional Option set. The lists below reflect the public option surface; pkg.go.dev is authoritative for the current set.

REST (catalog/rest)

The most option-rich surface. Source: catalog/rest/options.go.

| Group | Options |
| --- | --- |
| Authentication | WithCredential, WithOAuthToken, WithAuthManager, WithAuthURI, WithScope, WithAudience, WithResource |
| AWS SigV4 | WithSigV4, WithSigV4RegionSvc, WithAwsConfig |
| HTTP | WithHeaders, WithTLSConfig, WithOAuthTLSConfig, WithCustomTransport |
| Catalog routing | WithPrefix, WithWarehouseLocation, WithMetadataLocation |
| Pass-through | WithAdditionalProps |

Hive (catalog/hive)

Source: catalog/hive/options.go.

  • WithURI(uri string) - Thrift URI for the Hive Metastore (e.g. thrift://127.0.0.1:9083).
  • WithWarehouse(warehouse string)
  • WithProperties(props iceberg.Properties)

Glue (catalog/glue)

Source: catalog/glue/options.go.

  • WithAwsConfig(cfg aws.Config) - AWS SDK v2 config; respects the AWS default credential chain.
  • WithAwsProperties(props AwsProperties) - explicit overrides for region/endpoint/access keys.

SQL (catalog/sql)

The SQL catalog has no functional-option surface. Construct it with NewCatalog:

db, err := sql.Open(sqliteshim.ShimName, "file:catalog.db")
if err != nil {
    return err // handle the open error instead of discarding it
}
cat, err := sqlcat.NewCatalog("default", db, sqlcat.SQLite, iceberg.Properties{
    "warehouse": "file:///tmp/warehouse",
})

Supported dialects: sqlcat.Postgres, sqlcat.MySQL, sqlcat.SQLite, sqlcat.MSSQL, sqlcat.Oracle (catalog/sql/sql.go:50).

Shared options on the base catalog package

Operations that create or update tables/views accept these (catalog/catalog.go):

  • WithLocation, WithPartitionSpec, WithSortOrder, WithProperties, WithStagedUpdates
  • View-specific: WithViewLocation, WithViewProperties

File-system credentials

iceberg-go registers the local file system (file://) automatically. Cloud schemes are not registered until you add a blank import:

import _ "github.com/apache/iceberg-go/io/gocloud"

The init() function in io/gocloud/register.go registers s3, s3a, s3n, oss, gs, abfs, abfss, wasb, and wasbs. Without the blank import, these schemes return ErrIOSchemeNotFound with a hint to add the import.

All credential and tuning property keys are constants in io/config.go. They can be supplied through table properties, catalog properties, or per-call iceberg.Properties arguments depending on context.

S3

Authentication is resolved in this order (io/gocloud/s3.go):

  1. Static credentials in properties: s3.access-key-id + s3.secret-access-key (+ optional s3.session-token).
  2. The standard AWS SDK v2 default credential chain - environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), ~/.aws/credentials, container/IAM role.
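
The resolution order above amounts to a single check on the properties map. The following is a minimal sketch, assuming that both the access key and the secret must be non-empty for the static path to apply; the type staticCredentials and the function staticFromProps are hypothetical names, not iceberg-go API.

```go
package main

import "fmt"

// Property keys as listed in the text above.
const (
	keyAccessKeyID     = "s3.access-key-id"
	keySecretAccessKey = "s3.secret-access-key"
	keySessionToken    = "s3.session-token"
)

// staticCredentials is a hypothetical holder for values read from properties.
type staticCredentials struct {
	AccessKeyID     string
	SecretAccessKey string
	SessionToken    string // optional; empty when not supplied
}

// staticFromProps returns static credentials when both the access key and
// the secret are present. ok == false means the caller should fall through
// to the AWS SDK v2 default credential chain.
func staticFromProps(props map[string]string) (creds staticCredentials, ok bool) {
	id, secret := props[keyAccessKeyID], props[keySecretAccessKey]
	if id == "" || secret == "" {
		return staticCredentials{}, false
	}
	return staticCredentials{
		AccessKeyID:     id,
		SecretAccessKey: secret,
		SessionToken:    props[keySessionToken],
	}, true
}

func main() {
	props := map[string]string{
		keyAccessKeyID:     "example-key-id",
		keySecretAccessKey: "example-secret",
	}
	if _, ok := staticFromProps(props); ok {
		fmt.Println("using static credentials from properties")
	} else {
		fmt.Println("falling back to the SDK default credential chain")
	}
}
```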

Tuning properties:

| Key (constant) | Purpose |
| --- | --- |
| s3.region (io.S3Region) / client.region (io.S3ClientRegion) | AWS region. |
| s3.endpoint (io.S3EndpointURL) | Override S3 endpoint URL (custom or compatible storage). Falls back to AWS_S3_ENDPOINT env. |
| s3.access-key-id (io.S3AccessKeyID) | Static access key. |
| s3.secret-access-key (io.S3SecretAccessKey) | Static secret key. |
| s3.session-token (io.S3SessionToken) | Static session token. |
| s3.proxy-uri (io.S3ProxyURI) | HTTP proxy URL. |
| s3.connect-timeout (io.S3ConnectTimeout) | Either a number of seconds ("60", "60.0") or a Go duration ("5s"). |
| s3.force-virtual-addressing (io.S3ForceVirtualAddressing) | Force virtual-host-style addressing. |
| s3.signer.uri (io.S3SignerURI) | Reserved for remote-signing endpoint (not yet implemented). |
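
The two accepted forms of s3.connect-timeout can be handled by trying a numeric parse first and a Go duration parse second. This is a sketch of that idea, not the parser in io/gocloud/s3.go; the function name parseConnectTimeout and the choice to reject negative or non-finite numbers are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
	"time"
)

// parseConnectTimeout is a hypothetical helper that accepts either a number
// of seconds ("60", "60.0") or a Go duration string ("5s").
func parseConnectTimeout(v string) (time.Duration, error) {
	v = strings.TrimSpace(v)

	// A bare number is interpreted as seconds.
	if secs, err := strconv.ParseFloat(v, 64); err == nil {
		if secs < 0 || math.IsNaN(secs) || math.IsInf(secs, 0) {
			return 0, fmt.Errorf("invalid s3.connect-timeout %q", v)
		}
		return time.Duration(secs * float64(time.Second)), nil
	}

	// Anything else must be a valid Go duration.
	d, err := time.ParseDuration(v)
	if err != nil {
		return 0, fmt.Errorf("invalid s3.connect-timeout %q: %w", v, err)
	}
	return d, nil
}

func main() {
	for _, v := range []string{"60", "60.0", "5s", "soon"} {
		d, err := parseConnectTimeout(v)
		fmt.Println(v, d, err)
	}
}
```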

Google Cloud Storage

Authentication resolution (io/gocloud/gcs.go):

  1. Explicit JSON key bytes via gcs.jsonkey or path via gcs.keypath.
  2. Optional gcs.credtype selecting one of service_account, authorized_user, impersonated_service_account, external_account.
  3. The GCP default credentials chain (gcp.DefaultCredentials) - falls back to anonymous if no creds are found.

Tuning properties:

| Key (constant) | Purpose |
| --- | --- |
| gcs.endpoint (io.GCSEndpoint) | Custom GCS endpoint URL. |
| gcs.keypath (io.GCSKeyPath) | Path to a JSON service-account key file. |
| gcs.jsonkey (io.GCSJSONKey) | JSON key as a string. |
| gcs.credtype (io.GCSCredType) | Credential type override. |
| gcs.usejsonapi (io.GCSUseJSONAPI) | Set to any value to enable the GCS JSON API for reads. |

Azure Data Lake Storage / Blob

Authentication is selected based on the property keys present (io/gocloud/azure.go):

  1. Shared key: both adls.auth.shared-key.account.name and adls.auth.shared-key.account.key set.
  2. Per-host SAS token: adls.sas-token.<hostname> (prefix-matched against the storage account host).
  3. Per-host connection string: adls.connection-string.<hostname>.
  4. Managed identity: adls.auth.managed-identity.enabled set to a truthy value.
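
The selection rules above can be modelled as a switch over the keys present in the properties map. This sketch makes three assumptions that the text does not confirm: the numbered order is treated as precedence, "prefix-matched" is read as "the storage host starts with the key's suffix", and "truthy" is read as anything strconv.ParseBool accepts as true. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Property keys and prefixes as listed in the text above.
const (
	keySharedKeyName   = "adls.auth.shared-key.account.name"
	keySharedKeyKey    = "adls.auth.shared-key.account.key"
	prefixSASToken     = "adls.sas-token."
	prefixConnString   = "adls.connection-string."
	keyManagedIdentity = "adls.auth.managed-identity.enabled"
)

// hasHostKey reports whether props holds a non-empty value under a key of
// the form prefix+<suffix> where host starts with <suffix> (assumed rule).
func hasHostKey(props map[string]string, prefix, host string) bool {
	for k, v := range props {
		if v == "" || !strings.HasPrefix(k, prefix) {
			continue
		}
		suffix := strings.TrimPrefix(k, prefix)
		if suffix != "" && strings.HasPrefix(host, suffix) {
			return true
		}
	}
	return false
}

// selectAuth returns a label for the auth mode implied by the properties.
func selectAuth(props map[string]string, host string) string {
	managed, _ := strconv.ParseBool(props[keyManagedIdentity]) // parse errors count as false
	switch {
	case props[keySharedKeyName] != "" && props[keySharedKeyKey] != "":
		return "shared-key"
	case hasHostKey(props, prefixSASToken, host):
		return "sas-token"
	case hasHostKey(props, prefixConnString, host):
		return "connection-string"
	case managed:
		return "managed-identity"
	default:
		return "none"
	}
}

func main() {
	host := "acct.dfs.core.windows.net"
	props := map[string]string{prefixSASToken + host: "example-token"}
	fmt.Println(selectAuth(props, host))
}
```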

Tuning properties:

| Key (constant) | Purpose |
| --- | --- |
| adls.auth.shared-key.account.name (io.ADLSSharedKeyAccountName) | Account name. |
| adls.auth.shared-key.account.key (io.ADLSSharedKeyAccountKey) | Account key. |
| adls.sas-token.<host> (prefix io.ADLSSasTokenPrefix) | Per-host SAS token. |
| adls.connection-string.<host> (prefix io.ADLSConnectionStringPrefix) | Per-host connection string. |
| adls.client-id (io.ADLSClientID) | Client/application ID for AAD auth. |
| adls.endpoint (io.ADLSEndpoint) | Storage domain (e.g. blob.core.windows.net). |
| adls.protocol (io.ADLSProtocol) | http or https. |
| adls.auth.managed-identity.enabled (io.ADLSManagedIdentityEnabled) | Enable Azure Managed Identity auth. |

Environment variables

iceberg-go reads only a small set of environment variables directly. AWS / GCP / Azure credentials flow through the respective SDKs, not through iceberg-go-defined env vars.

| Variable | Purpose | Read at |
| --- | --- | --- |
| GOICEBERG_HOME | Directory containing .iceberg-go.yaml. Defaults to the user's home directory. | config/config.go:87 |
| ICEBERG_SQL_DEBUG | SQL catalog query logging - 1 (failed queries), 2 (all queries). | catalog/sql/sql.go:206 |
| AWS_S3_ENDPOINT | Fallback S3 endpoint when s3.endpoint is unset. | io/gocloud/s3.go:193 |

There is no PYICEBERG_*-style env var convention. Use the YAML config file, or pass iceberg.Properties to override settings programmatically.

Concurrency

| Setting | Source | Effect |
| --- | --- | --- |
| max-workers in ~/.iceberg-go.yaml (config.EnvConfig.MaxWorkers) | YAML config | Worker pool size used by parallel column writes, snapshot producers, scan planning, and equality-delete writers. Default 5. |
| WitMaxConcurrency(n int) ScanOption | Code (table.WitMaxConcurrency) | Per-scan override. Note: function name is Wit... (not With...) - this is a pre-existing typo in the public API. |
| WithMaxWriteWorkers(n int) | Code (per-write API on WriteRecords) | Per-write override of the worker count. |
| WithClusteredWrite() | Code (per-write API on WriteRecords) | Forces single-threaded writes. Mutually exclusive with WithMaxWriteWorkers. |
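
To make the effect of a worker-pool size concrete, the sketch below bounds the number of concurrent calls with a buffered channel used as a semaphore. It illustrates the general technique only and is not iceberg-go's scheduler; runBounded is a hypothetical name, and the value 5 simply mirrors the documented default for max-workers.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded calls fn once per item with at most maxWorkers calls in flight
// and returns the peak number of concurrent calls it observed.
func runBounded(items []int, maxWorkers int, fn func(int)) int {
	if maxWorkers < 1 {
		maxWorkers = 1
	}
	sem := make(chan struct{}, maxWorkers) // one slot per allowed worker
	var (
		wg           sync.WaitGroup
		mu           sync.Mutex
		active, peak int
	)
	for _, item := range items {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxWorkers calls are in flight
		go func(it int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this call ends

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			fn(it)

			mu.Lock()
			active--
			mu.Unlock()
		}(item)
	}
	wg.Wait()
	return peak
}

func main() {
	items := make([]int, 20)
	peak := runBounded(items, 5, func(int) { time.Sleep(10 * time.Millisecond) })
	fmt.Println("peak concurrency:", peak)
}
```

Setting the bound to 1 gives strictly sequential execution, which is the behaviour the table describes for clustered writes.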

Pluggability

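Two registries are user-extensible: the IO scheme registry and the catalog type registry. The third extension point, LocationProvider, is documented for reference only and is not user-pluggable today.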

IO scheme registry

Register a custom URL scheme with io.Register:

import (
    "context"
    "net/url"

    "github.com/apache/iceberg-go/io"
)

func init() {
    io.Register("myfs", func(ctx context.Context, parsed *url.URL, props map[string]string) (io.IO, error) {
        return newMyFS(parsed, props)
    })
}

io.Register panics on a nil factory or a duplicate scheme. Built-in schemes: file, "" (the empty scheme). Cloud schemes (s3, gs, abfs, etc.) are registered by io/gocloud only when its package is blank-imported.

io.GetRegisteredSchemes() returns the current scheme list; io.Unregister(scheme) removes one.
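
The registration semantics described above (panic on a nil factory, panic on a duplicate scheme, list, unregister) can be mimicked with a small stand-in, which is handy for reasoning about init()-time registration order. Everything in this sketch is hypothetical: the registry type, its methods, and the simplified factory signature are not the iceberg-go implementation.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// factory is a simplified stand-in for a scheme factory.
type factory func(location string) (string, error)

// registry is a hypothetical scheme registry guarded by a mutex.
type registry struct {
	mu        sync.RWMutex
	factories map[string]factory
}

func newRegistry() *registry {
	return &registry{factories: make(map[string]factory)}
}

// register panics on a nil factory or a duplicate scheme.
func (r *registry) register(scheme string, f factory) {
	if f == nil {
		panic("nil factory for scheme " + scheme)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.factories[scheme]; dup {
		panic("scheme already registered: " + scheme)
	}
	r.factories[scheme] = f
}

// unregister removes a scheme; removing an unknown scheme is a no-op.
func (r *registry) unregister(scheme string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.factories, scheme)
}

// schemes returns the registered scheme names in sorted order.
func (r *registry) schemes() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.factories))
	for name := range r.factories {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := newRegistry()
	r.register("myfs", func(loc string) (string, error) { return "opened " + loc, nil })
	fmt.Println(r.schemes())
	r.unregister("myfs")
	fmt.Println(r.schemes())
}
```

Because a duplicate registration panics, two packages that register the same scheme in init() will fail at program start rather than silently overriding each other.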

Catalog type registry

Register a custom catalog type with catalog.Register:

import (
    "context"

    "github.com/apache/iceberg-go"
    "github.com/apache/iceberg-go/catalog"
)

func init() {
    catalog.Register("mycatalog", catalog.RegistrarFunc(
        func(ctx context.Context, name string, props iceberg.Properties) (catalog.Catalog, error) {
            return newMyCatalog(name, props)
        },
    ))
}

After registration, catalog.Load(ctx, "default", iceberg.Properties{"type": "mycatalog", ...}) will route to the factory. Built-in types: rest, hive, glue, sql, hadoop.

catalog.GetRegisteredCatalogs() returns the current list; catalog.Unregister(catalogType) removes one.

LocationProvider

table/locations.go defines the LocationProvider interface and ships two implementations: simpleLocationProvider (default) and objectStoreLocationProvider (selected by write.object-storage.enabled = true). The provider is chosen by table properties and is not user-pluggable today.

Table write properties

Property key constants are in table/properties.go and Parquet keys in table/internal/parquet_files.go. The keys below have verified read sites in non-test code.

Format and file sizing

| Key | Default | Description |
| --- | --- | --- |
| write.format.default | parquet | File format used when writing data files. Read in table/writer.go and table/rolling_data_writer.go. |
| write.target-file-size-bytes | (set by writer) | Target size for newly written data files. Read in table/arrow_utils.go and table/equality_delete_writer.go. |

Metrics and metadata lifecycle

| Key | Description |
| --- | --- |
| write.metadata.metrics.default | Default per-column metrics mode. |
| write.metadata.metrics.column.<name> | Per-column override prefix. |
| write.metadata.delete-after-commit.enabled | When true, expire old metadata files after a successful commit. |
| write.metadata.previous-versions-max | Cap on retained metadata files. Default 100. |
| write.metadata.compression-codec | Compression for metadata JSON. |

Manifest and commit

| Key | Description |
| --- | --- |
| commit.manifest-merge.enabled | Merge small manifests during commit. |
| commit.manifest.target-size-bytes | Target size for merged manifests. |
| commit.manifest.min-count-to-merge | Minimum manifest count that triggers a merge. |
| commit.retry.num-retries | Retries for ErrCommitFailed. |
| commit.retry.min-wait-ms / commit.retry.max-wait-ms / commit.retry.total-timeout-ms | Backoff bounds. |
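
How the four commit.retry.* keys might interact is easiest to see with a worked schedule. The sketch below assumes a doubling backoff that starts at the minimum wait, is capped at the maximum wait, and stops once the cumulative wait would exceed the total timeout. The doubling policy, the type retryConfig, and the numbers in main are assumptions for illustration, not iceberg-go's documented behaviour or defaults.

```go
package main

import (
	"fmt"
	"time"
)

// retryConfig mirrors the four commit.retry.* keys from the table above.
type retryConfig struct {
	NumRetries   int
	MinWait      time.Duration
	MaxWait      time.Duration
	TotalTimeout time.Duration
}

// backoffSchedule returns the wait before each retry attempt under the
// assumed policy: double from MinWait, cap at MaxWait, and stop as soon as
// the cumulative wait would exceed TotalTimeout.
func backoffSchedule(c retryConfig) []time.Duration {
	var (
		waits []time.Duration
		total time.Duration
	)
	wait := c.MinWait
	for i := 0; i < c.NumRetries; i++ {
		if wait > c.MaxWait {
			wait = c.MaxWait
		}
		if total+wait > c.TotalTimeout {
			break
		}
		waits = append(waits, wait)
		total += wait
		wait *= 2
	}
	return waits
}

func main() {
	// Illustrative values only.
	cfg := retryConfig{
		NumRetries:   6,
		MinWait:      100 * time.Millisecond,
		MaxWait:      300 * time.Millisecond,
		TotalTimeout: time.Second,
	}
	fmt.Println(backoffSchedule(cfg))
}
```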

Snapshot retention

| Key | Description |
| --- | --- |
| min-snapshots-to-keep | Minimum snapshots to retain when expiring. |
| max-snapshot-age-ms | Maximum age of retained snapshots. |
| max-ref-age-ms | Maximum age of branch/tag refs that are not the main branch. |
| gc.enabled | Gate for orphan-file cleanup. |

Delete mode

| Key | Description |
| --- | --- |
| write.delete.mode | Delete strategy used by row-level delete writers. |

Object-store data layout

| Key | Description |
| --- | --- |
| write.data.path | Override data file directory. |
| write.metadata.path | Override metadata file directory. |
| write.object-storage.enabled | Switch the location provider to the hashed object-storage layout. |
| write.object-storage.partitioned-paths | Whether partition values are included in object-storage paths. |

Parquet writer

All defined in table/internal/parquet_files.go and read by the Parquet writer:

| Key | Default |
| --- | --- |
| write.parquet.row-group-size-bytes | 128 MB |
| write.parquet.row-group-limit | 1,048,576 rows |
| write.parquet.page-size-bytes | 1 MB |
| write.parquet.page-row-limit | 20,000 rows |
| write.parquet.dict-size-bytes | 2 MB |
| write.parquet.page-version | 2 |
| write.parquet.compression-codec | zstd |
| write.parquet.compression-level | -1 (codec default) |
| write.parquet.bloom-filter-max-bytes | 1 MB |
| write.parquet.bloom-filter-enabled.column.<name> | (per-column toggle, prefix-matched) |
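
Prefix-matched keys such as write.parquet.bloom-filter-enabled.column.<name> carry the column name in the key itself. A minimal sketch of extracting those per-column toggles from a properties map follows; columnOverrides is a hypothetical helper, and skipping values that do not parse as booleans is an assumption rather than the writer's documented behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bloomPrefix is the key prefix from the table above.
const bloomPrefix = "write.parquet.bloom-filter-enabled.column."

// columnOverrides collects <name> -> enabled for every key that starts with
// prefix. Keys with an empty column name or a non-boolean value are skipped.
func columnOverrides(props map[string]string, prefix string) map[string]bool {
	out := make(map[string]bool)
	for key, value := range props {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		name := strings.TrimPrefix(key, prefix)
		if name == "" {
			continue
		}
		enabled, err := strconv.ParseBool(value)
		if err != nil {
			continue // assumption: ignore values that are not booleans
		}
		out[name] = enabled
	}
	return out
}

func main() {
	props := map[string]string{
		bloomPrefix + "user_id":           "true",
		bloomPrefix + "payload":           "false",
		"write.parquet.compression-codec": "zstd",
	}
	fmt.Println(columnOverrides(props, bloomPrefix))
}
```

The same pattern applies to other prefix-style keys on this page, such as write.metadata.metrics.column.<name>, although their values are not booleans.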

Parquet reader

| Key | Description |
| --- | --- |
| read.parquet.batch-size | Arrow record-batch size used by the Parquet reader. |