DuckLake — SQL metadata for the lakehouse · Methodologies

Methodology

DuckLake — SQL metadata for the lakehouse

Methodology for managing lakehouse metadata in a transactional SQL database rather than as JSON/Avro manifest files in object storage. The design separates three layers: data lives in Parquet files on blob storage, metadata lives in any SQL database acting as the catalog, and compute engines read both independently. Despite the name, DuckLake is not tied to DuckDB: it is a catalog format, not an engine choice.
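To make the catalog idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the helpers `open_catalog` and `commit_files` are hypothetical and heavily simplified; they are not the DuckLake spec. The point is only that registering new Parquet files becomes an ordinary SQL transaction: either the snapshot and all of its files are recorded, or nothing is.

```python
import sqlite3

# Hypothetical, simplified catalog schema for illustration (not the DuckLake spec).
SCHEMA = """
CREATE TABLE snapshot (
    snapshot_id  INTEGER PRIMARY KEY,
    committed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE data_file (
    path        TEXT PRIMARY KEY,   -- object-store key of a Parquet file
    table_name  TEXT NOT NULL,
    snapshot_id INTEGER NOT NULL REFERENCES snapshot(snapshot_id)
);
"""


def open_catalog(database=":memory:"):
    """Open a SQLite database and create the illustrative catalog tables."""
    conn = sqlite3.connect(database)
    conn.executescript(SCHEMA)
    return conn


def commit_files(conn, table_name, paths):
    """Register already-uploaded Parquet files under one new snapshot, atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute("INSERT INTO snapshot DEFAULT VALUES")
        snapshot_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO data_file (path, table_name, snapshot_id) VALUES (?, ?, ?)",
            [(path, table_name, snapshot_id) for path in paths],
        )
    return snapshot_id


catalog = open_catalog()
first = commit_files(
    catalog, "events", ["s3://lake/events/a.parquet", "s3://lake/events/b.parquet"]
)

try:
    # Re-registering an existing path violates the primary key,
    # so the whole commit (snapshot row included) is rolled back.
    commit_files(
        catalog, "events", ["s3://lake/events/c.parquet", "s3://lake/events/a.parquet"]
    )
except sqlite3.IntegrityError:
    pass
```

The data files themselves never pass through the database; only their paths and snapshot membership do. Swapping SQLite for Postgres would change the connection code but not the shape of the commit.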

926× faster queries than Iceberg on DuckDB Labs benchmarks, along with 105× faster ingestion; in streaming workloads, 900× faster reads and 100× faster writes. These figures have not yet been independently validated outside DuckDB Labs, but the architectural claim, that SQL metadata avoids manifest-file proliferation, holds regardless of the specific multiplier.
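The proliferation argument can be illustrated with simple arithmetic. The toy model below is not a benchmark and does not reproduce any of the multipliers above. It assumes, for illustration only, that a manifest-style commit writes three metadata objects to object storage (a table metadata file, a manifest list, and one manifest), while a SQL-catalog commit writes its metadata as database rows. Both function names are hypothetical.

```python
def manifest_style_objects(commits, files_per_commit=1):
    """Objects written to storage when every commit also writes metadata files.

    Assumption for this sketch: three metadata objects per commit
    (table metadata, manifest list, one manifest).
    """
    metadata_objects_per_commit = 3
    return commits * (files_per_commit + metadata_objects_per_commit)


def sql_catalog_objects(commits, files_per_commit=1):
    """Objects written to storage when metadata goes to SQL rows instead."""
    return commits * files_per_commit


# 1,000 small streaming commits, one Parquet file each.
with_manifests = manifest_style_objects(1000)
with_sql_catalog = sql_catalog_objects(1000)
```

Under these assumptions the manifest-style layout writes four objects per single-file commit and the SQL-catalog layout writes one. The gap is widest for small, frequent commits, which matches the streaming case the benchmarks emphasize.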

The pipeline

1. Storage: Parquet on blob. S3, Azure Blob, GCS, or MinIO. The same Parquet files Iceberg uses.
2. Catalog: SQL metadata database. Postgres, SQLite, DuckDB, MotherDuck, or any other SQL database with primary keys and transactions.
3. Compute: engine-agnostic. Any engine that reads the spec can participate; the reference implementation is the DuckDB extension.
4. Serve: lakehouse queries. The same Parquet files are queryable from multiple compute nodes concurrently.
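One way to picture how several compute nodes can read the same files at once: each reader asks the catalog which Parquet files belong to the snapshot it is pinned to, then fetches those files directly from blob storage. The sketch below uses Python's built-in sqlite3 module; the `data_file` table, its columns, and the `files_at` helper are hypothetical simplifications, not the DuckLake spec.

```python
import sqlite3


def build_demo_catalog():
    """Create an in-memory catalog with three files added or retired over time."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_file (
            path           TEXT PRIMARY KEY,
            begin_snapshot INTEGER NOT NULL,  -- first snapshot that includes the file
            end_snapshot   INTEGER            -- first snapshot without it; NULL = still live
        );
        INSERT INTO data_file VALUES
            ('s3://lake/t/part-1.parquet', 1, NULL),
            ('s3://lake/t/part-2.parquet', 2, 3),
            ('s3://lake/t/part-3.parquet', 3, NULL);
        """
    )
    return conn


def files_at(conn, snapshot_id):
    """Return the Parquet paths visible to a reader pinned to snapshot_id."""
    rows = conn.execute(
        """
        SELECT path
        FROM data_file
        WHERE begin_snapshot <= ?
          AND (end_snapshot IS NULL OR end_snapshot > ?)
        ORDER BY path
        """,
        (snapshot_id, snapshot_id),
    )
    return [path for (path,) in rows]


demo = build_demo_catalog()
at_snapshot_2 = files_at(demo, 2)  # part-1 and part-2
at_snapshot_3 = files_at(demo, 3)  # part-2 was retired, part-3 was added
```

Because file resolution is a single indexed query rather than a walk through manifest files, a reader pinned to an older snapshot and a reader on the latest one do not interfere with each other.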

What composes, what’s brittle

Production-ready. v1.0 released April 2026 with stable spec and backward-compatibility guarantee.
Replaces the full stack. DuckLake covers both the table format and the catalog, so it replaces "Iceberg + Polaris" or "Delta + Unity" as a pair, not just Iceberg or Delta alone.
Why SQL metadata wins. High-frequency commits don't thrash a Postgres index the way they thrash Iceberg manifest lists.
Native encryption. Data files can be encrypted, with keys stored in the catalog database. Authentication and authorization are handled through the catalog.
Compatibility. Data files exportable to Iceberg if needed; not a one-way bet.
What's not validated yet. Independent benchmarks outside DuckDB Labs; security workload validation at TB/day scale.

Sources: ducklake.select v1.0 spec (April 13, 2026); MotherDuck announcement; DuckDB Labs technical blogs; InfoQ coverage; The Register (April 16, 2026).
