Methodology
DuckLake — SQL metadata for the lakehouse
Methodology for managing lakehouse metadata in a transactional SQL database rather than as JSON/Avro manifest files in object storage. Three-layer separation: data lives in Parquet on blob, metadata lives in any SQL catalog, compute reads both independently. Despite the name, DuckLake is not tied to DuckDB — it's a catalog format, not an engine choice.
On DuckDB Labs' own benchmarks, DuckLake is reported to run queries faster than Iceberg and to ingest 105× faster; in streaming workloads the reported figures are 900× faster reads and 100× faster writes. These numbers have not yet been independently validated outside DuckDB Labs. The architectural claim, that SQL metadata avoids manifest-file proliferation, holds regardless of the specific multiplier.
The pipeline
- Storage. Parquet on blob: S3, Azure Blob, GCS, MinIO. Same files Iceberg uses.
- Catalog. SQL metadata database: Postgres, SQLite, DuckDB, MotherDuck, or any other SQL DB with PKs and transactions.
- Compute. Engine-agnostic: any engine that reads the spec; the reference implementation is the DuckDB extension.
- Serve. Lakehouse queries: the same Parquet files are queryable from multiple compute nodes concurrently.
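A minimal sketch of the catalog layer, assuming an in-memory SQLite database stands in for the SQL catalog. The table names (`snapshot`, `data_file`), the helper functions, and the `s3://bucket/...` paths are hypothetical and simplified; they are not the DuckLake spec's schema. It illustrates one way a catalog can use primary keys and transactions: each commit is a single transaction, and a primary-key conflict on the snapshot ID rejects a writer that started from a stale snapshot.

```python
import sqlite3

# Hypothetical, simplified catalog schema for illustration only;
# these are NOT the table or column names of the DuckLake spec.
SCHEMA = """
CREATE TABLE snapshot (
    snapshot_id  INTEGER PRIMARY KEY,
    committed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE data_file (
    path     TEXT PRIMARY KEY,
    added_in INTEGER NOT NULL REFERENCES snapshot(snapshot_id)
);
"""


def open_catalog():
    """Create an in-memory catalog (stand-in for Postgres, SQLite on disk, etc.)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def latest_snapshot(con):
    """Return the newest snapshot ID, or 0 for an empty catalog."""
    return con.execute(
        "SELECT COALESCE(MAX(snapshot_id), 0) FROM snapshot"
    ).fetchone()[0]


def commit(con, base_snapshot, paths):
    """Register data files as snapshot base_snapshot + 1 in one transaction.

    Returns the new snapshot ID, or None if the commit was rejected
    (snapshot ID already taken, or a path already registered).
    """
    new_id = base_snapshot + 1
    try:
        with con:  # commits on success, rolls back everything on error
            con.execute("INSERT INTO snapshot (snapshot_id) VALUES (?)", (new_id,))
            con.executemany(
                "INSERT INTO data_file (path, added_in) VALUES (?, ?)",
                [(p, new_id) for p in paths],
            )
    except sqlite3.IntegrityError:
        return None
    return new_id


def files_at(con, snapshot_id):
    """List the data files visible at a given snapshot (append-only sketch)."""
    rows = con.execute(
        "SELECT path FROM data_file WHERE added_in <= ? ORDER BY path",
        (snapshot_id,),
    )
    return [r[0] for r in rows]


catalog = open_catalog()
base = latest_snapshot(catalog)  # two writers both start from snapshot 0
first = commit(catalog, base, ["s3://bucket/t/a.parquet"])
stale = commit(catalog, base, ["s3://bucket/t/b.parquet"])  # rejected: ID 1 is taken
retry = commit(catalog, latest_snapshot(catalog), ["s3://bucket/t/b.parquet"])
```

A compute engine only needs the paths returned by `files_at` to read the Parquet files; nothing in this commit path writes a metadata file to blob storage. The sketch handles appends only and leaves out deletes, schema changes, and statistics.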
What composes, what’s brittle
- Production-ready. v1.0 released April 2026 with stable spec and backward-compatibility guarantee.
- Replaces the full stack. Not just Iceberg or Delta — replaces "Iceberg + Polaris" or "Delta + Unity" together.
- Why SQL metadata wins. High-frequency commits don't thrash a Postgres index the way they thrash Iceberg manifest lists.
- Native encryption. Data files can be encrypted, with keys stored in the catalog DB. Authentication and authorization go through the catalog.
- Compatibility. Data files exportable to Iceberg if needed; not a one-way bet.
- What's not validated yet. Independent benchmarks outside DuckDB Labs; security workload validation at TB/day scale.
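To make the high-frequency-commit point concrete, here is a rough sketch that runs 1,000 single-file commits against an in-memory SQLite catalog and then resolves the visible file set with one indexed query. The `data_file` table, the index name, the helper functions, and the `s3://bucket/events/...` paths are all hypothetical, not the DuckLake schema. It measures nothing and says nothing about the benchmark multipliers; it only shows that each commit adds a row to a table rather than a new metadata object, which a manifest-based format would typically write to object storage on every commit.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Hypothetical catalog table, for illustration only.
    CREATE TABLE data_file (
        path     TEXT PRIMARY KEY,
        added_in INTEGER NOT NULL
    );
    CREATE INDEX data_file_added_in ON data_file (added_in);
    """
)


def commit_one(con, snapshot_id):
    """One small commit: a single-row transaction, no metadata files written."""
    path = f"s3://bucket/events/part-{snapshot_id:05d}.parquet"
    with con:
        con.execute(
            "INSERT INTO data_file (path, added_in) VALUES (?, ?)",
            (path, snapshot_id),
        )


def visible_files(con, snapshot_id):
    """Count the files visible at a snapshot with one indexed range query."""
    return con.execute(
        "SELECT COUNT(*) FROM data_file WHERE added_in <= ?", (snapshot_id,)
    ).fetchone()[0]


# Simulate a streaming writer that commits after every file.
for snapshot_id in range(1, 1001):
    commit_one(con, snapshot_id)

current = visible_files(con, 1000)
earlier = visible_files(con, 250)
```

After the loop the catalog holds 1,000 rows in one table, and reading the state at any snapshot is a single query against the index.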
Sources: ducklake.select v1.0 spec (April 13, 2026); MotherDuck announcement; DuckDB Labs technical blogs; InfoQ coverage; The Register (April 16, 2026).