Security Data Works

Methodology

DuckLake — SQL metadata for the lakehouse

Methodology for managing lakehouse metadata in a transactional SQL database rather than as JSON/Avro manifest files in object storage. Three-layer separation: data lives in Parquet on blob, metadata lives in any SQL catalog, compute reads both independently. Despite the name, DuckLake is not tied to DuckDB — it's a catalog format, not an engine choice.

926×

Faster queries vs. Iceberg on DuckDB Labs benchmarks; 105× faster ingestion; in streaming workloads, 900× faster reads and 100× faster writes. Independent validation outside DuckDB Labs benchmarks remains pending — but the architectural claim, that SQL metadata avoids manifest-file proliferation, holds regardless of the specific multiplier.

The pipeline

  1. Storage

    Parquet on blob

    S3, Azure Blob, GCS, MinIO. Same files Iceberg uses.

  2. Catalog

    SQL metadata database

    Postgres, SQLite, DuckDB, MotherDuck — any SQL DB with PKs and transactions.

  3. Compute

    Engine-agnostic

    Any engine that reads the spec; reference implementation is the DuckDB extension.

  4. Serve

    Lakehouse queries

    Same Parquet files queryable from multiple compute nodes concurrently.

What composes, what’s brittle

  • Production-ready. v1.0 released April 2026 with stable spec and backward-compatibility guarantee.
  • Replaces the full stack. Not just Iceberg or Delta — replaces "Iceberg + Polaris" or "Delta + Unity" together.
  • Why SQL metadata wins. High-frequency commits don't thrash a Postgres index the way they thrash Iceberg manifest lists.
  • Native encryption. Data files encryptable; keys in the catalog DB. Auth and AuthZ via the catalog.
  • Compatibility. Data files exportable to Iceberg if needed; not a one-way bet.
  • What's not validated yet. Independent benchmarks outside DuckDB Labs; security workload validation at TB/day scale.

Sources: ducklake.select v1.0 spec (April 13, 2026); MotherDuck announcement; DuckDB Labs technical blogs; InfoQ coverage; The Register (April 16, 2026).

See how the pattern lands on your workload.

The matrix scoring that justified each reference architecture's tool choices is the paid deliverable. The benchmark behind it is public — reproduce it on your own workload, then book a call to scope the work.