Security Data Works

The format war is decided by your write pattern.

Most of the Iceberg-versus-DuckLake argument I see is conducted on read speed, which is the wrong axis. When I put the same billion rows of byte-identical Parquet under both formats and queried them, they came back at parity on three of four queries, and the fourth was inside the noise I'd expect from a single machine. The two formats barely differ on how data comes out. Where they diverge, sharply and reproducibly, is on how data goes in, so the choice between them is governed by the write contract you're signing up for rather than by which one reads faster, and for security telemetry the write contract is usually the part that hurts.

The axis everyone argues on

Reads don't break the tie.

I went into this expecting the read benchmark to settle it, because that's the comparison the vendor decks lead with and it's the one a migration plan worries about first. So I held the data constant in the most literal way I could, which is to register the same physical Parquet files into both formats rather than letting each one write its own copy, because the moment you let two systems each encode the data you stop comparing the formats and start comparing their encoders, and the encoder difference is large enough to swamp everything else. With the bytes pinned the same on disk, at a billion rows, the two formats finished three of the four read queries within touching distance of each other, and the one query that showed daylight wasn't far enough outside the run-to-run variation on this machine for me to call it a real format effect rather than scheduling noise.

That result is dull in the best way, and it's worth saying plainly: if you've already landed your data and you're asking which format serves a large forensic scan faster, the honest answer from this run is that it mostly doesn't matter, because the engine and the file layout matter far more than the table format wrapped around them. I've written separately about why the encoder is the real lever on read performance (see "The encoder is the read lever"), and if the thing that moves read time is who wrote the Parquet rather than which manifest format points at it, then read time can't be what decides between Iceberg and DuckLake.

Which left me with a question I hadn't planned to make the centre of the work. If reads are a wash, what actually separates these two formats in a way that should change an architecture decision? The answer turned out to live entirely on the write side, and it splits into two findings that point the same direction.

Finding one · planning under accumulation

Iceberg's planning grew 6.5x as files piled up.

The first thing I measured was query planning as files accumulate, which is the part of a query that happens before any data is read, while the engine figures out which files it has to touch. I held the engine constant by having DuckDB read both formats, so the only variable was how each format answers the question "what files make up this table right now," and I walked the table up a commit ladder from 10 files to 200 files at 5 million rows, the way a streaming ingest would steadily add files over a shift. Over that ladder, Iceberg's end-to-end planning time grew by 6.5x, and the specific step that walks the manifests to enumerate the data files, the plan_files step, grew by 17.6x as the file count climbed. DuckLake's catalog resolution over the same ladder stayed flat at roughly 3 milliseconds, with no meaningful trend as files accumulated.

The reason for the split is structural rather than incidental, which is the part I'd want an architect to internalise, because it tells you the gap won't tune away. Iceberg keeps its file list in a chain of manifest files that live in object storage alongside the data, so answering "which files are in this table" means walking that chain, and the more commits you've made the longer it is and the more small metadata objects the planner has to fetch and parse before it can start reading rows. DuckLake keeps the same information in a SQL catalog, so the equivalent question is an indexed lookup that doesn't care much whether the table is made of 10 files or 200. This is the small-files tax that anyone who has run a streaming pipeline into Iceberg has felt, and what the measurement clarifies is that it's a property of where the metadata lives, not a tuning knob you forgot to turn, so you pay it on every plan until you run compaction to collapse the files back down.
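To make the structural difference concrete, here is a toy model of the two ways of answering "which files are in this table". It is a sketch, not either format's real planner: the one-manifest-per-commit list stands in for Iceberg's manifest chain (real Iceberg adds a manifest list and can merge manifests), an in-memory SQLite table stands in for a SQL catalog, and the names `iceberg_style_plan`, `catalog_style_plan`, `build` and `data_file` are all hypothetical.

```python
import sqlite3


def iceberg_style_plan(manifests):
    """Enumerate data files by walking every manifest in the chain.

    Returns (file_paths, metadata_objects_fetched).
    """
    files = []
    fetched = 0
    for manifest in manifests:  # toy assumption: one manifest object per commit
        fetched += 1
        files.extend(manifest)
    return files, fetched


def catalog_style_plan(conn, table_id):
    """Enumerate data files with one indexed query against a SQL catalog."""
    rows = conn.execute(
        "SELECT path FROM data_file WHERE table_id = ? ORDER BY path",
        (table_id,),
    ).fetchall()
    return [r[0] for r in rows], 1  # one round trip, whatever the file count


def build(n_commits):
    """Create the same file list in both representations."""
    manifests = [[f"data/{i:05d}.parquet"] for i in range(n_commits)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data_file (table_id INTEGER, path TEXT)")
    conn.execute("CREATE INDEX idx_table ON data_file (table_id)")
    conn.executemany(
        "INSERT INTO data_file VALUES (1, ?)", [(m[0],) for m in manifests]
    )
    return manifests, conn


for n in (10, 200):
    manifests, conn = build(n)
    files_a, fetched_a = iceberg_style_plan(manifests)
    files_b, fetched_b = catalog_style_plan(conn, 1)
    assert files_a == files_b  # same answer, different cost shape
    print(f"{n} commits: manifest objects fetched={fetched_a}, catalog queries={fetched_b}")
```

Both paths return the same file list. What differs is that the number of metadata objects touched tracks the commit count on the manifest side and stays at one query on the catalog side, which is the shape the planning measurement shows.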

I'll flag the obvious limit: 200 files is a small ladder, and a real Iceberg table runs maintenance to keep the manifest chain short, so nobody runs a production table at the unbounded end of this curve on purpose, and Iceberg planning isn't slow in steady state once you compact. What matters is the shape of the curve between compactions, because that shape is exactly what a high-frequency writer rides, and the more often you commit the more time you spend climbing it before maintenance pulls you back down.

Finding two · streaming cadence

At five rows per commit, inlining won by 3.93x.

The second finding is what happens at write time when the commits are tiny and frequent, which is the regime streaming security ingest actually lives in, because events arrive continuously and a pipeline that wants low end-to-end latency commits small and often rather than batching up large. I ran a write throughput test across a range of commit sizes, and with DuckLake's data inlining turned on it beat Iceberg's one-file-per-commit write path by 3.93x at 5 rows per commit, where Iceberg's commit p95 latency sat at 133 milliseconds against DuckLake's 12. As the batch grew the advantage narrowed to 2.13x at 500 rows per commit, which is the trend you'd expect, because the larger each commit gets the more the fixed per-commit overhead amortises across the rows it carries.

What stood out in the numbers is that Iceberg's commit p95 stayed at roughly 133 milliseconds regardless of batch size, because each commit writes a data file, writes a manifest, and writes new table metadata whether that commit carries 5 rows or 500, so the floor under a commit is the cost of producing those three artifacts and that floor doesn't move when you change the row count. DuckLake's inlining sidesteps the floor by writing zero data files at small cadence, folding the rows directly into the catalog instead of materialising a tiny Parquet file per commit, and the rows stay readable throughout because the catalog serves them coherently rather than the reader having to find a file that may not have been written yet. The format flushes the inlined rows out to real Parquet later, on its own schedule, so you get the small-commit latency of a database with the eventual file layout of a lakehouse, which is a genuinely clever piece of design and the clearest place the two formats stop being interchangeable.
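The amortisation argument can be written down as a two-term cost model: each commit pays a fixed floor plus a per-row cost. The sketch below uses made-up round numbers (`floor_a_ms`, `floor_b_ms` and `per_row_ms` are hypothetical and not fitted to my measurements), and it assumes both designs pay the same per-row cost, which is a simplification. It only illustrates why a large ratio at tiny commits narrows as the batch grows.

```python
def commit_ms(floor_ms, per_row_ms, rows):
    """Toy cost of one commit: a fixed floor plus a linear per-row term."""
    return floor_ms + per_row_ms * rows


def advantage(rows, floor_a_ms, floor_b_ms, per_row_ms):
    """How many times slower design A's commit is than design B's."""
    return commit_ms(floor_a_ms, per_row_ms, rows) / commit_ms(floor_b_ms, per_row_ms, rows)


# Hypothetical parameters, chosen only to show the shape of the curve.
FLOOR_A_MS = 100.0  # file-per-commit style: data file + manifest + table metadata
FLOOR_B_MS = 10.0   # inlining style: a catalog write, no data file
PER_ROW_MS = 0.05

for rows in (5, 50, 500, 5000):
    ratio = advantage(rows, FLOOR_A_MS, FLOOR_B_MS, PER_ROW_MS)
    print(f"{rows:>5} rows/commit: A is {ratio:.2f}x slower than B")
```

The ratio falls toward 1 as rows per commit rise, because the per-row term grows to dominate both floors. That is the direction of the 3.93x-to-2.13x trend, without claiming to reproduce its values.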

I want to be careful about what these ratios are, because the spread between 3.93x and 2.13x is the actual finding and the absolute milliseconds are not portable. A different machine, real object storage with network latency instead of a local disk, a different catalog backend, all of those move the absolute numbers, and they'd likely move them in Iceberg's disfavour rather than its favour, since the per-commit manifest writes I measured against a local filesystem get more expensive when each one is a round trip to S3. So I'd trust the direction (file-per-commit carries a fixed overhead that punishes small commits, inlining removes it) well ahead of any specific 12-versus-133 figure.

That prediction is now a measurement, because I reran the streaming test on the MOAR reference stack against a real MinIO object store rather than a local disk, taking 100,000 rows in once as a single commit and then again as a hundred small commits and watching what the hundred commits cost each format. On Iceberg the streaming cadence pushed ingest from 0.44 seconds to 16.3 seconds, roughly 37x, and the reason is visible in the file counts: the hundred commits left a hundred data files behind, one per commit as the design requires, but they also left 301 metadata files where the single commit left four, because each commit writes its own metadata.json alongside a fresh manifest and manifest-list, so the metadata footprint went from 8.9 KB to about 4.6 MB and query planning, which has to walk those manifests before it can read a row, climbed from 8.7 to 181 milliseconds.

DuckLake ran the same hundred commits and kept its metadata in the catalog database, so the metadata.json and manifest proliferation simply never happened and planning stayed flat at about 7 milliseconds; with inlining off it still wrote the hundred data files but ingested in 2.9 seconds, about 5.6x faster than Iceberg's stream because there were no per-commit metadata and version-hint round trips to the object store, and with inlining on the small commits wrote zero Parquet files at all and folded the rows into the catalog.

The object store moved the numbers the way I said it would, making Iceberg's ingest tax larger than the local-disk run rather than smaller, and the shape is the finding I'd carry, not the exact milliseconds: this is still a single host, so what travels is Iceberg's per-commit floor and lengthening manifest walk set against DuckLake's catalog-flat planning, which the round trips to MinIO sharpen rather than invent.
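For anyone checking the arithmetic, the ratios in the MinIO rerun follow directly from the quoted figures. The short script below recomputes them. The only thing in it that is not a quoted number is the `1 + 3 * commits` decomposition of the metadata file count: it assumes one metadata.json from table creation plus three metadata objects per commit, which is my reading of why one commit leaves four files and a hundred leave 301.

```python
# Figures quoted from the MinIO rerun.
COMMITS = 100
ICEBERG_SINGLE_S = 0.44           # 100,000 rows as one commit
ICEBERG_STREAM_S = 16.3           # same rows as 100 commits
DUCKLAKE_STREAM_NO_INLINE_S = 2.9
PLAN_MS_SINGLE, PLAN_MS_STREAM = 8.7, 181.0

stream_slowdown = ICEBERG_STREAM_S / ICEBERG_SINGLE_S
ducklake_speedup = ICEBERG_STREAM_S / DUCKLAKE_STREAM_NO_INLINE_S
planning_growth = PLAN_MS_STREAM / PLAN_MS_SINGLE
iceberg_ms_per_commit = ICEBERG_STREAM_S / COMMITS * 1000


def metadata_files(n_commits, per_commit=3, baseline=1):
    """Assumed decomposition: metadata.json + manifest + manifest-list per
    commit, plus one metadata.json written when the table is created."""
    return baseline + per_commit * n_commits


print(f"Iceberg stream vs single commit: {stream_slowdown:.1f}x")
print(f"DuckLake (inlining off) vs Iceberg stream: {ducklake_speedup:.1f}x")
print(f"Iceberg planning growth: {planning_growth:.1f}x")
print(f"Iceberg average cost per commit: {iceberg_ms_per_commit:.0f} ms")
print(f"Metadata files: {metadata_files(1)} after 1 commit, {metadata_files(COMMITS)} after {COMMITS}")
```

The average cost per Iceberg commit against the object store works out to about 163 milliseconds, in the same neighbourhood as the 133-millisecond commit p95 from the local-disk run. The two figures are different statistics from different setups, so I would not read more into that than a rough consistency check.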

What the two findings add up to

Map the format to the write contract.

Put the two findings next to the read result and the architectural reading falls out cleanly. The place Iceberg's design costs you the most is precisely the streaming, tiny-commit ingest pattern, because that's where the fixed per-commit overhead is paid most often and where the small-files tax on planning accumulates fastest between compactions, and it happens to be the exact write pattern that security telemetry generates, since logs and alerts and network events arrive continuously rather than in tidy nightly batches. That's the regime where DuckLake's catalog-plus-inlining model wins, and wins for a reason you can point at rather than a benchmark artifact. On the other side, bulk and batch loads and large forensic scans are where the gap closes, because a bulk load makes few large commits so the per-commit floor barely registers, and a forensic scan is a read where the two formats already test at parity, and that's the regime where Iceberg's far more mature ecosystem, broad engine support, and self-describing file metadata earn their place.

So the design move I'd make isn't to pick one format and use it everywhere; it's to match the format to the write contract of each tier. A streaming hot tier, where events land continuously and you want them queryable within seconds, is a good fit for DuckLake's model, and a bulk or cold tier, where you compact, archive, and run long-horizon investigations over months of history, is where I'd reach for Iceberg and its ecosystem. That's a tiering decision driven by how data enters each tier, which is why I keep saying the write pattern is the architectural variable here, not the read pattern, because the read pattern doesn't discriminate between the formats and the write pattern discriminates hard.

It's worth being explicit that Iceberg is not standing still on exactly this gap. The Iceberg V4 work has proposals aimed squarely at the small-files-and-streaming problem, a Root-Manifest design that would shorten the metadata walk and a Parquet-metadata approach meant to cut the per-commit and per-plan overhead, and if those ship and deliver they'd narrow or close the part of this finding that favours DuckLake. But as of the middle of 2026 they haven't shipped, the V4 milestone is still empty, and I try hard not to make architecture decisions on a roadmap, so the honest current state is that the gap is real today and there's a credible plan to address it that hasn't landed yet. I'm tracking the milestone, and I'll revise this reading when there's a release to measure rather than a proposal to read.

What this is and isn't evidence of

One machine: the direction transfers, the times don't.

I'd rather state the limits than have a careful reader find them, because they bound what you should take from this and the bounded version is still useful. This is a Tier B result, first-party and reproduced but run on a single machine, against local storage for everything except the MinIO streaming rerun, so the absolute timings are a property of that setup as much as of the formats, and I wouldn't quote the 133-millisecond Iceberg commit or the 3-millisecond DuckLake plan as numbers you'll see on your own infrastructure. What I'd carry across is the ratios and, more than the ratios, the direction, because the mechanisms underneath them (a fixed per-commit cost in a file-per-commit design, a metadata walk that lengthens with file count, an inlining path that removes both at small cadence) are structural facts about how each format is built, and structural facts travel better than stopwatch readings.

There's a specific overclaim I want to head off, because it's the one the vendor framing invites. The streaming numbers quoted for DuckLake-style inlining run as high as 100x to 900x against a naive file-per-commit baseline, and I went looking for that range and didn't find it: in a controlled local-disk run where I held the engine and the data constant, the advantage showed up as a modest 2x to 4x. I don't think the big numbers are fabricated so much as produced under conditions that maximise the baseline's pain, like remote object storage and pathologically tiny commits with no batching at all, and a fair comparison on one machine compresses them toward the low single digits. So the effect is real and directional and I'd design around it, but it's a 2x-to-4x effect in the conditions I could measure honestly, and I'd be suspicious of anyone selling you the 900x without telling you what baseline it's measured against.

The piece of this I'm most confident in is the read parity, because that's the result that disciplines the rest. If reads came back at parity on byte-identical data, then any honest account of why you'd pick one format over the other has to live on the write side, and that constraint is what turns a pile of timings into an architecture argument rather than a benchmark leaderboard.

Turning the finding into a decision

Decide on how data enters, then measure your own.

If you're standing in front of this choice for a security-data platform, the practical version of the argument is to start from your ingest cadence rather than from a read benchmark, because the read benchmark won't separate the candidates and the ingest cadence will. Look at how each data source actually commits: a Zeek or Suricata stream feeding a hot tier commits small and often and is exactly where the file-per-commit floor and the planning tax bite, while a nightly EDR export or a batch backfill into a cold archive commits large and rarely and lands in the regime where the formats converge and Iceberg's ecosystem advantage is what I'd weight. The decision is a tiering decision keyed to the write contract of each source, and once you've framed it that way the formats sort themselves, with the streaming hot tier leaning DuckLake and the bulk cold tier leaning Iceberg, until V4 ships and gives me a reason to remeasure.
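One way to make the framing operational is to write each source's commit pattern down and sort on it. The sketch below is deliberately crude and every number in it is a placeholder: the thresholds `small_commit_rows` and `frequent_commits_per_hour`, and the example cadences for the two sources, are hypothetical values I picked for illustration, not results from the runs above. The point is the shape of the rule, with the thresholds coming from your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    rows_per_commit: int
    commits_per_hour: float


def suggest_tier(source, small_commit_rows=500, frequent_commits_per_hour=60):
    """Map a source's write contract to a tier. Thresholds are hypothetical."""
    small = source.rows_per_commit <= small_commit_rows
    frequent = source.commits_per_hour >= frequent_commits_per_hour
    if small and frequent:
        return "streaming hot tier (leans DuckLake)"
    if not small and not frequent:
        return "bulk cold tier (leans Iceberg)"
    return "mixed pattern: measure before deciding"


# Illustrative cadences only; real feeds vary widely.
sources = [
    Source("zeek-stream", rows_per_commit=5, commits_per_hour=3600),
    Source("nightly-edr-export", rows_per_commit=5_000_000, commits_per_hour=1 / 24),
]

for s in sources:
    print(f"{s.name}: {suggest_tier(s)}")
```

The third branch matters as much as the other two: a source that commits small but rarely, or large but often, does not sit cleanly in either regime, and that is where I'd measure rather than guess.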

The one thing I wouldn't do is take my ratios, or anyone's, as a substitute for measuring your own write pattern on your own infrastructure, because the absolute numbers move with storage and catalog and machine in ways that can change the size of the gap even when they don't change its sign. The reason I trust the direction here is that I ran it under conditions I controlled and reported and that you could rerun, which is a different thing from a vendor slide, and I've written up the methodology that keeps a run like this from quietly lying to you (see "How to run a benchmark that doesn't lie"): hold the data constant, hold the engine constant, vary one thing, report the conditions, and let someone else rerun it.
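A minimal harness in that spirit might look like the sketch below. It is not the harness I used. It assumes you supply `commit_fn`, a hypothetical callable that performs one commit of a given row count against whichever format you are testing, and it varies only the batch size while recording p50, p95 and the conditions of the run. The `stand_in_commit` function is a placeholder workload so the sketch runs on its own.

```python
import platform
import statistics
import time


def p95(samples):
    """95th percentile: the last of the 19 cut points that split data into 20 parts."""
    return statistics.quantiles(samples, n=20)[-1]


def run_ladder(commit_fn, batch_sizes, commits_per_size=50):
    """Time commit_fn(rows) repeatedly, varying only the batch size."""
    results = {}
    for rows in batch_sizes:
        latencies_ms = []
        for _ in range(commits_per_size):
            start = time.perf_counter()
            commit_fn(rows)
            latencies_ms.append((time.perf_counter() - start) * 1000)
        results[rows] = {
            "p50_ms": statistics.median(latencies_ms),
            "p95_ms": p95(latencies_ms),
        }
    return results


def conditions():
    """Record the setup next to the numbers so someone else can rerun it."""
    return {
        "machine": platform.machine(),
        "system": platform.system(),
        "python": platform.python_version(),
    }


def stand_in_commit(rows):
    """Placeholder workload; replace with a real commit against your table."""
    _ = [i for i in range(rows)]


report = {
    "conditions": conditions(),
    "results": run_ladder(stand_in_commit, batch_sizes=[5, 50, 500]),
}
print(report)
```

Run the same ladder, with the same rows, once per format, and keep the `conditions` block alongside the results. A real run would also want to record the storage backend and catalog backend, which this sketch has no way to detect.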

The broader pattern I keep landing on is that the lakehouse-format arguments which get the most airtime are conducted on the axis that discriminates least, because read speed is what's easy to put on a chart and what the decks lead with, while the axis that actually decides the thing for a streaming security workload is how data enters the table, which is harder to chart and rarely the headline. If you're going to spend one afternoon measuring before you commit to a format, spend it on writes.

Evidence: Tier B (first-party, reproduced; single machine, local storage, plus a single-host MinIO rerun of the streaming test). Findings from the SDW Lab Iceberg-vs-DuckLake run. Planning: engine held constant (DuckDB reading both), commit ladder 10→200 files at 5M rows, Iceberg end-to-end planning +6.5x and plan_files +17.6x while DuckLake SQL-catalog resolution stayed flat at ~3 ms. Streaming: DuckLake inlining beat Iceberg file-per-commit 3.93x at 5 rows/commit (commit p95 12 ms vs 133 ms), narrowing to 2.13x at 500 rows/commit, Iceberg commit p95 ~133 ms regardless of batch. Object-store rerun (MOAR reference stack, MinIO): 100,000 rows as 100 commits took Iceberg ingest from 0.44 s to 16.3 s (~37x), metadata files from 4 to 301, and planning from 8.7 ms to 181 ms, while DuckLake planning stayed flat at ~7 ms and its non-inlined ingest took 2.9 s (~5.6x faster than Iceberg's stream). Reads: at 1B rows on byte-identical Parquet the two formats came back at parity on 3 of 4 queries. Vendor "100-900x" streaming claims reproduce here only as a 2x-4x effect in a controlled run. Iceberg V4 Root-Manifest and Parquet-metadata proposals target this gap but have not shipped (milestone empty as of mid-2026). Related: "The encoder is the read lever" and "How to run a benchmark that doesn't lie".

Pick the format your ingest cadence is asking for.

On byte-identical data the formats read at parity, so reads won't decide it. The streaming hot tier and the bulk cold tier want different things from how data enters the table, and that's the comparison worth running on your own storage before you commit.