H1-PLATFORM-01 · Tier B · 4.5/5 (extension)

Delta, Iceberg, and what's actually open.

In June 2024 Databricks paid $1B+ to acquire Tabular, the company founded by the creators of Apache Iceberg, so the vendor that built and championed Delta Lake now owns the team behind Delta's open-source competitor. The practitioner reaction was loud (Ryan Blue's announcement post pulled 1,050 reactions, the highest engagement across the year's data engineering posts), but the strategic read underneath that noise is the part worth working through.

At a glance

Openness, layer by layer.

The "Iceberg vs Delta" framing collapses four distinct layers into one, and splitting them apart shows where convergence has already happened, where it's underway, and where the lock-in actually lives. Every row is expanded in the prose below.

Layer	What's open today	What's gated (Databricks-bound)	Where convergence is moving
Table format	Apache Iceberg (ASF). Delta Lake source under Apache 2.0.	Delta's reference implementation tracks Databricks Runtime; full feature parity outside Databricks lags.	Per Ryan Blue and Ali Ghodsi, "a single, open, common standard" over a multi-year horizon. Iceberg V3 absorbs Delta's deletion vectors, variant type, and row IDs.
Format interoperability	Iceberg V3 spec is community-governed by the Iceberg PMC. The durable artifact is an open contract (REST Catalog + manifest spec + Parquet), not a single physical layout. That contract is now realized along a spectrum: static metadata files (classic Iceberg), metadata-in-a-SQL-database (DuckLake), and virtual or ephemeral metadata generated on demand over a live source (Streambased ISK over Kafka).	Delta UniForm (write Delta, read as Iceberg) is a Databricks-operated surface; interop runs through their stack.	Bidirectional reads. The format choice matters less by 2027–2028; the consumer hits a common interface through either API. "Choosing Iceberg" increasingly means choosing the interface, decoupled from where the bytes physically live.
Catalog interface	Iceberg REST Spec. Implementations: Polaris (Snowflake-led, open-source), Nessie, Hive Metastore variants.	Unity Catalog (Databricks), AWS Glue, Snowflake's proprietary catalog surfaces. Vendor-specific APIs, no portable export path.	REST Spec adoption is growing, but the proprietary catalogs still dominate production. This is the principal lock-in layer, not the table format.
Query engine	Iceberg reads natively from Spark, Trino, Dremio, Flink, Athena, Snowflake, DuckDB, ClickHouse, StarRocks.	Best Delta performance is inside Databricks Runtime; external engines work through UniForm, which is Databricks-operated.	Engine support for Iceberg V3 features (deletion vectors, row lineage, variant) rolling out across the ecosystem through 2026.
Identity	OIDC, widely supported across catalogs and engines.	Few real gates here. The open standard already won at this layer.	Stable. Identity is the layer that doesn't move.
Authorization	OpenFGA is the most credible open candidate, with limited adoption so far. In security specifically, the catalog is not the moat: per-event row-level ABAC, provenance, and audit-retention requirements push lock-in down to the engine and pipeline layer, not the catalog.	Unity Catalog row-and-column controls, Lake Formation policies, Snowflake RBAC. All proprietary, none portable.	The weakest link in the multi-layer-openness argument. No clear convergence yet. The strongest counter-evidence is format-level: Iceberg V3 row lineage (shipped in Iceberg 1.9.0, April 2025; `_row_id` plus `_last_updated_sequence_number`) is a catalog-agnostic audit primitive that closes part of the chain-of-custody gap regardless of which catalog sits above it. Spec coverage is not yet auditor-accepted production maturity: as of early 2026, Snowflake and Databricks exposed it in preview only.
Ownership of the table-format roadmap	Iceberg roadmap published and governed by the Iceberg PMC. Delta source under Apache 2.0.	Databricks now owns the team behind Iceberg (Tabular acquisition, June 2024, $1B+), giving it structural influence over both major "open" formats.	Same fact, two readings: smart convergence strategy, or asymmetric influence over the standard. Both are true.
Lock-in share of the stack	Layers 1–2 (table format, format interop) deliver as open standards in most stacks.	Layers 3–4 (catalog, authorization) are where ~80% of the lock-in actually lives.	Procurement test: can your architecture survive any single vendor in the stack getting acquired by an incumbent?

The strategic read

Convergence, not displacement.

Three frames circulated in the practitioner discourse in the days after the acquisition: that Databricks was admitting Iceberg had won, that Databricks was buying the competition to bury it, and that this was a talent acquisition with no format implications. All three miss what the public evidence actually shows. The Apache Iceberg V3 roadmap (published and governed by the Iceberg PMC, not by Databricks unilaterally) explicitly incorporates features Delta pioneered: deletion vectors for efficient deletes without rewriting whole files, the variant type for native semi-structured data handling, and row IDs for efficient row-level updates. So the Iceberg community isn't claiming superiority over Delta so much as borrowing Delta's innovations.

Going the other way, Delta UniForm, a production Databricks feature, lets Delta tables be read as Iceberg tables automatically. That is interoperability rather than a migration path away from Delta: you write Delta and then query with Iceberg-compatible engines (Trino, Dremio, Spark, Flink). The two formats are converging on a common interface that consumers can hit through either API.

Public statements from Ryan Blue and Ali Ghodsi (Databricks CEO) since the acquisition have explicitly framed the work as moving toward "a single, open, common standard" over a multi-year horizon, though the signal I'd weight is the operational pattern of the acquisition rather than the rhetoric around it. A vendor with strong incentives to maintain Delta dominance is buying the team most likely to make Iceberg credible at scale, then contributing Delta's strongest features back to the Iceberg roadmap, which is what convergence looks like in practice.

There's a deeper move under the format question, and it changes what "choosing Iceberg" even means. The durable thing Iceberg standardizes is an open contract (REST Catalog plus the manifest spec plus Parquet) rather than a fixed physical layout on disk, and that contract is now being realized along a spectrum. The classic form keeps metadata as static files in object storage, while DuckLake moves it into a SQL database, and Streambased's ISK synthesizes it virtually over a live Kafka topic with no persisted metadata at all. The query engine consumes the protocol and is largely indifferent to which backend produced the bytes, so the practical decision is shifting from "which table format" to "which interface," decoupled from where the data physically lives.
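The "engine consumes the protocol, not the backend" idea can be sketched in a few lines. This is a toy under stated assumptions, not the Iceberg REST Catalog API: `MetadataBackend`, `plan_scan`, and the three backend classes are hypothetical names invented for illustration, and a list of file paths stands in for the real manifest structures.

```python
import sqlite3
from typing import Protocol


class MetadataBackend(Protocol):
    """Hypothetical stand-in for the contract: table name in, data file paths out."""

    def data_files(self, table: str) -> list[str]: ...


class StaticFileBackend:
    """Classic shape: metadata written ahead of time (a dict stands in for manifest files)."""

    def __init__(self, manifests: dict[str, list[str]]) -> None:
        self._manifests = manifests

    def data_files(self, table: str) -> list[str]:
        return sorted(self._manifests[table])


class SqlBackend:
    """Metadata-in-a-database shape: the same listing stored as rows."""

    def __init__(self, manifests: dict[str, list[str]]) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE files (tbl TEXT, path TEXT)")
        rows = [(t, p) for t, paths in manifests.items() for p in paths]
        self._db.executemany("INSERT INTO files VALUES (?, ?)", rows)

    def data_files(self, table: str) -> list[str]:
        cur = self._db.execute(
            "SELECT path FROM files WHERE tbl = ? ORDER BY path", (table,)
        )
        return [row[0] for row in cur]


class VirtualBackend:
    """Virtual shape: nothing persisted; the listing is computed on each call."""

    def __init__(self, segment_count: int) -> None:
        self._segment_count = segment_count

    def data_files(self, table: str) -> list[str]:
        return [f"{table}/part-{i}.parquet" for i in range(self._segment_count)]


def plan_scan(backend: MetadataBackend, table: str) -> list[str]:
    """The 'engine' sees only the protocol, never the backend type."""
    return backend.data_files(table)


# Illustrative data only: two files for one hypothetical table.
manifests = {"events": ["events/part-0.parquet", "events/part-1.parquet"]}
backends = [StaticFileBackend(manifests), SqlBackend(manifests), VirtualBackend(2)]
plans = [plan_scan(b, "events") for b in backends]
```

The point of the sketch is that `plan_scan` never branches on backend type. If a new backend needed a special case inside the engine, that would be the kind of dialect drift the contract is supposed to prevent.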

The open question, and the bet I'm actually making, is whether that contract coheres as the backends multiply, or whether it fragments into incompatible dialects that quietly break the "Iceberg-compatible" guarantee. So far the evidence runs toward convergence. The REST Catalog spec has become the 2026 lingua franca; Apache Polaris graduated to an Apache top-level project on 2026-02-18; and Polaris, the open-source Unity Catalog, Snowflake's Open Catalog, AWS Glue, and BigQuery's managed interface all speak REST with no incompatible extensions documented against the spec. I'd flag the limits of that claim honestly. The sources are Tier C and vendor-adjacent, "no incompatibilities documented" is not the same as "none exist," and the convergence is production-proven mainly along the materialized path, not yet the virtual one. The contract holding is the thing to watch, because if it holds, the format war was never the war that mattered.

I ran a first-party version of that coherence check on 2026-06-07 against the MOAR reference stack on a single host, swapping catalogs and formats underneath one OCSF table with ./moar swap-catalog and ./moar swap-format. The same query, read through three independent Iceberg REST catalog implementations (the iceberg-rest Java reference fixture, Nessie on Java/Quarkus, and Lakekeeper, a Rust service backed by Postgres), returned the identical answer (rdp=125) across all three, and the same data read identically across Iceberg and DuckLake on one object store. So a compliant client gets the same answer across three independent REST-catalog codebases, which moves this essay's "no incompatible extensions documented" read from asserted toward verified on three implementations along the materialized path. The caveats stay, though: this is an answer-equality check on a single host across the open-catalog set, and I didn't run Unity, Glue, Snowflake, or BigQuery, so the broader convergence claim stays documented rather than first-party.
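The shape of an answer-equality check like this one can be sketched as follows. It is not the MOAR tooling, whose internals aren't shown here. `check_answer_equality` and `is_coherent` are hypothetical helpers, and the lambdas are placeholders for "run the same query through a client pointed at this catalog"; they simply return the rdp=125 value reported above.

```python
from typing import Callable, Hashable


def check_answer_equality(
    readers: dict[str, Callable[[], Hashable]],
) -> dict[Hashable, list[str]]:
    """Run the same query through each reader and group reader names by answer."""
    groups: dict[Hashable, list[str]] = {}
    for name, run_query in readers.items():
        groups.setdefault(run_query(), []).append(name)
    return groups


def is_coherent(groups: dict[Hashable, list[str]]) -> bool:
    """Coherent means every reader landed in the same answer group."""
    return len(groups) == 1


# Placeholder readers (assumption): a real check would issue the query
# through each catalog instead of returning a constant.
readers = {
    "iceberg-rest": lambda: 125,
    "nessie": lambda: 125,
    "lakekeeper": lambda: 125,
}
groups = check_answer_equality(readers)
coherent = is_coherent(groups)
```

Grouping by answer rather than returning a bare boolean means a failure shows which implementations disagree, which is the useful part when a dialect diverges.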

The framing that actually matters

The lock-in is at the catalog. Not at the table format.

The useful distinction the data engineering community is converging on is that open-source (can you see the code) is a different property from open-standards (can you swap the parts). Databricks open-sourced Delta Lake under Apache 2.0, so that's open-source, but whether you can run a non-Databricks stack on top of Delta with full feature parity is the open-standards question, and the two properties don't move together.

Most of the vendors I've evaluated deliver layers one and two as open standards and then lock you in at layers three and four, so the table format is roughly 20% of the lock-in while the catalog and authorization layers are roughly 80%. Databricks lock-in concentrates in Unity Catalog rather than in Delta Lake, AWS Security Lake lock-in concentrates in Lake Formation rather than in Parquet, and Snowflake lock-in concentrates in its RBAC rather than in its Iceberg support.

The Databricks-Tabular acquisition makes this distinction sharper. Databricks now has structural influence over both major "open" table formats while continuing to compete on the proprietary catalog and authorization layers above them. The smart strategy and the lock-in concern are the same fact viewed from different angles.

Adjacent pattern

The "open" vendor your team likes is an acquisition target.

Databricks-Tabular is not isolated. In 2025, a sequence of acquisitions in the security and observability data pipeline space compressed the field of vendor-neutral options:

Acquisition	Price	Prior positioning
CrowdStrike acquired Onum	approximately $290M	a security data pipeline company previously positioned as vendor-neutral
SentinelOne acquired Observo AI	approximately $225M	observability and security data pipeline tooling
Palo Alto Networks acquired Chronosphere	$3.3B	observability pipeline at scale, previously a Datadog alternative

The pattern that emerges is that the same SIEM and EDR vendors customers were trying to escape are now buying the platforms they were escaping to, so the "vendor-neutral" pipeline you stand up in 2025 may have a new owner inside 18 months. That makes the implication for procurement structural, because you have to evaluate current vendor positioning alongside acquisition risk, and design architectures around components that can be swapped rather than entire vendor stacks.

What this means in procurement

Choose by ecosystem fit. Evaluate lock-in at every layer.

The 2023-vintage advice ("pick Iceberg vs. Delta carefully because the standards battle will determine your future flexibility") has aged poorly. The convergence trajectory means most table-format choices made in 2025–2026 will carry substantially reduced long-term consequences by 2027–2028, so the decision pivots elsewhere.

Already on Databricks? Delta Lake plus UniForm gives you Iceberg compatibility for multi-engine read access without leaving the Databricks operational surface. Need multi-cloud read access from day one? Iceberg, with a non-Databricks catalog (Polaris, Nessie, or AWS Glue depending on environment). Greenfield deployment with no platform commitment? Iceberg as primary API, given the convergence trajectory and the broader engine support.

The harder questions to ask vendors aren't about the table format so much as the catalog and the authorization layer. Can I export my access control policies if I migrate? What's the migration path from Unity Catalog or Lake Formation to a different catalog: manual recreation, automated tooling, partial? Are identity and authorization using open standards (OIDC, OpenFGA) or proprietary APIs that don't survive the move? A vendor whose answer to those questions is hand-waving is telling you exactly where the lock-in actually concentrates.

The procurement-side test for vendor consolidation risk is whether your architecture can survive any single vendor in the stack getting acquired by an incumbent. Vendor neutrality is a property of how replaceable each component is rather than a property of any one vendor, which is why you design for component replaceability instead of vendor allegiance.
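One way to make that test concrete is to write the stack down and ask, per vendor, which layers would have no drop-in replacement if that vendor changed hands. The sketch below is illustrative only: `Component`, `acquisition_exposure`, and the `vendor-a` / `vendor-b` entries are hypothetical, and whether an alternative really counts as "drop-in" is a judgment the code takes as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    layer: str
    vendor: str
    # Alternatives you believe you could swap in without rewriting consumers.
    drop_in_alternatives: tuple[str, ...] = ()


def acquisition_exposure(stack: list[Component]) -> dict[str, list[str]]:
    """Map each vendor to the layers that have no drop-in alternative."""
    exposure: dict[str, list[str]] = {}
    for component in stack:
        if not component.drop_in_alternatives:
            exposure.setdefault(component.vendor, []).append(component.layer)
    return exposure


# Hypothetical stack (assumption): vendor names are placeholders.
stack = [
    Component("catalog", "vendor-a", ("polaris", "nessie")),
    Component("query engine", "vendor-b", ("trino", "duckdb")),
    Component("authorization", "vendor-a"),
]
exposure = acquisition_exposure(stack)
passes_test = not exposure
```

An empty result means every layer has a swap path. Here the authorization layer has none, so an acquisition of `vendor-a` would leave that layer stranded even though the catalog itself is replaceable.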

What this extends

H1-PLATFORM-01, with the multi-layer openness lens.

The anchor hypothesis on the research page reads: Iceberg, a neutral catalog (Polaris, Nessie, or equivalent), and a swappable query engine (Dremio, Trino, ClickHouse, DuckDB depending on workload) form the platform baseline that survives vendor consolidation. The Databricks-Tabular acquisition doesn't contradict that baseline, and Iceberg's standardization momentum and V3 roadmap actually strengthen it. What the acquisition does is raise the visibility of where the platform's openness buys flexibility, and where it doesn't.

The framing on the research-page anchor stays at 4.5/5 confidence. The catalog-layer companion claim, that the catalog choice is the principal lock-in layer rather than the table format, is part of the same hypothesis but is treated as the sharper version of it as of mid-2026. Iceberg is necessary; a non-vendor-locked catalog plus an open authorization layer is what makes Iceberg's openness operational.

What would change the answer. First, Databricks signaling that Iceberg performance on the Databricks Runtime is materially inferior to Delta's; current public claims and benchmarks suggest parity, but production validation is sparse. Second, an open authorization standard (OpenFGA or equivalent) reaching substantial catalog adoption, which is currently the weakest link in the multi-layer-openness argument. Third, a vendor consolidation move that substantially reduces the catalog options (Snowflake acquiring a Polaris-adjacent player; Databricks merging Unity Catalog functionality into the Iceberg REST Spec layer). Each is a structural shift; absent them, the baseline holds.

Open is a property of the whole stack, not any single layer.

The full anchor hypothesis on the platform baseline, plus the contradiction on Iceberg and Delta convergence, are on the research page. The matrix offering applies the multi-layer openness lens to specific platform decisions.
