Security Data Works

H1-PLATFORM-01 · Tier A · 4.5/5 (extension)

Delta, Iceberg, and what's actually open.

In June 2024 Databricks paid $1B+ to acquire Tabular — the company founded by the creators of Apache Iceberg. The vendor that built and championed Delta Lake now owns the team behind Delta's open-source competitor. The practitioner reaction was loud (Ryan Blue's announcement post pulled 1,050 reactions, the highest engagement across the year's data engineering posts). The strategic read is more interesting than the noise.

The strategic read

Convergence, not displacement.

Three frames circulated through the practitioner discourse in the days after the acquisition: Databricks admits Iceberg won; Databricks is buying the competition to bury it; it's a talent acquisition with no format implications. All three miss what the public evidence actually shows. The Apache Iceberg v3 roadmap — published, governed by the Iceberg PMC, not by Databricks unilaterally — explicitly incorporates features Delta pioneered: deletion vectors for efficient deletes without rewriting whole files, the variant type for native semi-structured data handling, row IDs for efficient row-level updates. The Iceberg community isn't claiming superiority over Delta; it's borrowing Delta's innovations.
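The deletion-vector idea can be sketched in a few lines. This is illustrative only, not the actual Iceberg or Delta implementation: deletes are recorded as row positions against an immutable data file, so the file itself is never rewritten.

```python
# Minimal sketch of a deletion vector (illustrative, not the real
# Iceberg/Delta machinery). Deletes are recorded as row positions;
# the underlying data file stays immutable.

class DeletionVector:
    """Tracks deleted row positions for one immutable data file."""

    def __init__(self):
        # Real formats use compressed bitmaps (e.g. roaring bitmaps);
        # a plain set keeps the sketch readable.
        self.deleted = set()

    def delete(self, position: int) -> None:
        self.deleted.add(position)

    def is_live(self, position: int) -> bool:
        return position not in self.deleted


def scan(rows, dv):
    """A reader applies the deletion vector as a filter at scan time."""
    return [row for pos, row in enumerate(rows) if dv.is_live(pos)]


rows = ["a", "b", "c", "d"]  # stands in for rows of an immutable Parquet file
dv = DeletionVector()
dv.delete(1)  # logical delete: no file rewrite
print(scan(rows, dv))  # ['a', 'c', 'd']
```

The design trade is the point: a delete touches a small sidecar structure instead of rewriting a potentially huge file, at the cost of readers having to merge the vector at scan time.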

Going the other way: Delta UniForm, a production Databricks feature, lets Delta tables be read as Iceberg tables automatically. That isn't a migration path away from Delta — it's interoperability. You write Delta, you query with Iceberg-compatible engines (Trino, Dremio, Spark, Flink). The two formats are converging on a common surface area that consumers can hit through either API.

Public statements from Ryan Blue and Ali Ghodsi (Databricks CEO) since the acquisition have explicitly framed the work as moving toward "a single, open, common standard" over a multi-year horizon. The operational pattern of the acquisition, not the rhetoric around it, is the signal: a vendor with strong incentives to maintain Delta dominance buying the team most likely to make Iceberg credible at scale, then contributing Delta's strongest features back to the Iceberg roadmap. That's the shape of convergence.

The framing that actually matters

The lock-in lives at the catalog. Not at the table format.

The useful distinction the data engineering community is converging on: open-source (can you see the code?) is not the same property as open-standards (can you swap the parts?). Databricks open-sourced Delta Lake under Apache 2.0; that's open-source. Whether you can run a non-Databricks stack on top of Delta with full feature parity is open-standards. The two don't move together.

A useful four-layer decomposition for evaluating openness across a lakehouse stack:

  • File format — how bytes sit on storage (Parquet, ORC).
  • Table format — how files become transactional tables (Delta Lake, Iceberg).
  • Catalog — how tables are discovered and governed (Unity Catalog, Polaris, Nessie, AWS Glue).
  • Authorization — who can read and write what (Lake Formation, Snowflake RBAC, catalog-native grants).

Most vendors deliver layers one and two as open standards and lock you in at layers three and four. The table format is roughly 20% of the lock-in surface; the catalog and authorization layers are roughly 80%. Databricks' Unity Catalog is where Databricks lock-in actually lives, not Delta Lake. AWS Lake Formation is where AWS Security Lake lock-in actually lives, not Parquet. Snowflake's RBAC is where Snowflake lock-in actually lives, not their support for Iceberg.

The Databricks-Tabular acquisition makes this distinction sharper. Databricks now has structural influence over both major "open" table formats while continuing to compete on the proprietary catalog and authorization layers above them. The smart strategy and the lock-in concern are the same fact viewed from different angles.

Adjacent pattern

The "open" vendor your team likes is an acquisition target.

Databricks-Tabular is not isolated. The same year saw a sequence of acquisitions in the security data pipeline space that quietly compressed the field of vendor-neutral options:

  • CrowdStrike acquired Onum for approximately $290M — a security data pipeline company previously positioned as vendor-neutral.
  • SentinelOne acquired Observo AI for approximately $225M — observability and security data pipeline tooling.
  • Palo Alto Networks acquired Chronosphere for $3.3B — observability pipeline at scale, previously a Datadog alternative.

The shape: the same SIEM and EDR vendors customers were trying to escape are now buying the platforms they were escaping to. The "vendor-neutral" pipeline you stand up in 2025 may have a new owner inside 18 months. The implication for procurement is structural — evaluate not just current vendor positioning, but acquisition risk, and design architectures around components that can be swapped rather than entire vendor stacks.

What this means in procurement

Choose by ecosystem fit. Evaluate lock-in at every layer.

The 2023-vintage advice — "pick Iceberg vs. Delta carefully because the standards battle will determine your future flexibility" — has aged poorly. Given the convergence trajectory, most table-format choices made in 2025–2026 will carry meaningfully less long-term consequence by 2027–2028. The decision pivot moves elsewhere.

Already on Databricks? Delta Lake plus UniForm gives you Iceberg compatibility for multi-engine read access without leaving the Databricks operational surface. Need multi-cloud read access from day one? Iceberg, with a non-Databricks catalog (Polaris, Nessie, or AWS Glue depending on environment). Greenfield deployment with no platform commitment? Iceberg as primary API, given the convergence trajectory and the broader engine support.
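The branching above reduces to a small lookup. The contexts and recommendations are the ones stated in this section; the function and its keys are hypothetical names invented for illustration.

```python
# Hypothetical sketch encoding the section's decision guidance.
# Context keys and the function name are invented for illustration.

def table_format_recommendation(context: str) -> str:
    recommendations = {
        "already_on_databricks": "Delta Lake + UniForm for Iceberg-compatible reads",
        "multi_cloud_day_one": "Iceberg with a non-Databricks catalog (Polaris, Nessie, or Glue)",
        "greenfield": "Iceberg as primary API",
    }
    # Unknown contexts fall back to the deeper question this section
    # raises: evaluate catalog and authorization lock-in first.
    return recommendations.get(context, "evaluate catalog and authorization lock-in first")


print(table_format_recommendation("greenfield"))  # Iceberg as primary API
```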

The harder questions to ask vendors aren't about the table format. They're about the catalog and the authorization layer. Can I export my access control policies if I migrate? What's the migration path from Unity Catalog or Lake Formation to a different catalog — manual recreation, automated tooling, partial? Are identity and authorization using open standards (OIDC, OpenFGA) or proprietary APIs that don't survive the move? A vendor whose answer to those questions is hand-waving is telling you exactly where the lock-in actually lives.

And the procurement-side test for vendor consolidation risk: can your architecture survive any single vendor in the stack getting acquired by an incumbent? The vendor-neutral posture isn't a property of any one vendor; it's a property of how replaceable each component is. Design for component replaceability, not for vendor allegiance.
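The replaceability test can be made concrete in code: application logic depends on a narrow interface, and each vendor sits behind a thin adapter. Everything below is a hypothetical sketch; the `Catalog` protocol and class names are invented for illustration, not any real catalog SDK.

```python
# Sketch of designing for component replaceability: consumers depend
# on a minimal Protocol, not a vendor client. All names are invented.

from typing import Protocol


class Catalog(Protocol):
    def list_tables(self, namespace: str) -> list[str]: ...


class InMemoryCatalog:
    """Stand-in implementation. A Unity Catalog, Polaris, or Glue
    adapter would satisfy the same Protocol behind its own client."""

    def __init__(self, tables: dict[str, list[str]]):
        self._tables = tables

    def list_tables(self, namespace: str) -> list[str]:
        return self._tables.get(namespace, [])


def table_count(catalog: Catalog, namespace: str) -> int:
    # This caller survives a vendor swap: replacing the catalog means
    # writing a new adapter, not rewriting every consumer.
    return len(catalog.list_tables(namespace))


cat = InMemoryCatalog({"security": ["dns_logs", "edr_events"]})
print(table_count(cat, "security"))  # 2
```

The acquisition scenario then costs one adapter, not an architecture rewrite — which is the property the procurement test is probing for.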

What this extends

H1-PLATFORM-01, with the multi-layer openness lens.

The anchor hypothesis on the research page reads: Iceberg, a neutral catalog (Polaris, Nessie, or equivalent), and a swappable query engine (Dremio, Trino, ClickHouse, DuckDB depending on workload) form the platform baseline that survives vendor consolidation. The Databricks-Tabular acquisition doesn't contradict that baseline — Iceberg's standardization momentum and v3 roadmap actually strengthen it. What it does is raise the visibility of where the platform's openness actually buys flexibility, and where it doesn't.

The framing on the research-page anchor stays at 4.5/5 confidence. The catalog-layer companion claim — that the catalog choice is the load-bearing lock-in surface, not the table format — is part of the same hypothesis but is treated as the sharper version of it as of mid-2026. Iceberg is necessary; a non-vendor-locked catalog plus an open authorization layer is what makes Iceberg's openness operational.

What would change the answer. Databricks signaling that Iceberg performance on the Databricks Runtime is meaningfully inferior to Delta on Databricks — current public claims and benchmarks suggest parity, but production validation is sparse. An open authorization standard (OpenFGA or equivalent) reaching meaningful catalog adoption — currently the weakest link in the multi-layer-openness argument. A vendor consolidation move that meaningfully reduces the catalog options (Snowflake acquiring a Polaris-adjacent player; Databricks merging Unity Catalog functionality into the Iceberg REST Spec layer). Each is a structural shift; absent them, the baseline holds.

Open is a property of the whole stack, not any single layer.

The full anchor hypothesis on the platform baseline, plus the contradiction on Iceberg and Delta convergence, are on the research page. The matrix offering applies the multi-layer openness lens to specific platform decisions.