Technology deep-dive

Iceberg vs Delta Lake for security data.

Choosing a lakehouse table format is the most consequential architectural decision for security data operations in 2026. It governs query-engine portability, vendor neutrality, operational complexity, and migration cost. The decision has narrowed to Iceberg or Delta, and Iceberg V3 changed the math.

Reading time: 23 minutes. Evidence tier: A (production validation from Netflix, Insider, Adobe, InMobi plus AWS and Databricks product announcements) with one Tier D update on Databricks Lakewatch.

TL;DR

If you only read one thing

The decision has narrowed to two: Iceberg or Delta. Hudi is out for most security workloads.
Iceberg V3 changed the math. Puffin deletion vectors close the merge-on-read maturity gap; row-level lineage weakens Delta's CDC advantage; Variant type reframes the JSON and schema-evolution story.
Production evidence exists on both sides. Netflix built Iceberg and runs it at petabyte scale. Insider reports a 90% S3 cost reduction on Iceberg. Adobe runs 5,000+ Delta tables. InMobi runs GDPR/CCPA erasure on Delta. Both formats are real at scale.
Recommendation by environment. Multi-cloud or multi-engine: Iceberg. Databricks-native shop with no exit plan: Delta. Migration cost ($50K–500K to reprocess petabytes) is the dominant constraint, so pick once.
The vendor-neutrality argument is unaffected by V3. It's about ecosystem dynamics, not table-format mechanics, and it remains the strongest reason to prefer Iceberg.

Timeline 2023–2026: format wars (Iceberg vs Delta vs Hudi); Databricks acquires Tabular; Delta UniForm exposes Delta tables to Iceberg readers; format interoperability arrives (AWS S3 Tables, Snowflake Polaris GA, BigQuery managed Iceberg), with the data layer open by default. The format war effectively ended between 2023 and 2026 in convergence on open interoperability, not in a single winner.

V3 update first

What changed in 2026.

This essay was originally drafted assuming Iceberg V2 mechanics, where the "Iceberg merge-on-read is less mature than Delta" critique and the "compliance-first means choose Delta for GDPR erasure" recommendation were both defensible. Apache Iceberg V3 changes this materially through three key features:

Puffin-based deletion vectors replace V2's position-delete files; equality-delete files remain in the spec. Delta has relied on deletion vectors for merge-on-read for longer, which is where its maturity advantage came from. The "Iceberg MoR is immature" claim was a V2 statement.
Row-level lineage (row IDs plus last-updated tracking) enables incremental processing and CDC-from-Iceberg without external metadata, weakening Delta's Change Data Feed advantage.
Variant type (semi-structured / nested JSON, ratified August 2025 in Parquet, adopted in Iceberg V3) reframes the schema-evolution conversation for security log ingestion.

Three caveats apply. Engine support for V3 features is rolling out across Spark, Trino, DuckDB, and Snowflake through 2026, so verify your engine version before assuming V3 mechanics in production. Delta Lake has a longer track record on these capabilities, which still matters for risk-averse compliance use cases. The vendor-neutrality and multi-engine arguments in this essay are unaffected by V3, because they turn on ecosystem dynamics rather than table-format mechanics, and they remain the strongest reasons to prefer Iceberg.

Specific in-line updates appear below the V2-era claims they affect. The original analysis is preserved so you can see what changed.

At a glance

Iceberg vs Delta, side by side.

The headline differences. Every row is expanded in the prose below with sources and the conditions that change the answer.

Dimension	Apache Iceberg	Delta Lake
Governance	Apache Software Foundation. Multi-vendor stewardship (Netflix, Apple, AWS, Snowflake, Databricks since 2024).	Linux Foundation umbrella; reference implementation is Databricks-controlled. Governance pluralism is improving but lags.
Catalogs supported	Polaris, Nessie, Glue, Hive Metastore, Unity (via interop). Catalog choice is yours.	Unity is canonical. Other catalogs supported but lower-priority in the ecosystem.
Query engines	Spark, Trino, Dremio, ClickHouse, DuckDB, StarRocks, Athena, Snowflake. Multi-engine is the design center.	Spark and Databricks SQL natively; external engines via UniForm (Iceberg-compatible read layer).
Merge-on-read maturity	V3: Puffin deletion vectors. Spec-level parity with Delta as of 2025–2026; engine support is still rolling out.	Mature; multi-year production track record.
Row-level lineage / CDC	V3: native row IDs + last-updated. CDC-from-Iceberg without external metadata.	Change Data Feed. Mature; well-tooled inside the Databricks ecosystem.
Semi-structured / JSON	V3: Variant type (ratified August 2025 in Parquet, adopted in Iceberg).	Variant type (Spark 4.0+); recent.
GDPR erasure	V3 deletion vectors close the prior gap. Production maturity is expected over 2026–2027 as engine support lands.	Mature; InMobi runs GDPR/CCPA on Delta at scale.
Production validation	Netflix (built Iceberg; petabyte-scale). Apple. Adobe (also runs Delta). Insider 90% S3 cost reduction.	Adobe 5,000+ tables. InMobi GDPR/CCPA. Heavy Databricks-customer base.
Vendor neutrality	Strong. Multi-cloud and multi-vendor by construction.	Improving (UniForm + Iceberg interop announcements 2025–2026) but still strongest inside the Databricks ecosystem.
Best fit for	Multi-cloud, multi-engine, vendor-neutral security platforms. Default for greenfield in 2026.	Databricks-native shops with no exit plan. Compliance-first environments where the mature MoR track record outweighs the V3 parity.

The decision matters

The number-one architectural decision for a security lakehouse.

Choosing between Apache Iceberg and Delta Lake determines:

Query engine portability: can you swap Trino for Dremio without data migration?
Vendor neutrality: are you locked to Databricks, or multi-cloud across AWS / Azure / GCP?
Operational complexity: daily maintenance burden for petabyte-scale ingestion.
Migration cost: switching table formats means reprocessing petabytes ($50K–500K).

For security operations in 2026 this isn't really a three-way shootout: the decision has narrowed to Iceberg or Delta, with Hudi occupying a specialized CDC-heavy niche. Hudi's complexity (merge-on-read optimization, compaction tuning) creates operational overhead without proportional benefit for most security workloads.

The evidence in this essay comes from Netflix (petabyte-scale Iceberg, serving as the historical tier behind a 5 PB/day logging pipeline), Insider (90% S3 cost reduction with Iceberg), Adobe (5,000+ Delta tables, petabyte scale), and InMobi (GDPR/CCPA compliance with Delta).

Architecture assumption

Dedicated security infrastructure.

This essay assumes dedicated security data infrastructure, which is a separation-of-duties best practice. When security data lives on isolated infrastructure, separate from corporate data platforms containing PII and financial data, many architectural decisions simplify.

Why isolation matters for table-format choice

Network isolation plus IAM as the security boundary. Dedicated security VPC/VNet, team-only access (no cross-functional multi-tenancy), simplified security posture with the network boundary plus IAM providing the primary control.

Encryption overhead becomes optional. Shared platforms must encrypt Iceberg metadata (10–20% query overhead) when PII or financial data mixes with security logs. Isolated platforms can treat metadata encryption as optional, performance-first.

RBAC complexity reduces. Shared platforms need fine-grained row-level security and column masking. Isolated platforms can use table-level permissions, avoiding the 5–30% RLS latency tax.

Compliance requirements simplify. Operational security logs (EDR telemetry, network flows, cloud audit trails) have lower compliance burden than PII. Isolation satisfies separation requirements without HIPAA/PCI-DSS-grade hardening on the security plane.

Production validation: Netflix runs dedicated observability infrastructure (5 PB/day, isolated from production systems). Huntress runs a dedicated security platform (3M endpoints, 93% cost reduction, isolated). Jake Thomas at Okta runs isolated analytics infrastructure (7.5T records, dedicated to the security team). The full sources for the Huntress and Okta numbers are on their teardown pages.

The isolation assumptions fail in three cases: a shared corporate data platform where security data is mixed with finance and operations data; multi-tenant security teams, such as an MSSP managing 50+ customer tenants; and PII or financial data inside security logs, which is rare but possible for fraud-detection workloads. In those cases the compliance burden rises and the full set of hardening measures applies (metadata encryption, row-level security, column masking), which favors Unity Catalog plus Delta Lake for built-in governance.

Production evidence

Iceberg at scale.

Netflix: petabyte-scale Iceberg validation (2018–2025)

Pre-2018, Netflix's petabyte-scale logging system ran on Apache Hive and struggled with millions of partitions causing slow metadata operations, schema evolution requiring full table rewrites, and no ACID guarantees during concurrent writes from 100+ pipelines.

Rather than accept the limitations, Netflix engineers (led by Ryan Blue) developed Apache Iceberg with a metadata layer for ACID transactions at petabyte scale, schema evolution without table rewrites, partition evolution, hidden partitioning, and time travel. Production validation: petabyte-scale tables, millions of partitions without performance degradation, instant schema evolution (seconds, not weeks). In Netflix's current logging architecture, the 5 PB/day hot-path ingestion runs through ClickHouse, with Iceberg as the historical tier.

Iceberg was donated to the Apache Software Foundation, then adopted by Apple (petabyte-scale observability), AWS (native support in Glue / Athena / EMR), Snowflake (Polaris catalog), and Databricks (acquired Tabular for $1B+). The security relevance is that Netflix validated lakehouse plus specialized engines at some of the highest scales in production, which is the pattern security teams should follow rather than staying on monolithic SIEM architectures.

Insider: 90% S3 cost reduction (2022)

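E-commerce security operations. Baseline cost was an estimated $120K–150K/month for S3. The inputs are 15–20 TB/day of event data × 90-day retention × 2–3× duplication across streaming, batch, and compliance archives, which is 2,700–5,400 TB at steady state. At $0.023/GB for S3 Standard, storage alone lists at roughly $62K–124K/month, so the $120K–150K estimate sits at or above the top of that range.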

The challenge was data duplication across streaming and batch processing pipelines, because the same event data gets written by the streaming pipeline (Kafka → S3), the batch pipeline (daily aggregation → S3), and the compliance archive (immutable logs → S3 Glacier-IR), which adds up to 2–3× storage overhead.

The fix was to migrate to an Apache Iceberg lakehouse whose unified table format eliminates that duplication, and the result was a 90% reduction in Amazon S3 costs, so $120K–150K/month became $12K–15K/month.
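The baseline and the 90% result can be recomputed from the stated inputs. The sketch below does that, assuming decimal units (1 TB = 1,000 GB) and flat S3 Standard list pricing with no request or retrieval charges. The function names are hypothetical, and this is a sanity check rather than Insider's own accounting.

```python
# Storage-only estimate from the stated inputs. Assumptions (not from the
# source): decimal units and flat S3 Standard list price, no request fees.
S3_STANDARD_USD_PER_GB_MONTH = 0.023
GB_PER_TB = 1_000


def stored_tb(daily_tb: float, retention_days: int, duplication: float) -> float:
    """Steady-state volume once the retention window is full."""
    return daily_tb * retention_days * duplication


def monthly_storage_usd(total_tb: float) -> float:
    """Monthly list-price storage cost for a given volume."""
    return total_tb * GB_PER_TB * S3_STANDARD_USD_PER_GB_MONTH


def after_reduction(monthly_usd: float, reduction: float) -> float:
    """Monthly cost after a fractional reduction (0.90 for 90%)."""
    return monthly_usd * (1.0 - reduction)


if __name__ == "__main__":
    # Low case: 15 TB/day with 2x duplication. High case: 20 TB/day with 3x.
    for daily_tb, dup in [(15, 2), (20, 3)]:
        tb = stored_tb(daily_tb, 90, dup)
        print(f"{daily_tb} TB/day x 90 d x {dup}x = {tb:,.0f} TB, "
              f"about ${monthly_storage_usd(tb):,.0f}/month")
    for baseline in (120_000, 150_000):
        print(f"${baseline:,} -> ${after_reduction(baseline, 0.90):,.0f}/month")
```

Swapping in your own daily volume, retention, and duplication factor gives a first-order estimate of what converging pipelines onto one table could save, before any tiering or pruning effects.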

Cost reduction methodology: eliminating duplication drove ~70–80% of savings (streaming + batch converged into a single Iceberg table). Optimized query scans via Iceberg's intelligent file pruning added 5–10%. Automated lifecycle tiering (hot → warm → cold) added 10–15%. Schema evolution without rewrites avoided $50–100K in migration costs.

The relevance is that security data workloads carry the same duplication problem, where a SIEM plus a data lake plus a compliance archive runs to roughly 3× storage costs, and Insider's architecture is what eliminates that waste.

AWS S3 Tables: managed Iceberg (2024)

AWS announced Amazon S3 Tables in December 2024: fully managed Iceberg tables with automatic maintenance, compaction, and snapshot management. Built-in Iceberg REST Catalog, queryable from Athena, EMR, Redshift, Glue, and third-party engines. AWS providing a managed service alongside Glue support validates long-term commitment. Security teams can adopt Iceberg without operating self-managed Spark compaction jobs.

Production evidence

Delta Lake at scale.

Adobe Experience Platform: 5,000+ tables (2024)

Adobe Experience Platform (customer data platform for marketing) runs 5,000+ active Delta tables, terabytes of data ingested daily, petabytes of data managed for customers (Unified Profile offering). Z-ORDER optimization reduced processing time from hours to minutes, ACID transactions enabled concurrent writes from 100+ pipelines, and schema evolution ran without downtime for customer-facing applications, all on a stack of Databricks plus Unity Catalog plus Delta Lake. So Adobe validates Delta Lake at a scale comparable to Netflix's Iceberg, with petabyte-scale data, thousands of tables, and high-concurrency writes.

InMobi: GDPR/CCPA compliance (2023–2024)

InMobi (mobile advertising platform, security-adjacent data privacy) faced GDPR/CCPA right-to-erasure requirements that meant deleting user data across its data lake. The solution was the Databricks Lakehouse Platform with Delta Lake, where Z-ORDER indexing optimized the point deletes, time travel allowed recovery for the 30-day compliance window before permanent deletion, encryption at rest and in transit ran via FIPS 140-validated modules, and audit trails covered data access. InMobi's pattern (Delta plus Unity Catalog plus audit trails) validates compliance-heavy workflows for data-privacy regulations.

Adobe Real-Time CDP graph workload: 250B messages/day

Adobe's Real-Time CDP team migrated 2 petabytes of actively queried data from NoSQL to Delta Lake, with 5,000+ Delta tables for the multi-tenant architecture, close to 250 billion messages/day across regions, and close to 3 trillion changes/day on the Delta tables, which validates Delta Lake for extreme-scale write concurrency. Security data at scale (petabyte-scale ingestion, high write concurrency, multi-tenant isolation) mirrors this workload closely.

Scale interpretation

Scale is not the differentiator.

What "petabyte-scale" means in security terms: 1 PB/day is roughly a 100,000–500,000-employee enterprise running full-coverage logging (EDR + CloudTrail + network + application). 5 PB/day is Netflix or Apple scale (billions of events, global operations). Typical security volumes are far smaller: a 10K-employee organization produces 1–10 TB/day, and a 100K-employee organization produces 10–100 TB/day.

Both table formats handle security data scale. If your organization generates under 100 TB/day, both Iceberg and Delta Lake are proven at 50–500× your scale. Between 100 TB/day and 1 PB/day, both are validated by production case studies. Above 1 PB/day, you're in Netflix / Apple tier and should consult their architectures directly.
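The headroom figures above are a simple ratio between a reference deployment and your own daily volume. A minimal sketch, assuming decimal units (1 PB = 1,000 TB) and a hypothetical helper name:

```python
TB_PER_PB = 1_000  # decimal units (assumption)


def headroom(reference_pb_per_day: float, your_tb_per_day: float) -> float:
    """How many times larger the reference deployment is than yours."""
    if your_tb_per_day <= 0:
        raise ValueError("your_tb_per_day must be positive")
    return reference_pb_per_day * TB_PER_PB / your_tb_per_day


if __name__ == "__main__":
    # Reference: the 5 PB/day figure cited in this essay.
    for tb_per_day in (10, 100):
        print(f"{tb_per_day} TB/day -> {headroom(5, tb_per_day):.0f}x headroom")
```

Against the 5 PB/day reference, 10 TB/day and 100 TB/day come out at 500× and 50×, which is where the 50–500× range comes from.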

Choose on query-engine flexibility (Iceberg) versus Spark optimization (Delta Lake). Performance limits are not where this decision gets made.

Architectural comparison

Five decision factors.

1. Query engine ecosystem

Iceberg: universal compatibility across Spark, Trino, Dremio, Flink, Athena, Snowflake, BigQuery, DuckDB, and Presto. Vendor-neutral. Multi-cloud (query AWS Iceberg tables from Azure Databricks via Polaris).

Delta Lake: Spark-optimized, deepest integration with Apache Spark. Expanding ecosystem (Trino, Athena, BigQuery added 2023–2024). Databricks-first, with best performance within Databricks via Unity Catalog optimization.

Decision: choose Iceberg for multi-cloud portability. Choose Delta if Databricks-committed for the deepest Unity Catalog integration.

2. Metadata architecture

Iceberg: distributed metadata in Parquet/Avro manifest files. Query engines read only needed manifests (efficient at millions of partitions). Catalog-agnostic, works with Glue, Unity Catalog, Polaris, and Hive Metastore.

Delta Lake: transaction log in _delta_log/ (JSON + Parquet checkpoints). Spark caching optimizes log reads. Checkpoint every 10 commits.
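To make the checkpoint cadence concrete, the sketch below lists which log files a reader would need in order to rebuild table state at a given version. It is a simplified model: it assumes checkpoints land exactly on multiples of the interval, uses single-file checkpoints, and ignores the pointer file that real readers use to find the latest checkpoint. The function name is hypothetical.

```python
from typing import List


def delta_log_files_to_read(version: int, checkpoint_interval: int = 10) -> List[str]:
    """Files needed to reconstruct state at `version` (simplified model)."""
    if version < 0:
        raise ValueError("version must be non-negative")
    # Most recent checkpoint at or before the requested version.
    last_checkpoint = (version // checkpoint_interval) * checkpoint_interval
    files: List[str] = []
    if last_checkpoint > 0:
        files.append(f"{last_checkpoint:020d}.checkpoint.parquet")
        start = last_checkpoint + 1
    else:
        # No checkpoint yet: replay every JSON commit from version 0.
        start = 0
    files.extend(f"{v:020d}.json" for v in range(start, version + 1))
    return files


if __name__ == "__main__":
    for name in delta_log_files_to_read(27):
        print(name)
```

Under these assumptions a reader never replays more than one interval's worth of JSON commits, which is why the checkpoint cadence bounds metadata read cost.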

Benchmark (DataBeans, 2022, Delta 1.0 vs Iceberg 0.13.0): combined load-plus-query time was 1.68 hours for Delta versus 5.99 hours for Iceberg, ~3.5× faster overall (the load-only gap was narrower, ~1.3×). Caveat: the benchmark is Spark-centric and does not reflect Trino, Dremio, or Athena workloads where Iceberg's distributed metadata may outperform Delta's transaction log.

Decision: Delta if Spark-dominant (ETL pipelines, batch). Iceberg if mixed engines (Athena for compliance, Trino for ad-hoc, Spark for ETL).

3. Schema and partition evolution

Iceberg: partition evolution lets you change strategy without rewriting data. Start with daily partitioning, switch to hourly as volume grows. Hidden partitioning means analysts query by timestamp and Iceberg applies partition filters automatically. Add, drop, rename, reorder columns without table rewrites.

Delta Lake: partition changes require rewriting the table. Analysts must specify partition columns in WHERE clauses. Adding columns needs no rewrite. Renaming and dropping columns need column mapping enabled on the table; without it they require recreating the table.

-- Iceberg: add hourly partition transform (does not rewrite existing data)
ALTER TABLE security_events.firewall_logs
ADD PARTITION FIELD hours(event_time);

-- Remove daily partition (deprecate, does not delete data)
ALTER TABLE security_events.firewall_logs
DROP PARTITION FIELD days(event_time);

-- Queries automatically use optimal partitioning (hidden)
SELECT * FROM security_events.firewall_logs
WHERE event_time > now() - interval '24 hours';
-- Iceberg uses hourly for new data, daily for historical

Decision: Iceberg if data volume is unpredictable. Delta if partition strategy is stable (daily partitioning sufficient for 5+ years).
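The pruning behavior in the SQL above can be modeled in a few lines. This is an illustrative sketch, not Iceberg's planner: it assumes the day and hour transforms count whole days and hours since the Unix epoch, and the `DataFile` class and `prune` function are hypothetical names. Each file remembers which partition spec it was written under, and the same `event_time` predicate is projected through that spec.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days(ts: datetime) -> int:
    """Whole days since the epoch (model of the day transform)."""
    return (ts - EPOCH).days


def hours(ts: datetime) -> int:
    """Whole hours since the epoch (model of the hour transform)."""
    return (ts - EPOCH) // timedelta(hours=1)


TRANSFORMS = {"day": days, "hour": hours}


@dataclass(frozen=True)
class DataFile:
    path: str
    transform: str  # spec the file was written under: "day" or "hour"
    value: int      # partition value produced by that transform


def prune(files: List[DataFile], since: datetime) -> List[DataFile]:
    """Keep files whose partition could hold rows with event_time > since."""
    kept = []
    for f in files:
        # Project the timestamp predicate into this file's own partition space.
        lower_bound = TRANSFORMS[f.transform](since)
        if f.value >= lower_bound:
            kept.append(f)
    return kept
```

The analyst only ever writes the timestamp predicate; the daily and hourly files are each filtered by the transform they were written with, so old data never has to be rewritten to match the new spec.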

4. CDC and streaming integration

Iceberg: Flink integration for streaming writes. Kafka via copy-based approaches (Kafka → Iceberg via Flink/Spark). V2 merge-on-read used position-delete and equality-delete files, operationally heavy at high update rates. V3 introduces Puffin-based deletion vectors that materially close the MoR maturity gap with Delta. V3 row-level lineage (row IDs + last-updated tracking) enables incremental processing and CDC-from-Iceberg without external metadata, weakening the historical "Delta has Change Data Feed, Iceberg does not" advantage.

Delta Lake: Change Data Feed for native CDC support. Iceberg V3 row lineage is the equivalent capability, though Delta's CDF has more production mileage. Mature MERGE INTO for upserts and SCD Type 2. Spark Structured Streaming optimized for Delta.

Decision: Delta for CDC-heavy workloads with a need for production track record. Iceberg for write-once workloads (immutable security logs, audit trails).
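Row-level lineage is what makes incremental reads possible without an external change log. The sketch below models the idea with two per-row fields, a stable row ID and the sequence number of the last commit that changed the row. The class and field names are illustrative rather than the exact V3 metadata column names, and deletes are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Row:
    row_id: int            # stable identity, survives rewrites and compaction
    last_updated_seq: int  # sequence number of the commit that last changed it
    value: str


def changed_since(rows: Iterable[Row], checkpoint_seq: int) -> List[Row]:
    """Rows inserted or updated after the consumer's last processed commit."""
    return [r for r in rows if r.last_updated_seq > checkpoint_seq]


def apply_changes(state: Dict[int, str], changes: Iterable[Row]) -> Dict[int, str]:
    """Upsert changed rows into downstream state, keyed by row ID."""
    merged = dict(state)
    for r in changes:
        merged[r.row_id] = r.value
    return merged


if __name__ == "__main__":
    table = [Row(1, 3, "host-a"), Row(2, 7, "host-b-renamed"), Row(3, 9, "host-c")]
    downstream = {1: "host-a", 2: "host-b"}  # consumer processed through seq 5
    print(apply_changes(downstream, changed_since(table, 5)))
```

A consumer that stores only its last processed sequence number can pick up from there on the next run, which is the capability that narrows the gap with Change Data Feed.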

5. Multi-format catalogs (Unity Catalog)

Unity Catalog (Databricks, open-sourced June 2024) supports Delta Lake, Iceberg, and Hudi in a single catalog. Managed Iceberg tables via Unity Catalog's Iceberg REST Catalog API (Public Preview, Databricks Runtime 16.4+ LTS). Foreign catalog access lets you query Iceberg tables from AWS Glue, Hive Metastores, and Snowflake via Unity Catalog, and Delta Sharing for Iceberg is in Private Preview.

Significance: Unity Catalog eliminates the "Iceberg OR Delta" binary. Security teams can use Delta for CDC-heavy tables (user inventory, asset tracking) and Iceberg for immutable logs (CloudTrail, firewall, EDR) with a single governance layer. Adopting Unity Catalog therefore reduces table-format lock-in.

Why Iceberg

Multi-engine, vendor-neutral, analyst-accessible.

The reference architecture I work with specifies Apache Iceberg as the lakehouse table format for four reasons.

1. Multi-engine design philosophy

Dual-engine architecture pairs StarRocks (ad-hoc queries, real-time threat hunting) with ClickHouse (scheduled queries, dashboards, compliance reporting). Both engines read Iceberg natively without vendor-specific connectors, so a future engine swap (StarRocks to Trino, ClickHouse to Druid) requires zero data migration. Delta, by contrast, reaches non-Spark engines through Delta-specific connectors or UniForm, which reduces that multi-engine flexibility.

That "any engine, no migration" claim is easy to assert and rarely shown, so I ran it. On 2026-06-07, on a single host against one Iceberg/OCSF table, I pointed four engines at the same files (no copy): DuckDB, Trino, ClickHouse, and StarRocks. On the gated workloads (a full count, a needle lookup on dst_port = 3389, and a group-by on dst_port), all four returned identical answers. That is the runnable form of the portability promise: the same open table, queried four ways, and the answers agree. I'm bringing up a fifth engine, Dremio, separately, so this is a four-engine result for now. The agreement is worth checking rather than assuming. In earlier lab work one engine (chDB) returned a filtered count tens of rows short on byte-identical Parquet and raised no error. A fast engine can be quietly wrong, and answers that look equal are not known to be equal until you run the comparison.
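The comparison step itself is small enough to sketch. The code below is not the harness used for the run described above; it is a minimal illustration that assumes each engine's result has already been fetched as a list of row tuples, and the engine labels and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Rows = List[Tuple]


def compare_engines(results: Dict[str, Sequence[Sequence]], baseline: str) -> Dict[str, Dict[str, int]]:
    """Report engines whose rows differ from the baseline, ignoring row order.

    Rows are compared as multisets, so duplicates count and ordering does not.
    Values must already be normalized to comparable types (for example, all
    counts as int) before calling this.
    """
    reference = Counter(tuple(row) for row in results[baseline])
    report: Dict[str, Dict[str, int]] = {}
    for engine, rows in results.items():
        got = Counter(tuple(row) for row in rows)
        missing = reference - got  # rows the baseline has and this engine lacks
        extra = got - reference    # rows this engine has and the baseline lacks
        if missing or extra:
            report[engine] = {
                "missing": sum(missing.values()),
                "extra": sum(extra.values()),
            }
    return report


if __name__ == "__main__":
    # Hypothetical group-by results: (dst_port, event_count).
    results = {
        "engine_a": [(3389, 120), (443, 900)],
        "engine_b": [(443, 900), (3389, 120)],  # same rows, different order
        "engine_c": [(3389, 118), (443, 900)],  # silently short on one group
    }
    print(compare_engines(results, baseline="engine_a"))
```

An empty report means every engine matched the baseline. A silent undercount shows up as one missing and one extra row for the affected group, which is exactly the kind of failure that raises no error on its own.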

2. Vendor neutrality

Preserving an exit strategy matters here. If Polaris underperforms you can swap to AWS Glue (both support Iceberg), and if Dremio pricing escalates you can swap to Trino or Athena (all of which read Iceberg). With 15+ query engines supporting Iceberg natively, there is no vendor lock-in at the data layer. Delta optimizes within Databricks instead, which trades portability for in-ecosystem performance.

3. Hidden partitioning for analyst accessibility

Security analysts shouldn't need data-engineering knowledge to write efficient queries.

-- Iceberg hidden partitioning
SELECT * FROM security_events WHERE event_time > now() - interval '7 days';
-- Iceberg automatically filters to optimal partitions

-- Delta Lake manual filtering
SELECT * FROM security_events
WHERE partition_date >= current_date() - interval '7' days
  AND event_time > now() - interval '7 days';

The operational benefit is that this reduces the SOC analyst training burden and prevents accidental full table scans.

4. Partition evolution for unpredictable growth

Security data volume is unpredictable, so a deployment might run 200 GB/day in Year 1 (where daily partitioning is optimal), grow to 1.5 TB/day in Year 2 (where hourly partitioning is needed), and reach 8 TB/day in Year 3 (hourly plus source partitioning). Iceberg partition evolution changes the strategy without rewriting petabytes, whereas Delta requires a full rewrite, which is a $50–200K cost for petabyte-scale tables.
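The growth path above turns into per-partition sizes with simple division. The sketch assumes decimal units, an even spread of data across hours and sources, and a hypothetical count of eight log sources for the Year 3 case; none of those come from a measured deployment.

```python
def partition_size_gb(daily_gb: float, partitions_per_day: int, sources: int = 1) -> float:
    """Average data per partition, assuming an even spread (an assumption)."""
    if partitions_per_day < 1 or sources < 1:
        raise ValueError("partitions_per_day and sources must be at least 1")
    return daily_gb / (partitions_per_day * sources)


if __name__ == "__main__":
    scenarios = [
        ("Year 1, daily", 200, 1, 1),
        ("Year 2, daily", 1_500, 1, 1),
        ("Year 2, hourly", 1_500, 24, 1),
        ("Year 3, hourly", 8_000, 24, 1),
        ("Year 3, hourly + 8 sources", 8_000, 24, 8),  # source count is hypothetical
    ]
    for label, daily_gb, per_day, sources in scenarios:
        size = partition_size_gb(daily_gb, per_day, sources)
        print(f"{label}: {size:,.1f} GB per partition")
```

Staying on daily partitions in Year 2 means 1,500 GB per partition, while hourly brings that to 62.5 GB. By Year 3 hourly alone is back above 300 GB, which is the point where adding a source dimension starts to pay off.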

When Delta wins

Three scenarios where Delta is the right call.

1. Databricks-first architecture

If you're committed to Databricks for 5+ years, with Unity Catalog governance and MLflow for threat-detection models, Delta provides the deepest Unity Catalog integration (native row-level security and column masking), Spark performance optimizations (1.7–3.5× faster than Iceberg for Spark workloads), and Delta Sharing for secure external sharing. The trade-off is accepting ecosystem lock-in in exchange for the strongest performance within that ecosystem.

2. CDC-heavy workloads (with caveat)

These are workloads like user behavior analytics, asset inventory tracking, and CMDB synchronization. Delta has had deletion vectors longer, MERGE INTO has accumulated more production mileage, and Change Data Feed is production-validated. Iceberg V3 brings deletion vectors and row-level lineage to parity in the spec, but adoption lags the spec: engine support for V3 deletion vectors is rolling out across Spark, Trino, DuckDB, and Snowflake through 2026.

The "Iceberg MoR is immature" argument was a V2 statement, and for V3 deployments the gap is materially smaller, so verify your query engine version before assuming V3 mechanics in production.

3. Compliance-first security operations (also with caveat)

Think GDPR right-to-erasure, CCPA deletion, and PCI-DSS masking. Delta's Z-ORDER indexing optimizes point deletes, deletion vectors make erasure cheap, and time travel plus vacuum handles the compliance retention windows. Databricks also holds FedRAMP, HITRUST, HIPAA, and SOC 2 Type II certifications.

Iceberg V3 closes most of this gap. V3 deletion vectors take the same approach, stored in Puffin files, so a right-to-erasure request becomes a small delete-marker write rather than a full file rewrite. In both formats the marked rows stay in the underlying data files until compaction and snapshot cleanup run, so physical erasure still depends on that maintenance finishing inside the compliance window. The "compliance-first means choose Delta" guidance was a V2 recommendation, and for V3 deployments where engine support has caught up it's no longer a reason to prefer Delta, because the capability is now present in both. InMobi validated Delta at advertising-platform scale, while equivalent Iceberg V3 production references are still emerging (early 2026), so Delta's longer track record still matters for a risk-averse compliance posture.
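The difference between marking rows deleted and physically removing them matters for erasure, so it is worth modeling. In this sketch a plain Python set stands in for the deletion-vector bitmap and a list stands in for an immutable data file. The class and function names are hypothetical and neither format's on-disk layout is represented.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, str]


class DataFile:
    """An immutable data file plus a separate set of deleted row positions."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows: List[Record] = list(rows)
        self.deleted: Set[int] = set()  # stand-in for the deletion-vector bitmap


def soft_delete(f: DataFile, predicate: Callable[[Record], bool]) -> int:
    """Mark matching rows as deleted without touching the data file."""
    hits = {i for i, row in enumerate(f.rows) if predicate(row)}
    newly_marked = hits - f.deleted
    f.deleted |= hits
    return len(newly_marked)


def read(f: DataFile) -> List[Record]:
    """Readers skip positions listed in the deletion vector."""
    return [row for i, row in enumerate(f.rows) if i not in f.deleted]


def compact(f: DataFile) -> DataFile:
    """Rewrite the file without deleted rows; only now is the data gone."""
    return DataFile(read(f))


if __name__ == "__main__":
    f = DataFile([
        {"user": "u1", "src_ip": "10.0.0.1"},
        {"user": "u2", "src_ip": "10.0.0.2"},
        {"user": "u1", "src_ip": "10.0.0.3"},
    ])
    soft_delete(f, lambda r: r["user"] == "u1")
    print(len(read(f)), "visible,", len(f.rows), "still stored")
    print(len(compact(f).rows), "stored after compaction")
```

Queries stop seeing the erased user as soon as the vector is written, but the bytes remain until the rewrite. In a real table the old file also stays reachable from earlier snapshots until those are expired, which this model leaves out.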

Decision framework

How to choose.

Choose Apache Iceberg if

Multi-cloud strategy (AWS + Azure + GCP, querying across clouds).
Query engine flexibility (Trino ↔ Dremio ↔ Athena swaps without data migration).
Vendor neutrality is critical (preserving exit strategy).
Partition evolution is likely (unpredictable data volume growth).
Analyst accessibility matters (hidden partitioning reduces training burden).
Write-once workloads dominate (immutable security logs, audit trails, compliance archives).

Production validation: Netflix (petabyte-scale tables), Insider (90% cost reduction), AWS S3 Tables (managed service).

Choose Delta Lake if

Databricks-committed (5+ year Unity Catalog roadmap, MLflow threat detection, existing Spark investment).
CDC-heavy workloads with a mature production-track-record requirement.
Spark-dominant architecture (ETL, batch, Spark Structured Streaming).
Compliance-first operations with risk-averse posture (longer production deployment behind erasure features).
Best-in-class Spark performance matters (1.7–3.5× faster for Spark workloads).

One V3-era note on this list: items 2 and 4 used to be unambiguous "choose Delta" categories, but with Iceberg V3 both become "choose Delta only if you specifically need the longer production track record," because the underlying capability gap has closed. Items 1 and 5 (Databricks ecosystem commitment, Spark-specific performance) are still reasons that V3 doesn't change.

Or use both via Unity Catalog

You can run Delta Lake for CDC tables (user inventory, asset tracking) and Iceberg for immutable logs (CloudTrail, firewall, EDR) under a single Unity Catalog governance layer, so the multi-format strategy eliminates the binary choice and lets you optimize per format without fragmenting governance.

2026 update

Databricks Lakewatch changes the Delta calculus.

In late March 2026, Databricks launched Lakewatch: an open, agentic SIEM built on Delta Lake and Unity Catalog, with AI agents powered by Anthropic Claude. Partners include Cribl, Palo Alto Networks, Okta, Wiz, and Zscaler. This changes the competitive landscape for the Iceberg vs Delta decision in one specific way, which is that choosing Delta Lake now comes with a security-specific product ecosystem. If your organization is Databricks-committed and evaluating SIEM alternatives, Lakewatch means your lakehouse table format and your security detection platform share the same foundation of Delta tables, Unity Catalog governance, and Spark compute, and that shared foundation is a genuine integration advantage.

The flip side is that this deepens the Databricks lock-in case against Delta: when your table format, governance catalog, compute engine, and security detection platform all come from one vendor, the exit costs compound. Databricks markets Lakewatch as "zero vendor lock-in via open formats," but Delta Lake's reference implementation is Databricks-controlled, and an "open agentic SIEM" built on a single vendor's stack is a different kind of open than Apache Iceberg queried by 15+ independent engines. This is Tier D evidence (vendor launch, no production validation), so treat it as a factor in your framework rather than a resolved answer.

Migration path

Start with Iceberg, evaluate Delta later.

Phase 1: pilot (months 1–3)

Deploy Polaris Catalog plus Iceberg tables for one or two data sources (firewall logs, CloudTrail). Validate query performance with StarRocks or ClickHouse. Measure operational complexity (compaction, snapshot expiration).

Phase 2: production expansion (months 4–9)

Expand to 10–20 data sources. Automate maintenance via Airflow or dbt plus Spark compaction jobs. Implement lifecycle policies (hot → warm → cold S3 tiering).

Phase 3: evaluate Delta (months 10–12)

If a Databricks adoption path emerges, pilot Delta for CDC tables. If Unity Catalog looks attractive, adopt the multi-format strategy. If Iceberg meets your needs, continue Iceberg-only. Starting with Iceberg doesn't prevent Delta adoption later, because both coexist in Unity Catalog.

Conclusion

The table format chosen today determines tomorrow's flexibility.

Apache Iceberg and Delta Lake are both production-validated at petabyte scale for security-adjacent workloads, so the choice comes down to strategic priorities rather than technical limits.

Iceberg strengths: universal query engine compatibility (15+ engines), vendor neutrality, partition evolution, hidden partitioning for analyst accessibility.

Delta Lake strengths: Spark performance optimization (1.7–3.5× faster), CDC maturity, Databricks ecosystem depth, compliance certifications (FedRAMP, HITRUST, HIPAA).

My recommendation: start with Apache Iceberg for vendor neutrality and multi-engine flexibility. Evaluate Delta Lake if Databricks commitment emerges or CDC-heavy workloads dominate.

Unity Catalog's multi-format support (Delta + Iceberg + Hudi) means this isn't a permanent decision, so choose based on your organization's strategic priorities rather than on technical benchmarks alone.