Methodology
Splunk federated integration with the MOAr stack
Integration pattern for organizations with deep SPL investment that need lakehouse economics for data older than 30 days. Splunk stays the analyst UI. High-value signals route to Splunk indexers (30-day hot tier) and all data routes to Iceberg (1–3 year cold tier); DB Connect (production, on-prem + cloud) or Federated Search (Splunk Cloud only, beta) bridges the two. Analysts keep SPL, and storage cost drops to lakehouse rates.
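Because every event is dual-written to Iceberg but only high-value sources also land in Splunk, a search window has to be planned across the two tiers. A minimal Python sketch, assuming the 30-day hot retention above; `plan_query` and its return shape are hypothetical illustrations, not a DB Connect or Federated Search API:

```python
from datetime import datetime, timedelta, timezone

# Assumption: 30-day hot retention in Splunk, per the pattern above.
HOT_RETENTION = timedelta(days=30)


def plan_query(start, end, now, in_hot_tier):
    """Split a search window [start, end) across the two tiers (hypothetical planner).

    Every event is dual-written to Iceberg, so the cold tier can answer any window.
    Only sources routed to Splunk (in_hot_tier=True) have a hot copy, and only for
    the last HOT_RETENTION of data; that recent slice goes to Splunk.
    Returns a dict mapping "splunk" and/or "iceberg" to (start, end) tuples.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    if not in_hot_tier:
        # Low-value sources exist only in Iceberg, however recent the window.
        return {"iceberg": (start, end)}
    boundary = now - HOT_RETENTION
    plan = {}
    if start < boundary:
        plan["iceberg"] = (start, min(end, boundary))
    if end > boundary:
        plan["splunk"] = (max(start, boundary), end)
    return plan
```

For example, with `now` at 2025-12-31 UTC, a high-value search from 2025-11-01 to 2025-12-15 splits at 2025-12-01: the older slice goes to Iceberg and the recent slice to Splunk. Since Iceberg holds everything, routing the recent slice to Splunk is a latency choice, not a correctness requirement.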
Speedup vs. the schema-on-read SIEM foil (OpenSearch 2.18.0, 2.854 s) on the canonical sdw-lab zeek-flagship-rerun (10M Zeek events, single host, Tier B): Trino 3.6× (0.795 s), StarRocks 8.3× (0.343 s), ClickHouse-on-Iceberg 10.1× (0.282 s). The old "145×" headline for ClickHouse-native was a single-query best case; the CV-gated OpenSearch re-run supersedes it and puts ClickHouse-native at 46.8× on the five-query average (21–62× on the hunting-shaped queries). Methodology and code are public. The schema-on-read tier degrades roughly 8× from 1M → 10M events. The win is the tier move itself, so pick the cold-tier engine on catalog maturity, concurrency, and operating cost rather than on the spread between those multipliers.
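The multipliers are plain ratios of the foil's time to each engine's time. A quick Python check against the timings reported above; it reproduces only the three cold-tier figures, since the per-query times behind the 46.8× ClickHouse-native average are not listed here:

```python
# Wall-clock times reported above (seconds), zeek-flagship-rerun, 10M events.
BASELINE_S = 2.854  # OpenSearch 2.18.0, the schema-on-read SIEM foil
ENGINE_S = {
    "Trino": 0.795,
    "StarRocks": 0.343,
    "ClickHouse-on-Iceberg": 0.282,
}


def speedup(baseline_s, engine_s):
    """Speedup = baseline time / engine time (higher means faster)."""
    return baseline_s / engine_s


for name, seconds in ENGINE_S.items():
    print(f"{name}: {speedup(BASELINE_S, seconds):.1f}x")
# Trino: 3.6x
# StarRocks: 8.3x
# ClickHouse-on-Iceberg: 10.1x
```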
The pipeline
- Route (Cribl / Tenzir / Vector). Dual-write: high-value to Splunk, all data to Iceberg.
- Hot tier (Splunk indexers, 30 days). Real-time alerts; native SPL speed for recent data.
- Cold tier (Iceberg on S3, 1–3 years). OCSF-normalized; columnar Parquet; partition pruning.
- Bridge (DB Connect over JDBC, or Federated Search). One SPL query plane spans both tiers.
- Query (Trino / Dremio / StarRocks / ClickHouse). Standard SQL on the cold tier; detection content stays portable.
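The Route stage's contract (normalize to OCSF once, then dual-write) can be sketched in Python. This is an illustration, not Cribl, Tenzir, or Vector configuration: the mapping covers only a few OCSF Network Activity attributes from Zeek conn.log JSON records (epoch-second `ts`), the two sinks are plain lists standing in for Splunk and an Iceberg writer, both sinks receive the OCSF form (an assumption; a deployment could send raw events to Splunk), and the high-value rule `HIGH_VALUE_PORTS` is hypothetical:

```python
# Hypothetical "high-value" rule for illustration only: remote-access ports.
HIGH_VALUE_PORTS = {22, 3389}


def zeek_conn_to_ocsf(rec):
    """Map a Zeek conn.log JSON record to a partial OCSF Network Activity event.

    Only a few fields are mapped; required OCSF attributes such as metadata are omitted.
    """
    return {
        "class_uid": 4001,  # OCSF Network Activity class
        "category_uid": 4,  # OCSF Network Activity category
        "time": round(rec["ts"] * 1000),  # OCSF timestamps are epoch milliseconds
        "src_endpoint": {"ip": rec["id.orig_h"], "port": rec["id.orig_p"]},
        "dst_endpoint": {"ip": rec["id.resp_h"], "port": rec["id.resp_p"]},
        "connection_info": {"protocol_name": rec["proto"]},
    }


def route(records):
    """Normalize each record once, then dual-write.

    Every event goes to the Iceberg batch (cold tier); events matching the
    hypothetical high-value rule are also copied to the Splunk batch (hot tier).
    """
    splunk_batch, iceberg_batch = [], []
    for rec in records:
        event = zeek_conn_to_ocsf(rec)  # parse/normalize cost paid once per event
        iceberg_batch.append(event)
        if event["dst_endpoint"]["port"] in HIGH_VALUE_PORTS:
            splunk_batch.append(event)
    return splunk_batch, iceberg_batch


records = [
    {"ts": 1735689600.0, "id.orig_h": "10.0.0.5", "id.orig_p": 51512,
     "id.resp_h": "10.0.0.9", "id.resp_p": 22, "proto": "tcp"},
    {"ts": 1735689601.5, "id.orig_h": "10.0.0.5", "id.orig_p": 51513,
     "id.resp_h": "10.0.0.7", "id.resp_p": 443, "proto": "tcp"},
]
splunk, iceberg = route(records)
print(len(splunk), len(iceberg))  # 1 2
```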
What composes, what’s brittle
- Why this works. Splunk's schema-on-read parsing tax is paid again on every repeated dashboard query; pre-normalizing to OCSF pays it once.
- DB Connect. Production-grade, on-prem + cloud. Queries run 2–5× slower than the schema-on-read SIEM's native search (single host, Tier B), but storage is 10–50× cheaper.
- Federated Search. Splunk Cloud only, beta; queries Iceberg directly; billed against a DSU "use-it-or-lose-it" meter, so high-frequency queries can cost more than ingestion.
- SPL feature gaps. The transaction, datamodel, and inputlookup commands and real-time alerting are not supported on federated sources.
- Best fit. Heavy SPL investment + long-retention compliance + analyst retraining cost too high to swallow at once.
- What's brittle. Federated Search beta SLA; SPL → SQL translation gaps; multi-cloud catalog independence (Glue vs. Polaris).
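The command-level gaps can be caught before a search is pointed at a federated source. A hypothetical pre-flight check in Python, not a Splunk API: it flags only the three commands named above, treats the first word after each unquoted pipe as the command, and cannot see real-time alerting (a scheduling mode rather than a command) or data model access through other commands such as `tstats`:

```python
# Commands the section lists as unsupported on federated sources.
UNSUPPORTED_COMMANDS = {"transaction", "datamodel", "inputlookup"}


def split_pipeline(spl):
    """Split an SPL string on pipes that are not inside double quotes.

    Naive: ignores escaped quotes and does not parse subsearch brackets.
    """
    segments, current, in_quote = [], [], False
    for ch in spl:
        if ch == '"':
            in_quote = not in_quote
        if ch == "|" and not in_quote:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


def federated_gaps(spl):
    """Return unsupported commands used in the search, in order of first use."""
    found = []
    for segment in split_pipeline(spl):
        words = segment.split()
        if not words:
            continue  # e.g. the empty segment before a leading "| inputlookup"
        command = words[0].lower()  # compared case-insensitively to stay conservative
        if command in UNSUPPORTED_COMMANDS and command not in found:
            found.append(command)
    return found


print(federated_gaps("index=zeek | transaction uid | stats count"))  # ['transaction']
```

A non-empty result means the search should stay on the Splunk hot tier or be rewritten as SQL against the cold tier.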
Sources: SDW Splunk DB Connect benchmark, December 2025 (github.com/flying-coyote/splunk-db-connect-benchmark) · Splunk Federated Search for S3 documentation · Cisco Data Fabric announcement (Sep 2025) · Splunk .conf25 federation evolution