Security Data Works

Methodology

Splunk federated integration with the MOAr stack

Integration pattern for organizations with deep SPL investment that need lakehouse economics for data older than 30 days. Keep Splunk as the analyst UI. Route high-value signals to Splunk indexers (30-day hot tier) and all data to Iceberg (1–3 year cold tier), then bridge the two with DB Connect (production-grade, on-prem and cloud) or Federated Search (Splunk Cloud only, beta). Analysts keep SPL; storage cost drops to lakehouse rates.

3.6–10.1×

Speedup over the schema-on-read SIEM foil (OpenSearch 2.18.0, 2.854 s) on the canonical sdw-lab zeek-flagship-rerun (10M Zeek events, single host, Tier B): Trino 3.6× (0.795 s), StarRocks 8.3× (0.343 s), ClickHouse-on-Iceberg 10.1× (0.282 s). The old "145×" headline was ClickHouse-native's single-query best case; the CV-gated OpenSearch re-run supersedes it and puts ClickHouse-native at 46.8× on the five-query average (21–62× on the hunting-shaped queries). Methodology and code are public. The schema-on-read tier degrades roughly 8× from 1M to 10M events. The win is the tier move, so pick the cold-tier engine on catalog maturity, concurrency, and operating cost rather than on the spread between those multipliers.
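
The multipliers are the baseline latency divided by each engine's latency. A minimal sketch of that arithmetic, using only the latencies quoted above (the variable and function names are illustrative, not taken from the benchmark repository):

```python
# Latencies in seconds, copied from the figures quoted above.
BASELINE_S = 2.854  # OpenSearch 2.18.0, the schema-on-read foil

ENGINE_LATENCY_S = {
    "Trino": 0.795,
    "StarRocks": 0.343,
    "ClickHouse-on-Iceberg": 0.282,
}


def speedup(baseline_s: float, engine_s: float) -> float:
    """Return how many times faster the engine is than the baseline."""
    return baseline_s / engine_s


# Round to one decimal place, matching the precision of the headline numbers.
speedups = {
    name: round(speedup(BASELINE_S, latency), 1)
    for name, latency in ENGINE_LATENCY_S.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

Substituting latencies measured on your own workload into the same calculation gives a like-for-like comparison with the headline range.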

The pipeline

  1. Route

    Cribl / Tenzir / Vector

    Dual-write: high-value to Splunk, all data to Iceberg

  2. Hot tier

    Splunk indexers (30 days)

    Real-time alerts; native SPL speed for recent data

  3. Cold tier

    Iceberg on S3 (1–3 years)

    OCSF-normalized; columnar Parquet; partition pruning

  4. Bridge

    DB Connect (JDBC) or Federated Search

    SPL query plane spans both tiers

  5. Query

    Trino / Dremio / StarRocks / ClickHouse

    Standard SQL on the cold tier; detection content portable
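
The route and bridge steps above can be sketched as two small decisions: where an event is written, and which tiers a search has to touch. This is a sketch under stated assumptions, not how Cribl, Tenzir, Vector, or DB Connect are configured. The `Event` type, the `HIGH_VALUE_CLASSES` rule, and both function names are hypothetical; only the 30-day hot window and the dual-write behavior come from the pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

HOT_RETENTION = timedelta(days=30)  # hot-tier window from the pipeline above

# Hypothetical rule: which event classes count as "high-value".
HIGH_VALUE_CLASSES = {"authentication", "detection_finding"}


@dataclass(frozen=True)
class Event:
    class_name: str
    time: datetime


def route(event: Event) -> list[str]:
    """Dual-write: every event goes to Iceberg; high-value ones also go to Splunk."""
    destinations = ["iceberg"]
    if event.class_name in HIGH_VALUE_CLASSES:
        destinations.append("splunk")
    return destinations


def tiers_for_query(earliest: datetime, latest: datetime, now: datetime) -> list[str]:
    """Pick the tiers a search over [earliest, latest] has to touch.

    Assumes the search targets signals that were routed to Splunk;
    anything else lives only in the cold tier regardless of age.
    """
    if earliest > latest:
        raise ValueError("earliest must not be after latest")
    hot_boundary = now - HOT_RETENTION
    tiers = []
    if latest >= hot_boundary:
        tiers.append("hot")
    if earliest < hot_boundary:
        tiers.append("cold")
    return tiers


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(route(Event("authentication", now)))
    print(tiers_for_query(now - timedelta(days=90), now, now))
```

A search that reaches back past the 30-day boundary touches both tiers, which is the case the bridge step exists to serve.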

What composes, what’s brittle

  • Why this works. Splunk parses at search time (schema-on-read), so every repeated dashboard run pays the parsing tax again; pre-normalizing to OCSF pays it once, at write time.
  • DB Connect. Production-grade, on-prem and cloud; queries run 2–5× slower than the schema-on-read SIEM's native search (single host, Tier B), but storage is 10–50× cheaper.
  • Federated Search. Splunk Cloud only, beta; queries Iceberg directly; billed on a DSU "use-it-or-lose-it" meter, so high-frequency queries can cost more than ingestion.
  • SPL feature gaps. transaction, datamodel, inputlookup, and real-time alerting are not supported on federated sources.
  • Best fit. Heavy SPL investment + long-retention compliance + analyst retraining cost too high to swallow at once.
  • What's brittle. Federated Search beta SLA; SPL → SQL translation gaps; multi-cloud catalog independence (Glue vs. Polaris).
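
One way to size the SPL feature gaps before committing to the pattern is to scan saved searches for the commands listed above. The sketch below is a rough pre-flight check, not a Splunk API: the function name is hypothetical, and splitting on `|` is a deliberate simplification that ignores pipes inside quoted strings and subsearches. Real-time alerting is a scheduling property rather than a command, so it is out of scope here.

```python
import re

# Commands the list above names as unsupported on federated sources.
UNSUPPORTED_COMMANDS = {"transaction", "datamodel", "inputlookup"}


def unsupported_commands(spl: str) -> list[str]:
    """Return flagged commands that lead a pipeline stage, in first-seen order."""
    found: list[str] = []
    for stage in spl.split("|"):
        # The command is the first word of each pipeline stage.
        match = re.match(r"\s*([A-Za-z_]+)", stage)
        if not match:
            continue
        command = match.group(1).lower()
        if command in UNSUPPORTED_COMMANDS and command not in found:
            found.append(command)
    return found


if __name__ == "__main__":
    query = "index=main | transaction session_id | stats count"
    print(unsupported_commands(query))
```

Running this over an export of saved searches gives a first count of detections that would need rewriting before they can run against the cold tier.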

Sources: SDW Splunk DB Connect benchmark, December 2025 (github.com/flying-coyote/splunk-db-connect-benchmark) · Splunk Federated Search for S3 documentation · Cisco Data Fabric announcement (Sep 2025) · Splunk .conf25 federation evolution

See how the pattern lands on your workload.

The matrix scoring that justified each reference architecture's tool choices is the paid deliverable. The benchmark behind it is public — reproduce it on your own workload, then book a call to scope the work.