Technology deep-dive

Splunk Federated Search: bridge or lock-in extension?

Splunk Federated Search exists because Splunk knows customers are moving telemetry to S3 and Iceberg. The question worth asking is whether it's a credible bridge architecture to vendor-neutral lakehouse operations, or a lock-in extension that delays the inevitable migration. The honest assessment is that Splunk has made real engineering investment, but the business model creates incentives against deep lakehouse integration. The version worth tracking is the one that shipped GA in Splunk Platform 10.4 on May 18, 2026 (covering Splunk-to-Splunk, S3, and Amazon Security Lake) and the broader Cisco Data Fabric architecture it now sits inside, not the GA-2024 product most coverage still describes.

Reading time: about 20 minutes. Evidence tier: A overall (Splunk product documentation, the Platform 10.4 release announcement, .conf25 Cisco Data Fabric launch materials, Forrester's .conf25 coverage) with one Tier D speculative note on where the still-alpha and beta connectors (Snowflake, Iceberg, Delta Lake, Azure, Cisco SAL) may land at GA. I have not independently benchmarked any version of the product against a Trino-on-Iceberg baseline; that work is on the lab roadmap.

Query pattern · federation

Federation queries data where it lives instead of copying it into one store. That avoids a duplication and pipeline bill but adds planning and cross-store coordination overhead. How much overhead is an open question: a toy-scale smoke test (1K rows per table) measured federation overhead at about 9% of the gap between the fast and slow engines (roughly 36 ms), and most of that gap came from the engines themselves, not from federation. The production-scale magnitude remains a medium-confidence, still-open assumption pending a controlled benchmark.

The strategic pivot

Why Splunk built federation in the first place.

For roughly fifteen years, Splunk's commercial model rested on a single architectural assumption: the right place for security telemetry is inside a Splunk index. The license was metered by daily ingestion volume, the proprietary indexed format was where the performance lived, and SPL (Search Processing Language) was the query interface analysts learned to think in. The model worked because nothing else could touch Splunk's interactive query latency at security scale, and because storage was expensive enough that "ingest everything" looked like a reasonable trade-off.

Two things changed that assumption. First, S3 object storage dropped to roughly $0.023 per GB-month for the Standard tier, and to a fraction of that for Glacier Instant Retrieval archives, so raw telemetry could sit in cloud storage at one to two orders of magnitude lower cost than a Splunk index. Second, Apache Iceberg matured into a credible lakehouse table format with ACID guarantees, schema evolution, and multi-engine compatibility, so the "but you can't query it interactively" objection started to dissolve. The result: customers running Splunk at scale began pushing long-tail telemetry (DNS logs, network flows, cloud audit trails, raw EDR events) into S3 and querying it through Athena, Trino, or ClickHouse.
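To make the storage side of that shift concrete, here is a back-of-envelope sketch of steady-state S3 Standard cost for retained telemetry. It assumes a constant daily volume, a fixed retention window, and a flat $0.023/GB-month price; the 1 TB/day workload is hypothetical, and the sketch deliberately ignores request, transfer, and tiering charges, which a real bill would add.

```python
# Back-of-envelope S3 Standard storage cost for retained telemetry.
# Assumptions (hypothetical): steady daily ingest, fixed retention window,
# flat per-GB-month price, no request/egress/tiering charges.

S3_STANDARD_USD_PER_GB_MONTH = 0.023


def retained_gb(daily_gb: float, retention_days: int) -> float:
    """Volume held in storage once the retention window is full."""
    return daily_gb * retention_days


def monthly_storage_cost(daily_gb: float, retention_days: int,
                         usd_per_gb_month: float = S3_STANDARD_USD_PER_GB_MONTH) -> float:
    """Monthly storage bill for the steady-state retained volume."""
    return retained_gb(daily_gb, retention_days) * usd_per_gb_month


if __name__ == "__main__":
    # Hypothetical workload: 1 TB/day (1,000 GB, decimal units) kept for a year.
    cost = monthly_storage_cost(daily_gb=1_000, retention_days=365)
    print(f"${cost:,.0f} per month")
```

The same function, fed your own Splunk effective per-GB figure, gives the other side of the comparison; the article doesn't state one, so none is assumed here.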

Federated Search is Splunk's response to that pattern. The product framing is that you shouldn't move all your data into Splunk; you should query it where it lives. The strategic framing is more interesting. By offering federation, Splunk keeps the analyst's primary interface (SPL, saved searches, dashboards, runbooks) anchored to Splunk Cloud even when the underlying telemetry sits in S3, Snowflake, or a customer-managed Iceberg lake. The control plane stays Splunk-resident even as the data plane moves out.

That's the bet. Whether it turns out to be a bridge or an extension depends on whether federation eventually opens up (true multi-engine catalogs, portable detection rules, an exit path that doesn't require rewriting three hundred SPL runbooks) or stays a Splunk-centric query layer over open storage. Splunk's engineering signals point one direction; the business model creates pressure in the other.

Figure: a force-directed attack-surface graph. The payoff of querying across sources instead of copying them into one index: separate asset, identity, configuration, and vulnerability feeds resolved into a single connected graph.

Evolution timeline

Five phases of Splunk federation.

The product has gone through five recognizable phases.

Phase 1: Splunk-to-Splunk federation (legacy)

The original federation was cross-environment search between Splunk deployments. If you ran Splunk Enterprise on-prem and Splunk Cloud, you could query both from a single SPL interface. This was Splunk-to-Splunk only and didn't constitute true multi-platform federation; it was an artifact of the deployment topology rather than a strategic capability. Still, it established the architectural pattern: Splunk as the query head, multiple indexers as the data plane.

Phase 2: Federated Search for Amazon S3 (GA 2024)

The first real multi-platform federation let you query S3 buckets directly from Splunk Cloud, with AWS Glue Data Catalog providing the schema. The performance shape (approximately 100 seconds per terabyte scanned) is consistent with a row-oriented query head pulling from columnar storage without pushdown optimization. That makes it useful for low-frequency compliance access and historical investigation, but not for real-time detection.
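A quick way to see why that throughput rules out detection use is to turn it into wall-clock estimates. This sketch assumes scan time is linear in bytes scanned at the ~100 s/TB figure above and ignores planning, partition pruning, and concurrency; the scan sizes are hypothetical.

```python
# Rough latency estimate for a federated S3 scan at ~100 s per TB scanned.
# Assumption: time is linear in bytes scanned; planning, partition pruning,
# and concurrency effects are ignored.

SECONDS_PER_TB = 100.0


def estimated_scan_seconds(bytes_scanned: int, seconds_per_tb: float = SECONDS_PER_TB) -> float:
    """Estimated scan time in seconds for a given number of bytes scanned."""
    tb = bytes_scanned / 1e12  # decimal terabytes
    return tb * seconds_per_tb


def fmt_minutes(seconds: float) -> str:
    return f"{seconds / 60:.1f} min"


if __name__ == "__main__":
    # Hypothetical investigations scanning 0.2 TB, 5 TB, and 30 TB.
    for size in (200e9, 5e12, 30e12):
        s = estimated_scan_seconds(int(size))
        print(f"{size / 1e12:>5.1f} TB -> {s:>6.0f} s ({fmt_minutes(s)})")
```

Under these assumptions a 30 TB historical sweep lands around 50 minutes: acceptable for a compliance pull, unusable for an alert.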

Phase 3: Federated Analytics for Amazon Security Lake (GA 2024)

This phase added native OCSF schema support and a dual-mode design that runs federated search for cold data and selective indexing for hot data, and it was the first version that acknowledged the open-schema layer (OCSF) directly. The dual-mode pattern (index what's queried often, federate what's queried rarely) is the architecturally right call, though the economics still steer customers toward indexing more rather than less, which I'll come back to.

Phase 4: Cisco Data Fabric and the Machine Data Lake (.conf25, September 8, 2025)

Post-Cisco acquisition, the product framing widened. At .conf25 on September 8, 2025, Cisco announced Cisco Data Fabric, an architecture that presents Splunk Federated Search and what Splunk calls a Machine Data Lake as two layers of one strategy rather than as a standalone federation feature. The Machine Data Lake is positioned as a virtual lake spanning federated sources across Apache Iceberg, Delta Lake, Snowflake, and Azure storage, and AI capabilities arrived via a Time Series Foundation Model. The vendor's own framing ("it's a virtual lake, not a physical one") is a useful tell: it signals that Splunk isn't trying to be the storage layer but the metadata-and-query layer that sits on top of whichever storage you already have. Forrester's .conf25 write-up reads it the same way, treating the federation story as a component of the Data Fabric rather than as a product line of its own.

Phase 5: Splunk Platform 10.4 GA, Snowflake and lakehouse connectors (2025–2026)

Splunk Platform 10.4, released May 18, 2026, is the milestone that matters here. It carries Federated Search for Splunk-to-Splunk, S3, and Amazon Security Lake to GA in a single release, and adds two differentiators worth naming on their own terms: AI-driven schema inference (the engine attempts to infer schemas for federated sources rather than requiring up-front Glue or catalog work) and intelligent routing between hot search and cheap storage (the query planner decides whether a slice belongs in the indexed hot tier or in a cheap object-storage scan).
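Splunk hasn't published how its routing decides, so the following is only a toy illustration of the kind of policy such a planner encodes, not Splunk's algorithm. The `Slice` type, the 90-day hot window, and the one-query-per-day threshold are all hypothetical: a slice stays indexed if it's recent or frequently queried, and is federated otherwise.

```python
# Toy hot/cold routing policy (hypothetical; NOT Splunk's routing algorithm).
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Slice:
    source: str
    day: date
    queries_per_day: float  # observed query frequency against this slice


def route(s: Slice, today: date, hot_days: int = 90, freq_threshold: float = 1.0) -> str:
    """Return 'index' for recent or frequently queried slices, else 'federate'."""
    age_days = (today - s.day).days
    if age_days < hot_days or s.queries_per_day >= freq_threshold:
        return "index"
    return "federate"


if __name__ == "__main__":
    today = date(2026, 5, 18)
    for s in (Slice("dns", date(2026, 5, 1), 0.2),
              Slice("dns", date(2025, 11, 1), 0.05),
              Slice("edr", date(2025, 11, 1), 4.0)):
        print(s.source, s.day, route(s, today))
```

The frequency clause is the interesting design choice: it's what keeps an old but heavily hunted slice out of the per-scan meter, which is the same trade-off the DSU pricing discussion below turns on.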

The connector roadmap around 10.4 is staged.

Connector	Stage (as of May 2026)	GA target
Snowflake	Alpha (entered December 2025)	July 2026, on Splunk Cloud AWS commercial
Apache Iceberg, Delta Lake, Azure	Beta	Not stated
Cisco SAL (Security Analytics Lake)	Alpha (entered January 2026)	Not stated

Each of those is a separate maturity step rather than one bundled "2.0," so buying decisions made today should be calibrated against the specific connector's stage, which is GA for S3 and Security Lake but beta or alpha for everything else.

Technical reality

What the shipping product actually does.

Performance constraints

Federated Search for S3, as it has shipped from GA-2024 through the Platform 10.4 release, is materially slower than native indexed Splunk search across every workload shape I've seen documented. Simple filters that return sub-second from a native index take roughly 100 seconds per terabyte scanned in federated mode, and complex aggregations move from 1–10 seconds native into the multi-minute range when federated. Real-time alerting is not supported in federated mode at all, and it can't be, because the query path doesn't have a streaming hot tier. Splunk 10.4's intelligent routing between hot search and cheap storage helps at the planner level (the engine decides which slice goes where), but it doesn't change the underlying object-storage scan economics for queries that do land in the federated path.

The implication is structural, because for real-time detection you still need a Splunk index, and federation works as a complement for low-frequency historical work rather than a substitute for the hot-tier ingestion pattern. That would be a reasonable architectural answer if it were priced like one, but the DSU pricing I'll come back to suggests it isn't.

In a May 2026 LinkedIn post, Russell Leighton, Panther's chief architect, drew the line where I would, and more concretely. His read is that federated SIEM queries are useful for enrichment and secondary log sources but not for primary critical data. A SIEM built entirely on federation is "as slow as the slowest source," often can't control performance or retention, and runs into tight upstream API limits that he warns get "very bad during an incident." That's the practitioner's articulation of why federation bridges without replacing: the control you give up when you federate (performance, retention, freedom from upstream rate limits) is exactly the control an incident responder needs at 2 a.m. Federation is fine for the query you can afford to run slowly; it's the wrong place for the query you can't.
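The "as slow as the slowest source" point falls straight out of how a fan-out query completes. In this minimal sketch the three backends and their latencies are hypothetical stand-ins simulated with `asyncio.sleep`; even with fully concurrent fan-out, the federated answer can't return until the slowest source does.

```python
# Minimal fan-out model: total latency is bounded below by the slowest source.
# Backends and latencies are hypothetical, simulated with asyncio.sleep.
import asyncio
import time


async def query_source(name: str, latency_s: float, rows: int) -> tuple[str, int]:
    """Stand-in for one federated backend; sleeps to simulate its latency."""
    await asyncio.sleep(latency_s)
    return name, rows


async def federated_query(sources: dict[str, tuple[float, int]]) -> tuple[float, dict[str, int]]:
    """Query every source concurrently and wait for all of them."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(query_source(name, lat, rows) for name, (lat, rows) in sources.items())
    )
    return time.perf_counter() - start, dict(results)


if __name__ == "__main__":
    sources = {"index": (0.05, 120), "s3": (0.40, 900), "saas_api": (1.20, 40)}
    elapsed, rows = asyncio.run(federated_query(sources))
    print(f"{elapsed:.2f} s, rows={rows}")  # roughly the saas_api latency
```

Run the sources sequentially instead and the floor becomes the sum of latencies; an upstream rate limit during an incident effectively inflates the slowest term without bound.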

SPL command restrictions

The federation surface doesn't support the full SPL grammar. Unsupported in federated mode:

datamodel, accelerated data models, the backbone of most enterprise correlation searches.
inputlookup, lookup table joins, common in threat-intel enrichment patterns.
Most generating commands, including the streaming variants.
Real-time search across federated sources.
Wildcards across federated indexes.

The practical implication is that detection rules written in SPL against indexed data may not run against federated data without rewriting. A SOC that wants to federate its long-tail telemetry can't just point its existing correlation searches at the federated source; the analytics layer needs adaptation, a non-trivial migration cost that doesn't show up in the marketing pitch.
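One cheap way to size that adaptation work is to lint the saved-search inventory before migrating anything. This sketch checks only the restrictions listed above, not Splunk's authoritative compatibility matrix. It assumes federated indexes are referenced as `index=federated:<name>`, and its pipe splitting is naive (it ignores pipes inside quoted strings and subsearches).

```python
# Hypothetical pre-migration lint for federation blockers in SPL searches.
# Checks only the restrictions listed in the text; not an official matrix.
import re

UNSUPPORTED_COMMANDS = {"datamodel", "inputlookup"}
# Assumption: federated indexes are referenced as index=federated:<name>.
FEDERATED_WILDCARD = re.compile(r"index\s*=\s*federated:\S*\*", re.IGNORECASE)
DATAMODEL_REF = re.compile(r"\bdatamodel\s*=", re.IGNORECASE)


def pipeline_commands(spl: str) -> list[str]:
    """First token of each pipe-separated segment (naive split on '|')."""
    return [seg.split()[0].lower() for seg in spl.split("|") if seg.strip()]


def federation_blockers(spl: str) -> list[str]:
    """Return human-readable reasons a search may not run federated."""
    problems = [f"unsupported command: {cmd}"
                for cmd in pipeline_commands(spl) if cmd in UNSUPPORTED_COMMANDS]
    if DATAMODEL_REF.search(spl):  # e.g. tstats ... from datamodel=...
        problems.append("accelerated data model reference")
    if FEDERATED_WILDCARD.search(spl):
        problems.append("wildcard across federated indexes")
    return problems


if __name__ == "__main__":
    print(federation_blockers("| inputlookup threat_intel.csv | search ip=*"))
    print(federation_blockers("index=federated:s3_dns | stats count by query"))
```

Running it across every saved search gives a rough count of rules that need rewriting before the federated source can replace the index for them.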

Platform constraints

Federation availability isn't uniform across Splunk's deployment footprint.

Deployment surface	Federated Search availability
Splunk Cloud on AWS	Federated Search for S3, and, as of Platform 10.4, Amazon Security Lake
Splunk Enterprise (on-prem)	Not available
Splunk Cloud in GCP regions	Not available
FedRAMP and IL5 (the federal and defense tenancies)	Not available

The Snowflake connector is alpha-only on Splunk Cloud AWS commercial through mid-2026, and the Iceberg, Delta Lake, and Azure connectors are beta. For a mixed Splunk Enterprise plus Splunk Cloud topology, federation therefore solves only half the architecture, and the unsupported half tends to hold the regulated workloads.

The economics

Follow the money: DSU pricing and Splunk's own warning.

Federated Search is metered through Data Scan Units (DSU), separate from ingestion licensing. Each query consumes DSU proportional to volume scanned. The model is conceptually clean: you pay for the queries you run, not for storing data you rarely touch. The honest version, which Splunk states in its own documentation, is more constrained:

"Customers attempting to use this feature for high frequency searches will likely incur higher costs than natively ingesting and searching in the Splunk platform."

That's a vendor telling you, in writing, that the federation product is not priced for high-frequency use, which means federation is for the queries you don't run often. The pattern Splunk implicitly endorses is "index high-frequency detection data, federate low-frequency historical data," which is a reasonable architecture, but it's also the architecture that maximizes Splunk's ingestion revenue.
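The warning reduces to a break-even query rate, and it's worth computing with your own quote. Both unit prices below are placeholders, not Splunk list prices, and DSU consumption is collapsed into a hypothetical effective dollar cost per GB scanned. The model also simplifies by treating indexed search as free once the data is ingested.

```python
# Break-even between indexing a source and federating queries against it.
# Unit prices are PLACEHOLDERS, not Splunk list prices; DSU consumption is
# collapsed into a hypothetical effective $/GB scanned.
from dataclasses import dataclass

INGEST_USD_PER_GB = 1.00
FEDERATED_USD_PER_GB_SCANNED = 0.01


@dataclass(frozen=True)
class Workload:
    ingest_gb_per_month: float   # data that would otherwise be indexed
    gb_scanned_per_query: float  # scan footprint of a typical federated query
    queries_per_month: float


def monthly_costs(w: Workload, ingest_price: float = INGEST_USD_PER_GB,
                  scan_price: float = FEDERATED_USD_PER_GB_SCANNED) -> tuple[float, float]:
    """Return (indexed cost, federated cost) per month under the simple model."""
    indexed = w.ingest_gb_per_month * ingest_price
    federated = w.queries_per_month * w.gb_scanned_per_query * scan_price
    return indexed, federated


def break_even_queries(w: Workload, ingest_price: float = INGEST_USD_PER_GB,
                       scan_price: float = FEDERATED_USD_PER_GB_SCANNED) -> float:
    """Monthly query count above which federating costs more than indexing."""
    return (w.ingest_gb_per_month * ingest_price) / (w.gb_scanned_per_query * scan_price)


if __name__ == "__main__":
    w = Workload(ingest_gb_per_month=3_000, gb_scanned_per_query=500, queries_per_month=40)
    print(monthly_costs(w))       # (indexed $, federated $)
    print(break_even_queries(w))  # queries/month where the two lines cross
```

With these placeholder numbers the lines cross at 600 queries a month, about 20 a day. A scheduled detection search running every five minutes clears that many times over, which is exactly the high-frequency case the documentation warns about.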

A directional cost framing:

Use case	Approach	Cost profile
Real-time detection	Native Splunk indexing	$$$ (ingestion + compute)
Threat hunting (ad-hoc)	Federated search	$$ (DSU per query)
Compliance audit (rare)	Federated search	$ (occasional DSU)
Historical forensics	Federated search	$ (rare access)

The pattern the price structure pushes you toward is to keep the hot 90 days in a Splunk index and federate the cold tail. That architecture is defensible, but it also maximizes the value of the Splunk license you already have, and that's the part worth being honest about. A genuinely open federation model, one that priced ad-hoc queries close to the raw S3 GET-request cost, would compete with Splunk's own ingestion revenue. DSU pricing avoids that competition by making frequent federated queries expensive enough to push the workload back into the index.

This isn't bad-faith pricing: Splunk has real engineering cost in the federation engine and a reasonable claim to monetize it. The issue is only that the incentive structure puts a ceiling on how deep the lakehouse integration is likely to go. For how this interacts with the broader migration cost picture, see the hidden cost of SIEM migration. Federation-as-bridge is most useful for organizations whose SPL investment is large enough that a full migration is multi-year.

Where Splunk is genuinely open

OCSF and OpenTelemetry: real investment.

The lock-in critique only makes sense if it's calibrated honestly. There are layers of the stack where Splunk has made substantial open-standards investment, and those layers are worth naming before getting to the layers where lock-in persists.

On the schema layer: Splunk co-founded OCSF (Open Cybersecurity Schema Framework) with AWS, sits on the OCSF steering committee alongside IBM, and championed the transition to Linux Foundation governance. OCSF now lists over 200 participating organizations and 900-plus contributors and is the de facto open schema standard for security event data. Splunk's investment here is multi-year, sustained, and structurally aligned with customers who want portable event schemas, and it reads as foundational work rather than a defensive gesture.

On the telemetry-collection layer: Splunk has roughly twenty dedicated engineers contributing to OpenTelemetry, with more than a hundred thousand code contributions, and the OpenTelemetry Collector is the default agent for Splunk Observability Cloud. OTel is the open standard for telemetry collection across metrics, traces, and logs. Splunk's depth of contribution makes the OTel-to-Splunk path one of the most production-validated in the ecosystem.

The honest verdict on the open-standards layers is that Splunk has genuine commitment, and if you're building on OCSF and OTel as foundational standards (the recommended posture) Splunk's investment aligns with that direction. The lock-in critique below isn't about the standards layer at all but about what sits on top of it.

Where lock-in persists

SPL, the proprietary index, and the query plane.

Three layers where Splunk Federated Search doesn't reduce lock-in, and may extend it.

SPL creates skills lock-in

Detection rules in a mature Splunk environment are written in SPL, and they aren't portable. A Sigma-to-SPL translation path exists, but the inverse, SPL-to-Sigma, is largely manual effort because SPL supports operators and idioms that don't have clean Sigma equivalents. An enterprise with three hundred SPL-based runbooks faces a real rewrite cost to migrate detection logic to any other platform, and that cost compounds with each additional saved search the team builds.

Federation doesn't help here, because the detection rules still get written in SPL, and the federation product extends the surface area where SPL is the analyst's primary language without introducing a portable alternative. For organizations that take detection portability seriously (the right posture if you want a credible exit option), federation makes the skills-lock-in problem slightly worse rather than better.

Proprietary indexed format

Federated Search reads open formats (Parquet, ORC) from S3. The data Splunk has already ingested into its native index is in a proprietary columnar-ish format that no other tool reads directly. Exporting an existing Splunk index to Iceberg-resident Parquet requires an ETL pass (typically through Splunk DB Connect or a SPL-based export), and that pass carries data-quality risk. The federation product doesn't change this; what's in the index stays in the index.
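The data-quality risk in that ETL pass is mostly silent loss or duplication, and a per-bucket count reconciliation catches the obvious cases. This is a sketch under assumptions: source counts come from something like a `stats count by sourcetype, date` search against the index, exported rows carry `sourcetype` and an ISO-8601 `_time` string, and the field names and values are hypothetical.

```python
# Reconcile event counts per (sourcetype, day) between a Splunk index export
# and the rows that landed in the lake. Field names are hypothetical.
from collections import Counter
from typing import Iterable, Mapping

Key = tuple[str, str]  # (sourcetype, ISO day "YYYY-MM-DD")


def count_exported(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count exported rows per (sourcetype, day), using the ISO date prefix of _time."""
    return Counter((r["sourcetype"], r["_time"][:10]) for r in rows)


def reconcile(source_counts: Mapping[Key, int], exported: Counter) -> dict[Key, tuple[int, int]]:
    """Return {key: (source_count, exported_count)} for every mismatched bucket."""
    keys = set(source_counts) | set(exported)
    return {k: (source_counts.get(k, 0), exported.get(k, 0))
            for k in sorted(keys)
            if source_counts.get(k, 0) != exported.get(k, 0)}


if __name__ == "__main__":
    src = {("dns", "2026-05-01"): 3, ("fw", "2026-05-01"): 2}
    rows = [{"sourcetype": "dns", "_time": "2026-05-01T00:00:00Z"}] * 3 + \
           [{"sourcetype": "fw", "_time": "2026-05-01T09:00:00Z"}]
    print(reconcile(src, count_exported(rows)))  # the fw bucket lost one event
```

Counts won't catch field-level damage such as truncated values or mangled timestamps, so a sampled field-by-field diff is the natural next check.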

The asymmetry is worth naming: Splunk reads open formats outbound (federation queries Parquet on S3) but doesn't write open formats inbound (ingestion lands in the proprietary index). A truly bidirectional bridge would let customers configure ingestion to write Parquet-on-Iceberg as the storage substrate, with Splunk's query head sitting on top. That's not what's shipped, and it's not on the public roadmap that I've seen.

This read-strong, write-weak shape is the most important pattern in the whole product, because it recurs one layer up. SPL federated search reads out across sources well enough, while the write and control side (retention policy, performance guarantees, RBAC, schema authority) stays engine-resident. It's the same asymmetry I keep finding in the security-lakehouse catalog argument: an open table format and an open read path on one side, with the write path and the metadata authority anchored to a single engine on the other. In both cases the lock-in that actually constrains an exit isn't catalog-anchored but engine-anchored. Federation moves the read boundary outward and leaves the write boundary exactly where Splunk's commercial model needs it, which is why "we query open storage now" and "we have a credible exit" are not the same claim.

Splunk-centric query plane

Federation queries open formats on S3, but Splunk controls the query plane. The Machine Data Lake catalog, the metadata layer that knows what tables exist, what schemas they have, and how they're partitioned, is Splunk-managed. Other query engines can read the underlying Parquet files, but the catalog metadata isn't directly interoperable with open standards like the Iceberg REST Catalog API or Polaris. The result is a federation product that uses open storage formats while keeping the metadata and query coordination Splunk-resident, which isn't the same thing as multi-engine architecture.
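What interoperability would look like is concrete. Under the Iceberg REST Catalog specification, any engine resolves a table's metadata through the same standard load-table route, so the catalog stops being tied to one query engine. This sketch only builds that URL. It assumes the spec's `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}` route, with multi-level namespaces joined by the 0x1F unit separator and an optional prefix the server can advertise through its config endpoint. The host and table names are hypothetical, and authentication is omitted.

```python
# Build the Iceberg REST Catalog load-table URL (sketch; auth omitted).
# Host and table names are hypothetical.
from typing import Optional, Sequence
from urllib.parse import quote

UNIT_SEPARATOR = "\x1f"  # joins multi-level namespace parts in REST paths


def load_table_url(base_uri: str, namespace: Sequence[str], table: str,
                   prefix: Optional[str] = None) -> str:
    """Return the GET URL for a table's metadata under the REST catalog spec."""
    segments = [base_uri.rstrip("/"), "v1"]
    if prefix:  # optional, advertised by the server's config endpoint
        segments.append(quote(prefix, safe=""))
    segments += ["namespaces", quote(UNIT_SEPARATOR.join(namespace), safe=""),
                 "tables", quote(table, safe="")]
    return "/".join(segments)


if __name__ == "__main__":
    print(load_table_url("https://catalog.example.com", ["security", "ocsf"], "authentication"))
```

Trino, Spark, DuckDB, or a Splunk connector could all issue that same request; the open question for the Iceberg connector is whether Splunk will be one of the clients or insist on being the server.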

The five-question test

What to ask before adopting Splunk federation.

Five questions I'd ask a Splunk account team before signing a federation deal. The answers tell you whether you're buying a bridge or extending the lock-in.

1. Exit strategy: portable detection rules

If we want to leave Splunk in three years, can we export detection rules in a portable format (Sigma, YARA-L, or vendor-neutral SQL)? Today the answer is no: SPL rules don't export to Sigma automatically, and the bidirectional translation tooling is community-maintained rather than vendor-supported. This is the lock-in test that matters most for long-term flexibility.

2. Multi-engine access

Can we query the same data lake with Trino, Dremio, or ClickHouse without Splunk? For raw Parquet in S3 the answer is yes because the storage is open, but for Splunk's Machine Data Lake metadata the answer is no, since whichever tables Splunk has registered in its own catalog are reachable through the federation engine but not through arbitrary external engines without rebuilding the catalog layer.

3. Detection portability

Are our rules locked into SPL or portable as Sigma rules? They're locked in, because Sigma conversion is manual effort and the conversion fidelity drops for complex correlation logic. Organizations serious about detection portability should be writing in Sigma and translating to SPL rather than the other way around, and that's a tooling-and-culture investment Splunk doesn't directly support.
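The asymmetry is easy to see in miniature. Going from Sigma's declarative field matches to an SPL search string is mechanical; going back from an SPL pipeline (`stats`, `eval`, `transaction`) has no clean target. The toy translator below is a hypothetical illustration, not pySigma or its Splunk backend, which is what production work would use. It handles only a single named selection with AND across fields and OR within lists, and the logsource-to-index mapping is assumed.

```python
# Toy Sigma-style detection -> SPL translator (hypothetical; not pySigma).
# Handles one named selection: AND across fields, OR within list values.

def _term(field: str, value) -> str:
    return f'{field}="{value}"' if isinstance(value, str) else f"{field}={value}"


def _field_clause(field: str, value) -> str:
    if isinstance(value, list):
        terms = [_term(field, v) for v in value]
        return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"
    return _term(field, value)


def sigma_to_spl(rule: dict, logsource_map: dict) -> str:
    """Translate a minimal Sigma-like rule dict into an SPL search string."""
    detection = rule["detection"]
    condition = detection["condition"].strip()
    if not isinstance(detection.get(condition), dict):
        raise NotImplementedError("toy translator handles a single named selection only")
    ls = rule.get("logsource", {})
    prefix = logsource_map.get((ls.get("product"), ls.get("service")), "")
    clauses = [_field_clause(f, v) for f, v in detection[condition].items()]
    return " ".join(part for part in [prefix, *clauses] if part)


if __name__ == "__main__":
    rule = {"logsource": {"product": "windows", "service": "security"},
            "detection": {"selection": {"EventID": 4625, "LogonType": [3, 10]},
                          "condition": "selection"}}
    # Hypothetical mapping from Sigma logsource to a Splunk index/sourcetype.
    mapping = {("windows", "security"): 'index=wineventlog sourcetype="WinEventLog:Security"'}
    print(sigma_to_spl(rule, mapping))
```

The mapping table is where the real tooling effort goes: keep it per environment, and the same Sigma rule can be re-targeted at another backend without touching the detection logic.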

4. True cost

What is the all-in cost, including base Splunk Cloud licensing, the federated search add-on, and DSU consumption at the query frequency we actually run? The answer is complex and frequently underestimated by both customers and the account team. DSU costs for high-frequency analytical queries may exceed the ingestion cost for the same data, which is exactly what Splunk's own documentation warns about. Get the quoted cost in writing against a representative query workload rather than a benchmark scenario.

5. Catalog independence

Who controls the metadata catalog, and can we use Polaris, Unity Catalog, or AWS Glue as the source-of-truth catalog with Splunk reading from it? Today Splunk controls the Machine Data Lake catalog, and a federation product that operated as a query engine against an external Iceberg REST Catalog would be architecturally open, but the Platform 10.4 release doesn't work that way and the beta Iceberg connector hasn't publicly committed to external-catalog-as-source-of-truth either.

Alternative architectures

What the alternatives look like.

Four architectures compete for the federation slot, and naming them is how the assessment stays honest, because the question worth asking isn't whether federation is valuable but what it's valuable compared to.

Cribl Search (broader federation, vendor-neutral routing)

Cribl Search runs federated queries across more than fifty destinations, with native OCSF transformation and a platform-agnostic stance, and organizations report Splunk cost reductions of 40–50% or more when they route long-tail telemetry through Cribl rather than ingesting it. The trade-off is that Cribl is itself a vendor and the Cribl pipeline becomes a central control plane. Cribl's commercial incentive, though, is aligned with helping customers shrink their Splunk bill, which runs opposite to Splunk's incentive with federation.

Trino over Iceberg (direct lakehouse queries)

Trino is the SQL federation engine that originated at Facebook (as Presto) and now powers ad-hoc analytical queries at Netflix, LinkedIn, and Pinterest scale. Pointed at OCSF-formatted Parquet sitting in Iceberg tables, Trino delivers interactive query latency for analytical workloads with no vendor in the query path. Its strengths are true vendor neutrality, multi-engine catalog support via Iceberg REST or Polaris, and no per-query metering beyond the underlying compute. Its weakness is that it has no native detection logic: Trino is an analytical engine rather than a SIEM, so it complements rather than replaces the detection plane. See Iceberg vs Delta Lake for security data for the table-format side of this argument.

Amazon Security Lake (OCSF-native AWS path)

Security Lake is AWS's OCSF-native managed lake, with storage at roughly $0.035/GB and query access through Athena. The strength is that there's zero infrastructure to operate, a native OCSF schema, and a direct path to AWS-native analytical tools, while the weakness is that it's AWS-centric, so multi-cloud security operations need a secondary path, and the Athena query path has its own performance shape that may not match sub-second SOC latency expectations.

ClickHouse (specialized analytical backend)

ClickHouse is an open-source (Apache 2.0) columnar OLAP engine optimized for high-cardinality analytical workloads. The benchmarks I've seen suggest 10–100× query performance over row-oriented engines for SOC-shaped workloads, though they come from ClickHouse community sources rather than independent testers and warrant skepticism until reproduced with disclosed methodology. ClickHouse doesn't replace a SIEM (it has no native detection logic), but it's a credible specialized backend for the analytical and threat-hunting tier sitting next to or under a detection platform. Huntress validated ClickHouse at three million endpoints with a reported 93% cost reduction relative to their prior architecture.

Whichever engine you reach for, validate cross-engine answer-equivalence at your own scale before you trust an engine swap. Two engines reading the same Parquet can return different answers to the same query, and the divergence is easy to miss (the silent wrong answer piece works through how that happens); the ClickHouse-vs-DuckDB bench is where I run that check.
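A minimal harness for that check compares result sets as multisets rather than ordered lists, because engines are free to return rows in any order without an `ORDER BY`. This sketch is engine-agnostic: it assumes each engine's rows have already been fetched into Python tuples. Floats are rounded to absorb last-digit aggregation noise, which can misjudge values that straddle a rounding boundary, and NaN is not handled.

```python
# Order-insensitive comparison of two engines' result sets (sketch).
# Assumes rows are already fetched as tuples; NaN handling is out of scope.
from collections import Counter


def _normalize(row: tuple, ndigits: int) -> tuple:
    """Round floats so last-digit aggregation noise doesn't cause false mismatches."""
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def answers_match(a: list[tuple], b: list[tuple], ndigits: int = 6) -> bool:
    """True if both result sets contain the same rows with the same multiplicities."""
    return Counter(_normalize(r, ndigits) for r in a) == Counter(_normalize(r, ndigits) for r in b)


if __name__ == "__main__":
    engine_a = [("10.0.0.5", 3389, 25), ("10.0.0.9", 443, 7.0000001)]
    engine_b = [("10.0.0.9", 443, 7.0), ("10.0.0.5", 3389, 25)]
    print(answers_match(engine_a, engine_b))
```

The multiset comparison matters: a plain set comparison would miss an engine that silently deduplicates or duplicates rows, which is one of the easier wrong answers to overlook.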

When federation makes sense

Honest recommendations: when to use it, when not to.

Splunk Federated Search may be the right call when

You have a large existing Splunk investment, meaning three hundred or more SPL-based runbooks, mature correlation searches, several years of analyst familiarity. The skills-lock-in cost is real, and federation lets you start drawing down the ingestion bill without rewriting the analytics layer.
You run multiple Splunk deployments (Splunk Cloud plus Splunk Enterprise) and need unified search across them. Splunk-to-Splunk federation has been the most reliable use case since Phase 1.
Your organization has standardized on Amazon Security Lake as the long-tail storage tier. The Phase 3 OCSF-native integration is the cleanest path here.
Data sovereignty rules require distributed deployment (data must remain in a specific cloud region or sovereign tenant). Federation lets you query across topology that ingestion can't consolidate.
Your SOC analyst pool is SPL-fluent and a mid-term migration away from SPL is not feasible. Federation extends the SPL surface area to data that previously wasn't queryable from Splunk at all.
The dominant federation workload is genuinely low-frequency: compliance audits, occasional forensic deep-dives, post-incident historical investigation. The DSU pricing makes economic sense at low query rates.

Splunk Federated Search is insufficient when

True vendor neutrality is the design goal. The query plane stays Splunk-resident regardless of how open the storage is.
Multi-cloud federation across AWS, Azure, and GCP is required. The Platform 10.4 GA surface is AWS-centric (S3, Security Lake, Splunk-to-Splunk); the Azure and Snowflake paths are beta and alpha respectively as of May 2026, and there is no public GCP federation timeline.
High-frequency analytical queries against historical data dominate. The DSU pricing makes this uneconomic relative to either keeping the data indexed (more expensive ingestion) or moving to a direct-lakehouse-query architecture (lower per-query cost).
Portable detection rules (Sigma-first) are a strategic commitment. Federation doesn't help with detection portability and may extend SPL lock-in.
Open-source preference (Apache-licensed stack) is a procurement constraint. The Splunk core isn't open-source, and federation doesn't change that.

For the reference architecture I work with (OCSF-native ingestion, Iceberg storage, multi-engine query layer with at least Trino or Dremio plus a specialized real-time engine like ClickHouse), Splunk Federated Search is not the right primary federation tool. It's the right complement if a large Splunk SPL investment already exists and the migration is gradual. The detail of how that complement-mode integration looks is captured in the Splunk federated integration methodology.

The Cisco Data Fabric question

What the alpha and beta connectors may change.

The Platform 10.4 GA covers Splunk-to-Splunk, S3, and Amazon Security Lake. The interesting work, the connectors that decide whether Cisco Data Fabric becomes a multi-engine architecture or a Splunk-resident query layer over open storage, is staged behind alpha and beta gates through 2026: Snowflake (GA target July 2026 on Splunk Cloud AWS commercial), Iceberg / Delta Lake / Azure (beta as of May 2026), and Cisco SAL (alpha since January 2026). The questions worth tracking as each of those matures are the same ones that distinguish bridge from lock-in extension.

Does the Iceberg connector support reading from an external Iceberg REST Catalog (Polaris, Unity Catalog) as the source-of-truth, rather than requiring Splunk's Machine Data Lake catalog to own the metadata? This is the catalog-independence question, and it's the single biggest architectural signal in the Data Fabric story.
Does the 10.4 generation close the performance gap with native indexed search to the point that "federate everything, index nothing" becomes economically credible? The shipping product is roughly 100× slower than native; if intelligent routing plus AI-driven schema inference brings practical latency closer to 2–5× while DSU pricing comes down proportionally, the bridge case strengthens materially.
Does Cisco Data Fabric ship a credible SPL-to-Sigma translation path, or at least first-class Sigma-as-input support for detection rules running against federated data? This is the detection-portability question, and a "yes" here would change the lock-in calculus.
Do the connector betas eventually widen platform coverage to Splunk Enterprise, GCP regions, and FedRAMP / IL5 environments? The current platform constraints are arbitrary from a technical standpoint; closing them is a strategic choice, not an engineering one.

I am marking the connector trajectory as Tier D evidence, speculation against alpha and beta products without published technical detail. Splunk has the engineering capability to ship all four of these capabilities; whether the business model permits it is the open question. My base case, calibrated on Cisco's broader Data Fabric strategy and Splunk's revenue mix, is that the connectors will widen platform coverage and improve performance materially but will not open the catalog or commit to detection portability. I would like to be wrong. The signal I'll watch for is whether Polaris-compatibility shows up in the Iceberg connector's GA notes.

The verdict

A bridge from pure ingestion, not a bridge to vendor neutrality.

Splunk Federated Search is a bridge from the pure-ingestion model to a hybrid model where part of the telemetry sits outside Splunk's index, but it isn't a bridge to vendor-neutral lakehouse architecture. The difference matters because each architecture optimizes for a different exit path: the hybrid model preserves Splunk as the analyst's primary interface, while the vendor-neutral lakehouse model preserves the option to swap that interface later.

On the positive side, Splunk acknowledged that "ingest everything" is economically unsustainable and shipped a federation product that lets customers query data they couldn't reach before, the OCSF and OpenTelemetry investments are genuine and material, and the Phase 2 through Phase 5 expansion has been steady engineering work rather than vaporware. The limitation is that SPL skills lock-in remains and may be extending, Splunk controls the query plane and the catalog, and high-frequency analytical workloads are pushed back into the indexed-ingestion model by DSU pricing, so the product isn't a true multi-engine architecture but a Splunk-centric query layer over open storage.

For organizations whose strategic goal is a security data commons (shared telemetry, cross-organizational data sharing, portable detection logic, multi-engine catalogs), Splunk Federated Search is a step in the right direction at the storage layer and a step sideways at the query layer. Organizations seeking architectural freedom will find more complete solutions in OCSF-native lakes (Security Lake, customer-managed Iceberg), open query engines (Trino, Dremio, ClickHouse, DuckDB), vendor-neutral routing layers (Cribl Stream and Search), and portable detection languages (Sigma).

Splunk Federated Search optimizes within the Splunk ecosystem without escaping it, so whether that's the right product for a given organization depends on whether the goal is ecosystem optimization (in which case federation is well-engineered and worth using) or ecosystem exit (in which case federation is the wrong tool because it makes the SPL surface larger rather than smaller). The Iceberg, Snowflake, Azure, and Cisco SAL connectors maturing through 2026 may shift this verdict if catalog independence or detection portability lands, though the Platform 10.4 GA surface as it stands today doesn't.

I built the OCSF-native-lake side of that contrast to see what it actually buys, running it on the MOAR reference stack rather than leaving it as a thought experiment. Two normalized OCSF sources, an Authentication table and a Network Activity table, sit in one store. A single SQL join on src_ip surfaces 198.51.100.66 as the same actor behind 8 failed authentications and 5 RDP connections, the brute-force-then-lateral-movement shape that neither source shows alone: the auth table sees the failed logins, the network table sees the RDP, and only the cross-source join resolves them to one IP. That is the well-connected pillar made concrete, and it's the move federation makes awkward, because the query still spans separate per-tool surfaces rather than one resolved entity. A per-tool SIEM fragments the attacker across screens; in the lake, the entity resolves once and every source is just another table to join. The relative pattern is the finding, not any wall-clock number, since this ran on a single host. The resolution-in-one-join property is what's hard to retrofit onto an ecosystem you're optimizing within rather than exiting.
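The shape of that join is small enough to reproduce. This sketch substitutes the standard-library `sqlite3` for the DuckDB setup described above, flattens the OCSF fields into simplified `src_ip`-style columns, and uses synthetic rows built to mirror the described pattern; it is not the MOAR data. Each table is aggregated per IP before the join, which avoids an 8 × 5 row fan-out.

```python
# Cross-source entity resolution in one join (sketch). sqlite3 stands in for
# DuckDB; columns are simplified flattenings of OCSF fields; data is synthetic.
import sqlite3


def build_demo_lake() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE authentication (src_ip TEXT, user_name TEXT, status TEXT)")
    con.execute("CREATE TABLE network_activity (src_ip TEXT, dst_ip TEXT, dst_port INTEGER)")
    # One actor with 8 failed logins and 5 RDP connections, plus background
    # noise that appears in only one table or is benign.
    con.executemany("INSERT INTO authentication VALUES (?, ?, ?)",
                    [("198.51.100.66", "admin", "Failure")] * 8
                    + [("203.0.113.10", "alice", "Failure")] * 2
                    + [("192.0.2.5", "bob", "Success")])
    con.executemany("INSERT INTO network_activity VALUES (?, ?, ?)",
                    [("198.51.100.66", "10.0.0.20", 3389)] * 5
                    + [("192.0.2.5", "10.0.0.30", 443)] * 4)
    return con


CROSS_SOURCE_QUERY = """
WITH failed AS (
    SELECT src_ip, COUNT(*) AS failed_auths
    FROM authentication WHERE status = 'Failure' GROUP BY src_ip
), rdp AS (
    SELECT src_ip, COUNT(*) AS rdp_conns
    FROM network_activity WHERE dst_port = 3389 GROUP BY src_ip
)
SELECT f.src_ip, f.failed_auths, r.rdp_conns
FROM failed f JOIN rdp r ON f.src_ip = r.src_ip
ORDER BY f.failed_auths DESC
"""

if __name__ == "__main__":
    print(build_demo_lake().execute(CROSS_SOURCE_QUERY).fetchall())
```

Only the IP present in both signals survives the inner join. The failed-login-only and HTTPS-only noise drops out, which is the per-entity resolution a per-tool view can't give you in one step.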

I ran the storage and answer side of the same contrast against an OpenSearch schema-on-read SIEM as the open foil, loading 200,000 OCSF Network Activity events into both and asking the same questions: a row count, a needle (dst_port=3389, present 25,000 times), and a group-by. The two stores returned identical answers on all three, which is the result I most wanted to confirm, because an engine swap is only safe when the answers match. On storage the lakehouse held the events as 1.6 MB of Parquet against the SIEM index's 11.5 MB, so the schema-on-read index carried 7.0× the columnar footprint for the same data, and the lakehouse was faster on all three queries. The honest framing of the latency: this was one host, the OpenSearch index was queried over HTTP while DuckDB ran in-process so the SIEM paid a round-trip the lake did not, and a term index's real advantage shows up on highly selective needles at much larger scale than this run reached, which this setup doesn't isolate; the findings that hold independent of host and scale are the answer-equality across both stores and the 7.0× storage ratio.