H3-PERFORMANCE-01 · Tier A · 4.5/5
ClickHouse runs the same security workload 145× faster than the dominant schema-on-read SIEM.
On a 10-million-event Zeek workload, with identical hardware and queries, ClickHouse Native completes
the suite in 0.19 seconds against the schema-on-read SIEM's 27.52 seconds. ZSTD-22 compression delivers an 8.2× size reduction
on top. Methodology and result are public; the reference implementation is shared under NDA with engagement
prospects and qualifying reviewers.
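The mechanism behind the gap can be shown with a toy sketch (this is not the benchmarked suite; event shape and field names are invented): a schema-on-read engine pays the parse cost on every query, while a columnar engine extracts fields once at ingest and then scans only the column a query touches.

```python
import json
import time

# Toy illustration of columnar vs schema-on-read. Field names are invented;
# 100k synthetic events stand in for the 10M-event Zeek workload.
raw_events = [json.dumps({"src_ip": f"10.0.0.{i % 256}", "bytes": i % 1500})
              for i in range(100_000)]

def schema_on_read_sum():
    # Pays the JSON parse cost on every query.
    return sum(json.loads(e)["bytes"] for e in raw_events)

# One-time columnarization at ingest; queries then touch only this column.
bytes_col = [json.loads(e)["bytes"] for e in raw_events]

def columnar_sum():
    return sum(bytes_col)

t0 = time.perf_counter(); a = schema_on_read_sum(); t1 = time.perf_counter()
b = columnar_sum(); t2 = time.perf_counter()
assert a == b  # same answer, very different scan cost
print(f"schema-on-read {t1 - t0:.4f}s, columnar {t2 - t1:.4f}s")
```

The ratio in this toy is far smaller than 145× because a real columnar engine adds vectorized execution, compression-aware scans, and skip indexes on top of the layout advantage.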
The counter-position runs along two lines. First: "ClickHouse isn't a SIEM" — true, but the comparison is
on query-engine performance against the same workload, not against the broader SIEM feature set. Second:
"the workload was selected to favor columnar storage" — the workload is a real Zeek deployment shape, but
the result generalizes most cleanly to security workloads dominated by a small number of recurring queries.
A workload dominated by ad-hoc full-text search across raw events would narrow the gap.
What would change the answer: an independent benchmark on a workload with a meaningfully different shape
where the schema-on-read engine closes the gap, or evidence that the ClickHouse result fails to reproduce
on different hardware.
H1-COST-02 · Tier B · 4/5
Modern security data platforms cut downstream licensing costs 50–80%.
Across fifteen-plus independent sources — Gartner advisories, Forrester reports, practitioner case
studies, vendor-customer testimonials with verifiable specifics — the consistent finding is a 50–80%
reduction in downstream SIEM licensing when a security data pipeline platform (Cribl, Tenzir, Vector, or
equivalent) sits between source telemetry and the SIEM. The savings come from filtering, downsampling, and
routing decisions made at ingest rather than after the per-GB clock has already started running.
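The arithmetic of ingest-time routing can be sketched in a few lines. The event types, volumes, and policy below are invented for illustration; only the 50–80% target range comes from the sources above.

```python
# Hypothetical daily telemetry mix (GB/day); event types and volumes are invented.
daily_gb = {"dns": 400, "netflow": 900, "edr": 250, "win_event": 350, "debug": 600}

# Pipeline policy applied before the SIEM's per-GB meter starts: drop debug
# noise, sample high-volume netflow, route DNS to cheap object storage,
# pass high-value sources through untouched. Fractions are the share sent to the SIEM.
policy = {"debug": 0.0, "netflow": 0.10, "dns": 0.0, "edr": 1.0, "win_event": 1.0}

before = sum(daily_gb.values())
to_siem = sum(gb * policy[k] for k, gb in daily_gb.items())
reduction = 1 - to_siem / before
print(f"{before} GB/day -> {to_siem:.0f} GB/day to SIEM ({reduction:.0%} licensing reduction)")
```

With this invented mix the licensing line drops about 72%, inside the reported 50–80% band; the dropped and rerouted data still exists, it just stops being metered.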
The counter-position is that "downstream licensing" isn't the only cost — the pipeline platform itself
carries an operational cost, and engineering time spent on routing logic is real spend. That's correct;
the 50–80% is the licensing line item, not the all-in TCO. Most engagements still come out net-positive
because the SIEM licensing line was the unforced error.
What would change the answer: a SIEM vendor moving to consumption pricing that prices the offset through,
or a pipeline platform pricing model that captures the savings on its own side.
H1-PLATFORM-01 · Tier A · 4.5/5
Iceberg + Dremio + Polaris is the strongest open-stack baseline for security data.
Apache Iceberg as the table format, Dremio as the query engine, and Polaris as the catalog (the metadata
layer that tells the engine what tables exist and where their data lives) is the configuration with the
most production-deployment evidence behind it. Netflix runs Iceberg at multi-petabyte scale; Insider
reports a 90% reduction in S3 storage cost after migrating onto this stack; Apple, AWS, Databricks, and
Snowflake have all announced first-class Iceberg support since early 2025.
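The catalog's role, and why it buys engine portability, can be reduced to a sketch (table names and metadata paths are invented; Polaris speaks a REST protocol rather than a Python dict, so treat this as the shape of the contract, not its implementation):

```python
# Minimal sketch of the catalog contract: the catalog maps table names to the
# current Iceberg metadata location; any engine that speaks the catalog
# protocol (Dremio, Spark, Trino, ...) resolves tables the same way and then
# reads the table metadata directly from object storage. Paths are invented.
catalog = {
    "security.zeek_conn": "s3://lake/zeek_conn/metadata/v42.metadata.json",
    "security.edr_events": "s3://lake/edr_events/metadata/v7.metadata.json",
}

def resolve(table: str) -> str:
    """The engine, not the user, performs this lookup before every query."""
    return catalog[table]

print(resolve("security.zeek_conn"))
```

Because the table format and the catalog are both open, swapping the query engine does not strand the data, which is the portability argument above.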
The counter-position: Delta Lake (the Databricks-led table format) has comparable or better tooling
inside the Databricks ecosystem; Apache Hudi has its adherents in heavy-streaming use cases. The
recommendation is workload-conditional, not categorical — but for the security-data workloads this practice
most often sees, the Iceberg stack ends up with the strongest combination of production validation and
engine portability.
What would change the answer: convergence of the table-format ecosystem (an active hypothesis on its own —
see contradictions below), or evidence that a Delta-native security-data deployment outperforms the Iceberg
equivalent on a comparable workload.
H-AI-ASYMMETRY-01 · Tier A · 4.8/5 · Settled
AI-enabled offense is maturing 2–3× faster than AI-enabled defense.
Anthropic's Claude Mythos preview demonstrated autonomous vulnerability discovery at a 72.4% exploit
success rate on the Firefox SpiderMonkey engine — fully autonomous, no human guidance per target. The
AISLE response replicated the result with eight small open-weight models in zero-shot API calls, settling
the working assumption that the moat is the system, not the frontier model. Mandiant M-Trends 2026
records negative mean-time-to-exploit: exploitation is now landing seven days before patch
release. CrowdStrike clocks attacker breakout at 51 seconds from initial access to lateral movement.
GTG-1002 — the first publicly confirmed AI-orchestrated state-sponsored campaign — was 80–90% AI-executed
across roughly 30 victims.
The counter-position is that defenders also benefit from AI tooling, and the asymmetry is temporary. Both
parts are true. The asymmetry is temporary — the working window is roughly 2024–2027. But "temporary" on
this scale means defenders need to compress years of detection-engineering modernization into a few
quarters, and the maturity gap between offensive and defensive AI tooling is wide enough that I treat
this as a planning constraint, not a watch-list item.
What would change the answer: a defensive AI breakthrough that produces measurable detection-cadence gains
across multiple production deployments, or evidence that the offensive results don't generalize beyond the
Mythos / AISLE benchmark workload.
H-COST-09 · Tier A · 5/5
Tiered storage cuts 55–90% of long-tail security data cost.
Splitting security data across hot, warm, and cold storage tiers — with hot kept on local high-performance
storage, warm on cheaper attached storage, cold on S3 Standard, and archival on S3 Glacier or equivalent —
reduces the long-tail cost of retention by 55–90% in production environments. Netflix reports 70–80% of
their storage in cold tiers; Insider documents a 90% S3 cost reduction after tiering; and Kafka has
shipped native tiered storage since 3.6 (KIP-405), which has materially changed the operational story
for streaming security telemetry.
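The blended-cost arithmetic behind the 55–90% band can be sketched directly. The per-tier prices and retention split below are assumed order-of-magnitude figures, not quotes from any provider:

```python
# Illustrative monthly $/GB by tier (assumed, order-of-magnitude only) and an
# invented retention split for 1 PB of security data.
price = {"hot": 0.25, "warm": 0.10, "cold": 0.023, "archive": 0.004}  # $/GB-month
split = {"hot": 0.05, "warm": 0.15, "cold": 0.50, "archive": 0.30}    # shares sum to 1

total_gb = 1_000_000  # 1 PB retained
tiered = sum(total_gb * split[t] * price[t] for t in price)
all_hot = total_gb * price["hot"]
savings = 1 - tiered / all_hot
print(f"all-hot ${all_hot:,.0f}/mo vs tiered ${tiered:,.0f}/mo ({savings:.0%} saved)")
```

Under these assumptions tiering saves roughly 84% versus keeping everything hot; pushing the hot boundary out toward 30 days moves the split and lands lower in the 55–90% band.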
The counter-position is the freshness trade-off. Warm and cold tiers have higher first-byte latency than
hot tiers; queries that span tier boundaries pay a complexity tax. For interactive threat hunting on
recent data, the tiering boundary needs to land further back than for compliance retention. Most
engagements end up with a tier boundary at 7–30 days; some land at 72 hours where the analyst hunting
window is short.
What would change the answer: object-storage pricing collapsing to the point where tiering's operational
complexity isn't worth the savings, or query-engine improvements that make cross-tier queries effectively
free.
H3-INTEGRATION-03 · Tier B · 4/5
OCSF is the multi-vendor schema convergence that actually has a chance.
OCSF (the Open Cybersecurity Schema Framework) is the shared event-shape standard adopted by 180+
organizations, including AWS Security Lake, Cisco, Sumo Logic, IBM, Cloudflare, and many of the EDR
vendors. ITU-T Study Group 17 confirmed OCSF as the basis for forthcoming international standards work in
April 2026. The practical effect is that security data captured in OCSF format moves between tools without
translation overhead — a shift from the prior decade's per-vendor schema lock-in.
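The translation OCSF removes looks like this in miniature: a Zeek-style connection record mapped once into an OCSF-shaped Network Activity event. Field names follow the published OCSF schema loosely, and the input record is invented, so treat the mapping as illustrative rather than normative.

```python
# Zeek-style conn record (invented values).
zeek_conn = {"ts": 1718000000.0, "id.orig_h": "10.0.0.5", "id.resp_h": "192.0.2.9",
             "id.resp_p": 443, "proto": "tcp"}

def to_ocsf_like(rec: dict) -> dict:
    """Map once at ingest; every downstream tool then reads the same shape."""
    return {
        "class_uid": 4001,             # Network Activity in the OCSF class list
        "time": int(rec["ts"] * 1000), # OCSF uses epoch milliseconds
        "src_endpoint": {"ip": rec["id.orig_h"]},
        "dst_endpoint": {"ip": rec["id.resp_h"], "port": rec["id.resp_p"]},
        "connection_info": {"protocol_name": rec["proto"]},
    }

event = to_ocsf_like(zeek_conn)
print(event["class_uid"], event["dst_endpoint"]["port"])
```

The point of the standard is that this mapping is written once per source, not once per source-tool pair.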
The counter-position: vendor adoption claims at the petabyte-per-day scale lack independent verification
(this is one of the documented contradictions below). The standard is real and adoption is broadening; the
production-scale claims are softer evidence than the standardization trajectory itself. Caveat applied
where load-bearing.
What would change the answer: a competing schema gaining vendor consortium momentum, or independent
verification of the petabyte-scale OCSF deployment claims (either confirming or refuting them would update
the recommendation).
H-IMPL-01 · Tier B · 4/5 · Caveat
Streaming architectures cost 2.5–3× more to operate than batch equivalents.
Real-time streaming architectures incur 2.5–3× higher operational costs than equivalent batch
architectures, broken down across specialized staffing (DORA reports a 2.7× staffing premium for streaming
competence), infrastructure redundancy (1.5–2× costs from running always-on Kafka and Flink clusters
rather than scheduled spot batch), and incident management complexity (3–4× higher annual incident rates,
per IDC and Enterprise Data Quarterly tracking).
Caveat: the underlying evidence comes from general data engineering deployments, not
security-specific TCO studies. Security workloads have particular shapes — high-cardinality entity
resolution, bursty incident-driven query loads, regulatory retention requirements — that may shift the
ratio in either direction. A security-specific TCO comparison (streaming SIEM vs batch lake on the same
workload) is on the research backlog.
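The three multipliers cited above can be blended back-of-envelope. The budget shares below are invented; only the per-category multipliers come from the sources named in the text:

```python
# Multipliers from the cited sources (infra taken as the 1.5-2x midpoint,
# incidents as the 3-4x midpoint); budget shares are assumed for illustration.
multiplier = {"staffing": 2.7, "infrastructure": 1.75, "incidents": 3.5}
share = {"staffing": 0.60, "infrastructure": 0.25, "incidents": 0.15}  # sums to 1

blended = sum(multiplier[k] * share[k] for k in multiplier)
print(f"blended streaming-vs-batch cost multiplier: {blended:.2f}x")
```

With these assumed weights the blend lands near 2.6×, consistent with the 2.5–3× headline; a security-specific workload shape would move the shares, which is exactly why the caveat above is load-bearing.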
The counter-position is that detection latency requirements force streaming for some workloads regardless
of cost. True; the implication isn't "don't stream" — it's "stream what needs streaming, batch the rest,
and don't pretend the operational cost differential isn't real."
What would change the answer: a published security-specific TCO comparison — either confirming the general
data-engineering finding or showing the security workload shape narrows the gap.
H-NDR-FEDERATION-01 · Tier B · 4/5
Federated search architecture determines NDR platform stickiness.
The network-detection-and-response (NDR) market is consolidating around platforms that can federate
search across multiple data sources without forcing centralization first. The argument is capability-led,
not cost-led: federated query enables cross-site joins, data sovereignty (EU data stays in EU
jurisdictions), 10–145× query performance against the right data platform, and 93–99.9% wide-area-network
traffic reduction by querying data where it lives rather than shipping it home. A centralized SIEM
cannot deliver these without first centralizing the data, which is exactly the step federation removes.
More than 50 federated security implementations are publicly documented. ExtraHop is an early Security
Lake federation partner; AWS Security Lake plus Athena is becoming the lowest-effort bridge for shops
already on AWS; CrowdStrike has not yet shipped a standardized federation API, which is a competitive
opening for the platforms that have.
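The WAN-reduction claim follows from the query model itself, sketched below. Site names, row counts, match rate, and event size are invented; only the 93–99.9% target band comes from the text.

```python
# Federation pushes the predicate to each site and ships only matching rows
# home, instead of centralizing everything first. All figures are invented.
sites = {"eu-west": 40_000_000, "us-east": 55_000_000, "apac": 25_000_000}
match_rate = 0.002   # fraction of rows a hunting query actually selects
row_bytes = 500      # assumed average serialized event size

centralized = sum(sites.values()) * row_bytes                      # ship everything
federated = sum(int(n * match_rate) * row_bytes for n in sites.values())
reduction = 1 - federated / centralized
print(f"WAN bytes: centralized {centralized:,} vs federated {federated:,} ({reduction:.1%} less)")
```

The reduction tracks the query selectivity almost directly, which is why highly selective hunting queries sit at the top of the 93–99.9% band and broad sweeps sit at the bottom.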
What would change the answer: a centralized SIEM vendor solving the cross-site / data-sovereignty problem
without forcing centralization (no current evidence this is happening), or evidence that federation
performance overhead at scale is worse than the early benchmarks suggest.