Security Data Works

Independent practice

The benchmarks vendors won't run.

An independent security-data-engineering practice. I benchmark the platforms vendors won't, publish the method and code, and use the evidence to move teams off Splunk-era SIEMs onto open architecture they own.

The real question was never which single product replaces Splunk; it's how you compose open data-engineering tools around each workload, with evidence for which engine does which job well.

Compose it yourself

Build an open stack, then make it prove itself.

This is a miniature of the component picker from the console I bring to an engagement. Choose one tool for each layer and the architecture composes underneath; every choice links to the benchmark or essay behind it, so you can follow the whole argument by building a stack instead of reading about one.

Ingest · collect & shape

Why Vector →

Storage · object store

Why SeaweedFS →

Catalog · table metadata

Why Polaris →

Query · engine(s)

Why DataFusion →

Schema · the contract

Why OCSF →

Composed architecture

Sources → Vector → SeaweedFS → Polaris → DataFusion

Schema contract across the flow: OCSF

Data-health gate

  • Config integrity (compatible selection): pass
  • Spec persisted: pass
  • Layer 1 (source health): unmeasured
  • Layer 2 (stack reachable): unmeasured
  • Layer 3 (data-quality audit): unmeasured
  • Layer 4 (cross-tool gap analysis): unwired

In the working console, deploy would be permitted here, but the gate can't certify GREEN. Config integrity holds, but the measurement layers haven't run, and on this page there's no live stack to measure. Unmeasured and stale states are never shown as a pass: a green gate is earned by measurement, not asserted.
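The rule above can be sketched in a few lines. This is a minimal illustration, not the console's code: the `Status` enum, the `gate_colour` function, and the `RED` and `UNCERTIFIED` labels are hypothetical names chosen for the example. Only GREEN and the check states come from this page.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"              # hypothetical: a measured failure
    UNMEASURED = "unmeasured"
    STALE = "stale"
    UNWIRED = "unwired"


def gate_colour(checks: dict[str, Status]) -> str:
    """Return GREEN only when every check is a measured pass."""
    statuses = list(checks.values())
    if any(s is Status.FAIL for s in statuses):
        return "RED"            # hypothetical label
    if statuses and all(s is Status.PASS for s in statuses):
        return "GREEN"
    # Unmeasured, stale, or unwired checks never count as a pass.
    return "UNCERTIFIED"        # hypothetical label


# The states shown in the gate on this page.
page_checks = {
    "config_integrity": Status.PASS,
    "spec_persisted": Status.PASS,
    "layer_1_source_health": Status.UNMEASURED,
    "layer_2_stack_reachable": Status.UNMEASURED,
    "layer_3_data_quality_audit": Status.UNMEASURED,
    "layer_4_cross_tool_gap_analysis": Status.UNWIRED,
}

print(gate_colour(page_checks))
```

The design choice is that absence of evidence falls through to the last branch, so a check nobody ran can never turn the gate green by default.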

See the measurements in the Lab →

Illustrative — your real numbers come from the Lab or an engagement. The working console runs against a live Docker stack and lets you compose several ingest and query engines at once; the full Capability Matrix weights each choice to your workload.

The lab · proof on this page

Every benchmark is yours to re-run.

Zeek analytical workload · 10M events · 5-query average · single-node Tier B

46.8× faster

ClickHouse native vs. a generic schema-on-read SIEM, five-query average on an identical 10M-event workload — answer-equality verified. The 21–62× spread tracks query shape, since the index wins the simple lookups while the lakehouse wins the high-cardinality hunting aggregations where partition pruning and columnar scans pay off. Single-node Tier B (32 GB RAM, 16 cores); full per-query breakdown and methodology in the lab.

ClickHouse Native 0.06 s
ClickHouse + Iceberg 0.28 s
Schema-on-read SIEM 2.85 s
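To check the arithmetic yourself, the sketch below divides the SIEM latency by each ClickHouse latency. It uses only the rounded figures listed above, so it lands near the headline number, not exactly on it. I assume the published 46.8× was computed from unrounded timings. The `speedup` helper is a name invented for this example.

```python
# Rounded five-query average latencies in seconds, as listed above.
latency_s = {
    "ClickHouse Native": 0.06,
    "ClickHouse + Iceberg": 0.28,
    "Schema-on-read SIEM": 2.85,
}


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster the candidate is than the baseline."""
    return latency_s[baseline] / latency_s[candidate]


for engine in ("ClickHouse Native", "ClickHouse + Iceberg"):
    ratio = speedup("Schema-on-read SIEM", engine)
    print(f"{engine}: {ratio:.1f}x faster than the schema-on-read SIEM")
```

With these rounded inputs the native path comes out at about 47.5× and the Iceberg path at about 10.2×.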

Reproducible Docker lab · methodology PDF public · full code and per-query data shared during engagement scoping

Who this is for · start here

Find the path that fits the problem you have.

  • An architect evaluating life off Splunk: start with the Capability Matrix and the head-to-head evidence in the Lab.
  • A security leader facing a SIEM renewal: the Migration Assessment scopes what moving would actually take.
  • A team drowning in detection maintenance: read DetectFlow, where the detection-engineering load moves off the analyst.

Why now

Why the SIEM model is breaking.

01

Attackers are faster than your detection cadence.

Mandiant's M-Trends 2026 shows exploitation landing, on average, 7 days before patch release. CrowdStrike's 2026 Global Threat Report clocks the fastest recorded attacker breakout at 27 seconds. The AI tooling behind that pace no longer needs a frontier model: the AISLE response replicated Anthropic's autonomous-exploit result with eight small open-weight models in zero-shot API calls.

02

Query performance has flipped.

On a 10M-event Zeek workload, ClickHouse runs the hunting-shaped aggregations 21–62× faster than the dominant schema-on-read SIEM (the Splunk-style model that extracts fields at search time, which is what makes large historical queries slow). The five-query average is 46.8×, though the index actually wins the simple lookups. Answer-equality verified, single-node Tier B. Same data, same hardware, same queries; methodology in the lab, deeper walkthrough in ClickHouse at petabyte scale. The architecture that schema-on-read indexing was sized for is gone.

03

Storage cost has flipped too.

Columnar formats on object storage (which store each field down a column instead of row by row, so a query reads only the fields it needs) compress 8.2× in our benchmark. Netflix, Huntress, and Insider run multi-petabyte security data lakes at costs SIEM customers can't reach. The tradeoff: data freshness.
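A toy example makes the "reads only the fields it needs" point concrete. It is a sketch, not a storage engine: the six events are made up, the field names `ts`, `proto`, and `uid` are illustrative, and "bytes read" is simply the encoded length of the values a query has to touch.

```python
# Six made-up events with three fields each (illustrative values only).
events = [
    {"ts": 1700000000 + i, "proto": "dns" if i % 3 == 0 else "tcp", "uid": f"C{i:06d}"}
    for i in range(6)
]

# Row layout: all fields of one event are stored together.
rows = [[str(e["ts"]), e["proto"], e["uid"]] for e in events]

# Columnar layout: all values of one field are stored together.
columns = {name: [str(e[name]) for e in events] for name in ("ts", "proto", "uid")}


def size(values: list[str]) -> int:
    """Total encoded size of a list of values, in bytes."""
    return sum(len(v.encode()) for v in values)


# Query: how many events are DNS?
# Row layout has to read every whole row to reach the proto field.
dns_from_rows = sum(1 for r in rows if r[1] == "dns")
row_bytes_read = sum(size(r) for r in rows)

# Columnar layout reads only the proto column.
dns_from_columns = columns["proto"].count("dns")
column_bytes_read = size(columns["proto"])

print(dns_from_rows, dns_from_columns)
print(row_bytes_read, column_bytes_read)
```

Both layouts return the same answer; the columnar one gets there by touching a fraction of the bytes, and the gap widens as events gain fields the query never asks for.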

04

Stream processing closes the freshness gap.

Modern stream engines handle thousands of near-real-time detections in the time a SIEM handles dozens. The next move, federated query over source-retained data, is closer than vendors will admit.

The wire protocol

Engine portability is only real if the driver is too.

An aside for architects; if you're scoping the business case rather than the plumbing, you can skip to the proof below.

ADBC replaces ODBC and JDBC, the database driver APIs from 1992 and 1997, with a columnar-native one. DuckDB reports a >90% query-time reduction on analytical workloads (Tier B; the gain concentrates on wide tables and large result sets where row-by-row serialization dominates, not on every query). It's the layer that lets you swap query engines without rewriting the analyst's tool stack.

Arrow and ADBC: the columnar wire protocol →

>90%

less query time, ADBC vs JDBC/ODBC on analytical workloads (DuckDB, Tier B)

The product

The Capability Matrix.

A scoring matrix for security data tools, weighted to your workload. Public methodology and candidate catalog; the engagement-internal weighted scoring is the paid asset.

Its hard evidence is the Lab. Reproducible head-to-head benchmarks, methodology and code in the open, re-runnable on your own data. The score is measured, not asserted.
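As a rough picture of what "weighted to your workload" means, here is a minimal weighted-score sketch. It is not the engagement scoring model: the criteria, the weights, the 0 to 10 scores, and the placeholder candidates `engine_a` and `engine_b` are all invented for illustration.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores; weights need not sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"no score for criteria: {sorted(missing)}")
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def rank(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Candidate names, best weighted score first."""
    return sorted(candidates, key=lambda n: weighted_score(candidates[n], weights), reverse=True)


# Hypothetical scores on a 0-10 scale; in practice these would come from measurements.
candidates = {
    "engine_a": {"query_latency": 9, "storage_cost": 6, "freshness": 4},
    "engine_b": {"query_latency": 5, "storage_cost": 8, "freshness": 9},
}

# Two hypothetical workloads that weight the same criteria differently.
hunting_weights = {"query_latency": 0.5, "storage_cost": 0.3, "freshness": 0.2}
detection_weights = {"query_latency": 0.2, "storage_cost": 0.3, "freshness": 0.5}

print(rank(candidates, hunting_weights))
print(rank(candidates, detection_weights))
```

The same measured scores rank differently under the two weightings, which is why the weights are set per workload instead of published as one universal leaderboard.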

Refreshed quarterly · benchmarks re-runnable from public methodology.

The reasoning behind the scores lives in the long-form writing, and the open questions still being worked sit in the research notes.

See the matrix →

The operating frame

Disclosure-forward.

No reseller margins. No vendor-paid placements. Active partnerships disclosed in SOW Appendix B before any engagement begins.

The Matrix evaluates against the disclosure, not around it. An active partnership with one vendor doesn't move the score of a competitor. Compensation, placement, method, and review are spelled out as four operating commitments.

Annual external review of published results · first review Q4 2026.

Read the four commitments →

Ready to put the numbers to the test?

Every benchmark ships with public methodology and code, ready to reproduce on your own workload. A 30-minute call scopes which engagement fits.

Not ready to talk yet? Subscribe for new benchmarks and writing as they publish.