Security Data Works

The Capability Matrix

A scoring matrix for security data tools.

Moving off Splunk means choosing a storage format, a query engine, a catalog, and several more parts, and every vendor claims to be the best at all of them, so the Matrix scores the real candidates against your actual workload instead of their pitch.

Candidate tools scored per component, weighted by your workload. Public methodology and catalog below. The scored version, with reasoning, the vendor-claim-vs-shipped-reality deltas (what a vendor sells against what the product actually does once you run it), and recommended bundles, is delivered inside an engagement.

Figure: the Capability Matrix method. Five candidate products are scored across components on a scale from 1 (poor) to 5 (best), with the component set (Lakehouse, Catalog, Engine, Route, Graph, Storage) and the boundary between public method and paid engagement marked.
How the scoring works: illustrative cells, generic product labels, not real vendor scores. The real per-vendor scores are the paid deliverable.

Read the v1 scoring writeup (2026-05-25): per-archetype orderings, five validation patterns, refresh triggers.

Evidence and audit state

Most cells currently sit at evidence Tier B–C. Five first-party benchmarks are now published, four of them with public code you can rerun. The headline is the ClickHouse 46.8× Zeek run: on a 10M-event Zeek workload, ClickHouse runs the hunting-shaped aggregations 21–62× faster than a schema-on-read SIEM (46.8× on the five-query average; the index actually wins the simple lookups). That result is answer-equality verified, single-node, Tier B. The OLAP join leg has its first scored pass as of 2026-06 (Tier B, single host, 10M–60M-row tables): StarRocks measurably leads multi-table joins and ClickHouse leads the aggregation shapes, but every engine answered the SOC join suite in under 1.5 seconds, so at that scale engine choice rides on catalog maturity, concurrency behavior, and operational cost more than on join latency. The Tier-A upgrades are still the named Q3 2026–Q1 2027 bake-offs (catalog RBAC, multi-node OLAP federated-join, pipeline OCSF-lossiness), not claims already in hand. The lab's first external review is a Q4 2026 forward commitment; the reviewer will be named on the lab page when it completes, not before.

Components

These seven components are the MOAR reference architecture. The matrix is the scoring view of the same vocabulary, not a second vocabulary. The architecture narrative, component by component, is on the MOAR thesis page. Component 0 (Platform Pattern) is the composed-vs-managed decision that frames the other six.

Methodology

Evidence → Matrix → recommendation

The Matrix is the destination every lab benchmark rolls up into. The lab does not publish numbers for their own sake; each reproducible result becomes a 1–5 score on a component criterion, the scores are weighted by your workload archetype, and the weight-adjusted totals produce a defensible recommended bundle. The benchmarks are the evidence; the Matrix is the decision the evidence supports.

How the Matrix works · evidence → recommendation

Figure: How lab evidence becomes a weighted Matrix recommendation. Six public reproducible lab benchmarks (zeek-flagship two-regime, engine-join-specialization, concurrency-multiuser, workload-interference, cost-to-serve-retention, and pipeline-ocsf-fidelity) fan into criterion scores of 1 to 5 per candidate. Those scores are multiplied by archetype weights that sum to 100 and are set by your workload, producing weight-adjusted totals and then a per-workload bundle with sequencing and reversibility, where the bundle is the paid product.
Public reproducible lab benchmarks become 1–5 criterion scores, multiplied by archetype weights that sum to 100, producing the weight-adjusted total that is the recommended per-workload bundle. A cell-winner is not the recommendation; the weighted total for your archetype is.

A cell-winner is not a procurement-defensible default; the recommendation is the weighted total for your archetype, not the best score on any single criterion. The worked example below shows the whole path on one component, and shows the winner change when the archetype changes.
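Written out, the roll-up described above is a plain weighted sum. The symbols are notation introduced here for illustration, not the lab's own: s_{i,c} is the 1–5 score of candidate c on criterion i, w_i is the archetype weight for criterion i, and T_c is the weight-adjusted total for candidate c.

```latex
T_c = \sum_{i=1}^{n} w_i \, s_{i,c},
\qquad \sum_{i=1}^{n} w_i = 100,
\qquad s_{i,c} \in \{1, \dots, 5\}
\;\Longrightarrow\; 100 \le T_c \le 500 .
```

Every candidate is scored under the same weights, so a candidate can hold the top score on several criteria and still lose on T_c if those criteria carry little weight for the archetype.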

A worked scorecard (illustrative)

One worked example of the method, end to end, on a single component, the Query Engine, for a Zeek-heavy SOC archetype (high-volume network time-series, long retention, sub-5 s p99 on hunting aggregations, a handful of concurrent analysts). The capability scores are grounded in the public lab (Tier B, single host) where the criterion is measured, and qualitative where it is client-specific. It is illustrative: the weights are an example archetype profile, not a client's, and the per-cell vendor-claim-vs-shipped-reality delta (the highest-value paid content) is not shown here.

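| Criterion | Weight | ClickHouse/Iceberg | StarRocks | Trino |
|---|---|---|---|---|
| Analytical perf (hunting aggregations) | 20 | 5 | 4 | 4 |
| Cost-to-serve (compute $/effective-TB) | 18 | 5 | 4 | 3 |
| Concurrency / multi-tenant | 15 | 4 | 5 | 3 |
| Iceberg native vs connector | 12 | 3 | 4 | 5 |
| Operational simplicity | 12 | 4 | 3 | 3 |
| Routability (deterministic front end) | 8 | 4 | 4 | 4 |
| Semantic-layer / MV rewrite | 5 | 3 | 4 | 2 |
| Federation breadth | 4 | 3 | 2 | 5 |
| SPL→SQL dialect distance | 3 | 3 | 4 | 4 |
| Existing in-house skill base (client-specific) | 3 | 4 | 3 | 3 |
| Weight-adjusted total (max 500) | 100 | 414 | 392 | 358 |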

For this archetype the recommendation is ClickHouse-over-Iceberg (414), with StarRocks a close alternative (392). The gap is carried by the criteria where the lab actually separates the engines (analytical aggregation performance and compute cost) more than by any single headline. StarRocks wins the concurrency criterion outright (5 vs 4), which is the whole point of the next paragraph. The per-cell delta between each vendor's published claim and its shipped reality is the paid content; it rides every row above and is where an engagement earns its keep.

The winner changes with the archetype: the same cells under different weights make the Matrix a mixing board rather than a leaderboard. Re-weight for a many-concurrent-analyst estate (concurrency 32, cost 15, perf 10, ops 11, routability 8, Iceberg-native 10, semantic 4, federation 4, dialect 3, skill 3) and the same cells re-total StarRocks 410 > ClickHouse 404. The ranking flips because the concurrency bench found StarRocks degrades most gracefully under load, while the single-query latency edge that wins the Zeek-hunting archetype erodes to a throughput tie at the host CPU ceiling. No cell changed; the workload did. That is why the engagement scores against your archetype rather than publishing a single "best engine."
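The arithmetic behind both sets of totals can be checked with a short script. This is a minimal sketch, not the lab's tooling: the cell values and the two weight profiles are copied from the illustrative scorecard and the re-weighting above, while the dictionary keys and the `weighted_totals` helper are names made up for this example.

```python
# Illustrative cells from the worked scorecard: criterion -> scores for
# (ClickHouse/Iceberg, StarRocks, Trino). Key names are shorthand for this sketch.
ENGINES = ("ClickHouse/Iceberg", "StarRocks", "Trino")
CELLS = {
    "perf":        (5, 4, 4),
    "cost":        (5, 4, 3),
    "concurrency": (4, 5, 3),
    "iceberg":     (3, 4, 5),
    "ops":         (4, 3, 3),
    "routability": (4, 4, 4),
    "semantic":    (3, 4, 2),
    "federation":  (3, 2, 5),
    "dialect":     (3, 4, 4),
    "skill":       (4, 3, 3),
}

# Zeek-heavy SOC archetype weights (the Weight column of the scorecard).
WEIGHTS_ZEEK = {
    "perf": 20, "cost": 18, "concurrency": 15, "iceberg": 12, "ops": 12,
    "routability": 8, "semantic": 5, "federation": 4, "dialect": 3, "skill": 3,
}

# Many-concurrent-analyst archetype weights (from the re-weighting paragraph).
WEIGHTS_CONCURRENT = {
    "perf": 10, "cost": 15, "concurrency": 32, "iceberg": 10, "ops": 11,
    "routability": 8, "semantic": 4, "federation": 4, "dialect": 3, "skill": 3,
}


def weighted_totals(weights, cells):
    """Return {engine: sum(weight * score)} for one archetype profile."""
    if sum(weights.values()) != 100:
        raise ValueError("archetype weights must sum to 100")
    if set(weights) != set(cells):
        raise ValueError("weights and cells must cover the same criteria")
    return {
        engine: sum(weights[c] * cells[c][idx] for c in weights)
        for idx, engine in enumerate(ENGINES)
    }


if __name__ == "__main__":
    for label, weights in (("zeek-heavy", WEIGHTS_ZEEK),
                           ("many-analyst", WEIGHTS_CONCURRENT)):
        totals = weighted_totals(weights, CELLS)
        winner = max(totals, key=totals.get)
        print(label, totals, "->", winner)
    # zeek-heavy:   ClickHouse/Iceberg 414, StarRocks 392, Trino 358
    # many-analyst: ClickHouse/Iceberg 404, StarRocks 410, Trino 345
```

Only the weight dictionary differs between the two runs; `CELLS` is shared, which is the "no cell changed; the workload did" point in code form.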

Honesty boundary: every score here is illustrative and single-host Tier B. At single-node SOC scale the engine spread is genuinely compressed (the join bench put the SOC suite under 1.5 s on all four), so the separation leans on the operational criteria (concurrency, cost-to-serve, Iceberg-native maturity), and the TB-scale / multi-node regime that would re-open the spreads is unmeasured and stays an explicit gap.

Augment vs replace — the decision path

The component scores answer which open stack to pick if you move; the decision path answers whether to move, and how far — the question a board votes on. The incumbent and the partial moves are scored candidate paths, not a foil: Stay (status-quo SIEM), Augment (keep the SIEM for hot/detection, offload cold retention to the lakehouse), Hybrid-tiered, and Full-replace. The breakeven where each path pays back is computed, not asserted — migration cost divided by the monthly cost-to-serve saving against staying — so a board sees the cheapest path at its own retention horizon and the month at which the call flips. Each recommendation carries a four-part board-defensibility read: the call, what being wrong costs (the reversibility kill-switch), the evidence tier behind the crossover, and the one assumption that flips it. And the recommendation is the lowest risk-adjusted cost: a path that silently breaks a meaningful share of your detections, or fails a mandatory compliance control like WORM, is ineligible to win on cost alone. The method is here; the per-client crossover and the scored gates are the engagement.
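The breakeven rule and the eligibility gate can be sketched in a few lines. Everything numeric below is hypothetical and chosen only to make the arithmetic easy to follow; the `Path` fields, the helper names, and the single `passes_gates` flag (standing in for the detection-coverage and compliance gates) are assumptions for this example, not the engagement's model. The sketch treats the gates as a hard filter and does not attempt any further risk adjustment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    name: str
    migration_cost: int   # one-off cost of moving to this path (hypothetical)
    monthly_cost: int     # monthly cost-to-serve once on it (hypothetical)
    passes_gates: bool    # detections preserved and mandatory controls met


def breakeven_months(path: Path, stay_monthly_cost: int) -> Optional[float]:
    """Migration cost divided by the monthly saving against staying."""
    saving = stay_monthly_cost - path.monthly_cost
    if saving <= 0:
        return None  # the path never pays back
    return path.migration_cost / saving


def total_cost(path: Path, horizon_months: int) -> int:
    return path.migration_cost + path.monthly_cost * horizon_months


def cheapest_eligible(paths, horizon_months: int) -> Path:
    """Lowest total cost over the horizon among paths that pass the gates."""
    eligible = [p for p in paths if p.passes_gates]
    return min(eligible, key=lambda p: total_cost(p, horizon_months))


# Hypothetical figures for the four candidate paths.
STAY = Path("Stay", 0, 100_000, True)
PATHS = [
    STAY,
    Path("Augment", 300_000, 70_000, True),
    Path("Hybrid-tiered", 600_000, 50_000, True),
    # Cheapest at long horizons, but assumed here to fail a mandatory control.
    Path("Full-replace", 900_000, 40_000, False),
]

if __name__ == "__main__":
    for p in PATHS[1:]:
        print(p.name, "breaks even at month", breakeven_months(p, STAY.monthly_cost))
    for horizon in (6, 12, 36):
        print(horizon, "months ->", cheapest_eligible(PATHS, horizon).name)
```

With these made-up inputs the call moves from Stay at a 6-month horizon to Augment at 12 and Hybrid-tiered at 36, and Full-replace never wins despite the lowest 36-month cost because it fails the gate.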

How it plays out

The methodology above is the discipline applied inside an engagement. The case studies page is where that work gets shown: right now it carries one close case study, an analysis of Atlassian's Project Banyan Databricks-on-OCSF deployment, with the headline numbers drawn from Atlassian's public Databricks story and DAIS talk. The reference architecture catalog generalizes the published patterns into reusable building blocks.

Beyond that one case study, the catalog carries published-architecture teardowns across the regulated sectors a matrix engagement tends to land in — analyses of named, on-the-record deployments, not SDW engagements: bank (Standard Chartered, DNB, Bank Hapoalim), MSSP (Ziggiz), healthcare (Yale New Haven Health), and SaaS (Cloudflare, Pinterest, Okta, Huntress, Comcast). The full teardown catalog shows each one's pipeline and what's measured versus vendor-reported.

Getting the scored matrix

The scored matrix is delivered inside an engagement, scored against your stack and your workload. The four-phase decision framework, the per-component scoring criteria, the 90-vendor evaluation database, the architecture decision records, and the anonymized case study sit behind a client-materials gate, indexed alongside the public reference catalog so nothing gets lost. To get the scored matrix for your stack, book a scoping call.