The Capability Matrix · v1

Same candidates. Different workload. Different winner.

Most vendor evaluations produce a single ordering and call it done. That ordering is wrong roughly half the time, not because the analysis is bad, but because the workload it implicitly assumed isn't your workload. The MOAR Capability Matrix is a discipline for scoring lakehouse engines, table formats, catalogs, and observability pipelines under explicit workload archetypes, with per-archetype weights that produce different orderings on purpose. v1.2 covers three components × three archetypes (nine scoring runs total, twenty-five Metric Decision Records) with the AWS-native pieces (Athena, Kinesis Firehose + Lambda) explicitly scored alongside the third-party candidates.

What the matrix is

A scoring discipline, not a leaderboard.

v1 scores four candidate sets across three components of the modern security-data stack:

OLAP engines. Five candidates: ClickHouse, Uncle Rico, Trino, StarRocks, DuckDB. Nine scoring criteria from raw query performance through cost-to-serve (compute-$/effective-TB-scanned) to in-house skill base.
Table formats + catalogs (bundled). Four paired choices: Iceberg+Polaris, Delta+Unity, Iceberg+Nessie, Iceberg+AWS Glue. Ten criteria spanning multi-engine portability, RBAC depth, schema evolution, cost-to-serve, and cloud-native posture.
Observability pipelines. Four candidates: Tenzir, Cribl, Vector, Kafka Connect. Nine criteria including default reduction ratio, OCSF fidelity, lock-in, and cost-to-serve net of reduction.

Each component is scored separately under each archetype, and the archetype is what makes the orderings comparable. Without an explicit archetype, every reader brings their own implicit one, and the conversation breaks down before it starts.

v1.2 publishes three archetypes for each component: multi-engine open / single-vendor managed / AWS-native lake for Formats+Catalogs, Zeek-heavy SOC / multi-source federated / AWS-native for Engines, and cost-reduction-led / OCSF normalization / AWS-native ingest for Pipelines. Nine scoring runs total. Engagements that don't fit the published archetypes get a fourth archetype defined during scope-of-work.

Five validation patterns

The matrix surfaces home archetypes; it doesn't engineer for them.

Across nine scoring runs, five distinct patterns emerged, and each is the discipline working. They differ mechanically, but each one is a test that the methodology is producing honest results rather than predetermined ones.

1. Dramatic re-ranking.

In Engines, ClickHouse ranks #1 under Archetype A (Zeek-heavy SOC, p99 < 5s on recurring queries) and #3 under Archetype B (multi-source federated, ten-plus disparate sources requiring true federation), then slips to #4 under Archetype C (AWS-native) once the serverless and native-IP-type weights rise. Other engines respond differently as the federation and concurrency weights rise: Trino sits at #2 under both Archetype A and Archetype B before settling to #3 at C. The candidates and the criterion taxonomy stay the same; only the weights change, and the orderings shift substantially because the criteria each candidate scores best on carry different weights under different workloads.
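The mechanism can be sketched in a few lines. This is a minimal illustration, not the matrix's scoring code: the candidate names, criterion names, 1-5 scale, and weights below are all hypothetical placeholders, chosen only to show how fixed per-criterion scores produce different orderings under different weight sets.

```python
# Hypothetical per-criterion scores on an assumed 1-5 scale; not the matrix's real data.
SCORES = {
    "engine_x": {"recurring_latency": 5, "federation": 2, "concurrency": 3},
    "engine_y": {"recurring_latency": 3, "federation": 5, "concurrency": 4},
    "engine_z": {"recurring_latency": 4, "federation": 3, "concurrency": 4},
}

# Hypothetical archetype weights; each set sums to 100.
WEIGHTS = {
    "A": {"recurring_latency": 60, "federation": 15, "concurrency": 25},
    "B": {"recurring_latency": 15, "federation": 60, "concurrency": 25},
}


def weighted_total(scores, weights):
    """Weighted average of criterion scores, on the same scale as the scores."""
    return sum(scores[c] * w for c, w in weights.items()) / sum(weights.values())


def ordering(archetype):
    """Candidates sorted best-first under one archetype's weights."""
    weights = WEIGHTS[archetype]
    return sorted(
        SCORES,
        key=lambda name: weighted_total(SCORES[name], weights),
        reverse=True,
    )


for archetype in WEIGHTS:
    print(archetype, ordering(archetype))
# A ['engine_x', 'engine_z', 'engine_y']
# B ['engine_y', 'engine_z', 'engine_x']
```

The scores never change between the two runs; only the weight vector does, which is the sense in which the re-ranking is a property of the workload rather than of the candidates.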

2. Modest re-ranking with absolute lift.

In Formats+Catalogs between Archetypes A and B, Delta+Unity ranks #2 at A and ties for #1 at B, while its absolute score lifts by 0.85 points. The ordering doesn't flip dramatically (Polaris was already strong and stays strong), but the magnitude shift is honest and visible. The customer call at B becomes "broad portability or RBAC concentration?" Both are valid; the matrix shows the tradeoff rather than hiding it under a single number.

3. Position-preserving, total-preserving, lead-widening.

In Pipelines, Tenzir wins both Archetype A and Archetype B (total roughly unchanged from A to B), but the lead over Cribl widens because Cribl drops harder on OCSF fidelity than Tenzir does on default reduction. This is the subtlest pattern, and it makes one of the stronger arguments for archetype-conditional weights: the matrix preserves the truth that Tenzir is the better OCSF tool while honestly representing that absolute defensibility doesn't suddenly leap just because the workload shape matches. The production-evidence asymmetry between Cribl and Tenzir is a procurement consideration that matters, so it gets called out in the engagement notes rather than buried in the weighted total.

4. Monotonic home-archetype confirmation.

This one surfaced once the third archetype landed for Formats+Catalogs. Iceberg+AWS Glue ranks #4 under Archetype A (2.70 weighted), holds #4 under Archetype B while its total lifts to 3.18, and ranks #1 under Archetype C (4.20): a monotonic A→B→C lift of +1.50, the largest single-candidate archetype-shift in v1. The other candidates show the mirror image. Polaris peaks at A, ties at B, and is runner-up at C; Delta+Unity peaks at B and drops back at C; Nessie has no home archetype because its differentiator (branching for detection-as-code) doesn't anchor any archetype's design center.
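The arithmetic behind the Glue figures is easy to check. The sketch below uses only the three weighted totals quoted above; the function names are illustrative, not part of the matrix tooling.

```python
# Weighted totals for Iceberg+AWS Glue as quoted in the text, in A, B, C order.
glue_totals = {"A": 2.70, "B": 3.18, "C": 4.20}


def is_monotonic_lift(totals):
    """True when each archetype's total is strictly higher than the previous one."""
    values = list(totals.values())
    return all(earlier < later for earlier, later in zip(values, values[1:]))


def total_lift(totals):
    """Difference between the last and first totals, rounded to two decimals."""
    values = list(totals.values())
    return round(values[-1] - values[0], 2)


print(is_monotonic_lift(glue_totals))  # True
print(total_lift(glue_totals))         # 1.5
```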

5. Category-winner change across archetypes.

This is the cleanest pattern in the matrix, and it makes the strongest argument for archetype-conditional weights: in Pipelines, Tenzir wins both Archetype A and Archetype B while Vector wins Archetype C. This isn't a re-ranking magnitude or a margin shift; it's a different tool sitting at #1. The mechanical driver is that Archetype C's twin design-center weights (pricing 20 + lock-in 15) elevate Vector's OSS economics over Tenzir's OCSF fidelity. When the workload framing genuinely changes the right answer, the matrix returns the new answer rather than hiding it under marginal-total adjustments. The v1.2 Kinesis Firehose+Lambda addition refines this with a top-3 spread of just 0.20 points across three genuinely different procurement postures (OSS-on-EKS, commercial-OCSF-on-EKS, fully-AWS-native).

Three archetypes is enough to see whether every candidate has a home, and whether the matrix correctly identifies it. v1.2 says yes for Formats+Catalogs and Engines. For Pipelines, the matrix instead returns a cleaner finding: at Archetype C, no candidate dominates, and the customer's lock-in posture is the pivot between three roughly-equivalent tools.
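One way to make "no candidate dominates" operational is to compare the top-3 spread against a threshold. The sketch below is an assumption-laden illustration: the totals are hypothetical placeholders (picked so the spread matches the 0.20 quoted above), and the 0.25 threshold is invented for the example rather than taken from the methodology.

```python
def top_n_spread(totals, n=3):
    """Gap between the best and the n-th best weighted total, rounded to two decimals."""
    ranked = sorted(totals.values(), reverse=True)
    return round(ranked[0] - ranked[n - 1], 2)


def has_clear_winner(totals, threshold, n=3):
    """True only when the top-n spread exceeds the chosen threshold."""
    return top_n_spread(totals, n) > threshold


# Hypothetical weighted totals; not the matrix's paid per-candidate numbers.
hypothetical_totals = {"tool_1": 3.90, "tool_2": 3.80, "tool_3": 3.70, "tool_4": 3.10}

# Hypothetical threshold standing in for whatever noise floor an engagement adopts.
ASSUMED_THRESHOLD = 0.25

print(top_n_spread(hypothetical_totals))                         # 0.2
print(has_clear_winner(hypothetical_totals, ASSUMED_THRESHOLD))  # False
```

When the check returns False, the ordering alone should not decide the call; a qualitative pivot such as lock-in posture carries the decision instead.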

Per-archetype orderings

What ranks where, and why your archetype matters.

Orderings only. Per-criterion scores, weighted totals, and the full Metric Decision Records ride paid IP per the public-vs-paid line further down. The orderings here are the headline finding, useful to anchor a conversation, not sufficient to ground an architecture decision on their own. Asterisks mark orderings that currently rest on Tier-B or Tier-C evidence; the trigger for Tier-A upgrade is named in the cadence section.

The cost-to-serve lens

Cost-to-serve at retention, the modeled all-in cost to land, store, and query one effective GB over the retention you actually keep rather than the list rate, is the dimension most buyers decide on, so it is weighted explicitly in every component and it is part of what drives the orderings below. It reads in effective terms because compression, tiering, and ingest reduction pull real cost roughly an order of magnitude away from sticker price, and the gap between a vendor's claimed cost and its measured effective cost is itself part of the score.
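A rough sketch of how an effective cost-to-serve figure can diverge from sticker price is below. The model shape, field names, and every input value are assumptions made for illustration; the matrix's actual cost model and numbers are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical inputs; all rates are in dollars and all values are illustrative."""
    ingest_per_gb: float          # cost to land one GB after pipeline reduction
    storage_per_gb_month: float   # cost per stored (compressed) GB per month
    query_per_gb_scanned: float   # cost per stored GB scanned
    reduction_ratio: float        # fraction of raw volume dropped before landing
    compression_ratio: float      # raw bytes divided by stored bytes
    retention_months: int
    scans_over_retention: float   # times each stored GB is scanned during retention


def cost_per_raw_gb(m: CostModel) -> float:
    """All-in cost to land, store, and query one raw GB over the retention window."""
    landed = 1.0 - m.reduction_ratio
    stored = landed / m.compression_ratio
    ingest = landed * m.ingest_per_gb
    storage = stored * m.storage_per_gb_month * m.retention_months
    query = stored * m.query_per_gb_scanned * m.scans_over_retention
    return ingest + storage + query


# Sticker view: no reduction, no compression.
sticker = CostModel(0.10, 0.02, 0.005, 0.0, 1.0, 12, 20)
# Effective view: same rates, with assumed 50% reduction and 8x compression.
effective = CostModel(0.10, 0.02, 0.005, 0.5, 8.0, 12, 20)

print(cost_per_raw_gb(sticker))
print(cost_per_raw_gb(effective))
```

With these made-up inputs the effective figure lands several times below the sticker figure, which is the kind of gap the cost-to-serve criterion is meant to score.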

Where it moves the ranking: under the Engines Zeek-heavy-SOC archetype, ClickHouse's effective compute-cost-per-TB-scanned reinforces its #1, because there it is both the fastest and the cheapest to run. That score is anchored by the lab's measured advantage on a 10M-event Zeek workload (answer-equality verified, single-node, Tier B): ClickHouse runs the hunting-shaped aggregations 21–62× faster than a schema-on-read SIEM (46.8× on the five-query average; the index actually wins the simple lookups) and shows 8.2× compression over the same foil. Under the Pipelines cost-reduction-led archetype, Tenzir's default reduction ratio is the cost lever that holds its lead, since cost-to-serve is scored net of reduction. And at the AWS-native and lock-in-sensitive archetypes, cost-to-serve is what tilts the ordering toward native-service pricing and OSS economics, lifting Vector at Pipelines C and Iceberg+AWS Glue at Formats C. The effective-cost numbers stay paid; the ordering effect is what's public. A first-party run on the reference stack in June 2026 corroborates the storage side of that cost directly: over a shared OCSF corpus, the columnar lakehouse returned the same answers as an OpenSearch schema-on-read foil while occupying one-seventh the footprint, the kind of measured ratio the cost-to-serve weight is built on.

OLAP engines

The per-archetype engine ranking shifts substantially with the workload weights: the candidate that leads a Zeek-heavy SOC is not the one that leads a federated lakehouse or a serverless-within-AWS estate. ClickHouse leads Archetype A on recurring-query latency, and across the three archetypes StarRocks, Trino, and DuckDB change order as the weights change. The full scored ranking — including engines evaluated under vendor benchmark-publication terms — is engagement / NDA content rather than a public leaderboard, per the public/paid line.

* Archetypes B and C rest on practitioner accounts plus vendor production references for federation performance, concurrency, and AWS-native ops integration depth. The single-host engine-join-specialization bench ran 2026-06-10 (Tier B) and anchors raw-perf and join specialization on single-host evidence. The cluster regime is the open Tier-A gap: federated-join scenarios, an Athena head-to-head against the leading OLAP engines on identical Iceberg-on-S3, and distributed multi-node runs. Ahead of that work, the reference stack already verifies that the candidate engines return identical answers on one shared OCSF/Iceberg table, so the cross-engine comparison starts from equal answers rather than answers that only look equal. A single-host latency snapshot over the same table supports the premise behind these orderings, that the per-workload winner changes and engine specialization is a property of scale and concurrency. A concurrency sweep shows this directly: the server engines turn added clients into throughput, while the embedded engine's per-query tail latency degrades. v1.2 promoted Athena from implicit reference to explicit Engines-C candidate. With the v1.3 native-IP-type criterion added, it now leads the Archetype C engine ordering, edging past the next candidate by a margin inside the per-criterion noise floor.

Table formats + catalogs (bundled)

Rank	A — multi-engine open	B — single-vendor managed*	C — AWS-native lake*
1	Iceberg + Polaris	Polaris / Delta+Unity (tie)	Iceberg + AWS Glue
2	Delta + Unity	— (tied)	Iceberg + Polaris
3	Iceberg + Nessie	Iceberg + Nessie	Delta + Unity
4	Iceberg + AWS Glue	Iceberg + AWS Glue	Iceberg + Nessie

* Archetypes B and C currently rest on AWS Security Lake production references (Tier A on availability) plus practitioner accounts on RBAC behavior at scale (Tier B). The Q3 2026 catalog comparison benchmark (Polaris, Nessie, Unity, Glue on identical workload with catalog-enforced RBAC scenarios) anchors RBAC and ecosystem maturity at Tier A and may compress or widen the gaps.
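The Archetype B column above reads as a standard competition ranking: the tied pair shares #1, #2 is left empty, and the next candidate takes #3. A minimal sketch of that rule follows. The candidate names and totals are hypothetical, and rounding to two decimals before comparing is an assumption about how ties are detected.

```python
def competition_rank(totals, places=2):
    """Standard competition ranking: tied candidates share a rank, the next rank is skipped."""
    rounded = {name: round(total, places) for name, total in totals.items()}
    ordered = sorted(rounded.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    for position, (name, total) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and total == previous[1]:
            # Same total as the candidate just above: inherit its rank.
            ranks[name] = ranks[previous[0]]
        else:
            ranks[name] = position
    return ranks


# Hypothetical weighted totals; not the matrix's paid numbers.
hypothetical_b = {"cand_a": 4.00, "cand_b": 4.00, "cand_c": 3.40, "cand_d": 3.10}

print(competition_rank(hypothetical_b))
# {'cand_a': 1, 'cand_b': 1, 'cand_c': 3, 'cand_d': 4}
```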

Observability pipelines

Rank	A — cost-reduction-led	B — OCSF normalization*	C — AWS-native ingest*
1	Tenzir	Tenzir	Vector
2	Cribl	Cribl	Tenzir
3	Vector	Vector	Kinesis Firehose+Lambda
4	Kafka Connect	Kafka Connect	Cribl
5	—	—	Kafka Connect

* Tenzir's OCSF fidelity score is currently Tier B/C (production references exist but no published head-to-head audit at petabyte scale). Cribl's production-evidence base is Tier A/B (50+ Fortune-100 references). The lab's head-to-head pipeline benchmark with OCSF lossiness measurement, targeted for ~Q1 2027, closes that asymmetry. v1.2 promoted Kinesis Firehose+Lambda from implicit reference to explicit Pipelines-C candidate. The top-3 spread at Archetype C is just 0.20 points across three genuinely different procurement postures, with the customer's lock-in posture as the pivot rather than a single winner.

The orderings on this page are the visible part of the matrix. The Metric Decision Records, per-criterion scores, assumptions registry, cross-component coupling notes, and evidence-completeness audit all live behind the engagement.

What it's for

Architecture decisions that survive contact with deployment.

The matrix exists to make architecture conversations concrete. Most engagements start with a vendor question ("should we replace our SIEM with ClickHouse?") and a vague workload picture. The matrix forces both sides of the question to get specific:

Which archetype does the engagement actually map to? If the published archetypes don't fit, scope-of-work defines a fourth, and that definition becomes a deliverable. The archetype itself is part of the recommendation.
What's the cross-component coupling? Engine choice constrains catalog viability. Catalog choice influences pipeline design. The matrix's component tables don't read independently. They read as a joint recommendation.
Where is the position fragile? Every scored candidate carries a "where the ranking is fragile" section: which score is one benchmark away from shifting, which assumption is doing the heavy lifting, which vendor announcement would force a refresh.
What does the matrix not yet measure? v1 covers four candidate sets across three components. Emerging candidates (Hudi, Onum, Observo, Fluent Bit, S3 Tables managed Polaris) are tracked but not yet scored. The "I don't know yet" answer is part of the deliverable.

The output of a matrix engagement is an architecture position with explicit caveats, refresh triggers, and a revalidation calendar, which is a different thing from a glossy "best practices" deck that goes stale the moment a vendor ships a release.

Public, paid, and the line between

Methodology is public. Per-criterion scores are paid.

The discipline is what makes the matrix portable; the scoring is what makes a specific engagement actionable. The public surface and the paid surface map to that distinction.

Public.

The methodology: criterion taxonomies, archetype definitions, scoring rules.
The five validation patterns above.
Per-archetype orderings (the rank tables on this page).
Disclosure flags on every candidate where a relationship could shape interpretation.
Refresh triggers (what benchmark or vendor announcement would force a re-score).

Paid.

Per-criterion scores for every candidate × archetype combination, with evidence tier and citation.
Weighted totals with lower/upper bounds when null values propagate.
The full Metric Decision Records (currently MDR-0001 through MDR-0025) with provenance (public-data, assumption, benchmark, methodology) for every scoring decision.
The assumptions registry with refutation criteria.
Cross-component coupling worksheets for engagement-specific stack proposals.
The internal scoring runs and qualitative notes that show where the matrix's recommendation is one benchmark away from flipping.
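The "lower/upper bounds when null values propagate" item in the paid list can be pictured with a small sketch. It assumes a 1-5 scoring scale and treats a missing score as anywhere between the scale's extremes; both are assumptions for illustration, and the criterion names and weights are hypothetical.

```python
def weighted_bounds(scores, weights, scale_min=1.0, scale_max=5.0):
    """Return (lower, upper) weighted totals; a None score is replaced by the scale extremes."""
    total_weight = sum(weights.values())
    lower = upper = 0.0
    for criterion, weight in weights.items():
        score = scores.get(criterion)
        lower += weight * (scale_min if score is None else score)
        upper += weight * (scale_max if score is None else score)
    return lower / total_weight, upper / total_weight


# Hypothetical criteria and weights; "portability" has no evidence yet, so it is None.
hypothetical_weights = {"rbac": 40, "portability": 35, "cost": 25}
hypothetical_scores = {"rbac": 4, "portability": None, "cost": 3}

low, high = weighted_bounds(hypothetical_scores, hypothetical_weights)
print(round(low, 2), round(high, 2))  # 2.7 4.1
```

The width of the interval grows with the weight of the unscored criterion, so a null on a design-center criterion is visibly more damaging than a null on a minor one.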

The 4-6 week scrub.

Per-archetype orderings move from paid to public after a 4-6 week delay, long enough that engagements completed in that window get the current ordering, yet short enough that the public surface doesn't drift meaningfully out of sync with the practice. The current public orderings on this page reflect the v1 internal scoring run dated 2026-05-25; the cost-to-serve criterion entered the published methodology on 2026-06-04, and its rank-level effects publish on the same scrub cadence, with its direction across the current orderings described in the cost-to-serve lens above. The next refresh will land after the Q3 2026 catalog comparison benchmark anchors the largest open Tier-A evidence gap.
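As a quick worked example of the scrub window, the sketch below adds four and six weeks to the 2026-05-25 scoring-run date. Counting the delay from the scoring-run date is an assumption; the text does not say which date the clock starts on.

```python
from datetime import date, timedelta


def scrub_window(scoring_run, min_weeks=4, max_weeks=6):
    """Earliest and latest public-release dates for an ordering under the scrub delay."""
    return (
        scoring_run + timedelta(weeks=min_weeks),
        scoring_run + timedelta(weeks=max_weeks),
    )


earliest, latest = scrub_window(date(2026, 5, 25))
print(earliest, latest)  # 2026-06-22 2026-07-06
```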

Cadence and disclosures

A six-month default. Specific triggers shorten it.

Each scoring run carries a revalidate-by date six months out. Specific triggers shorten that:

Q3 2026 catalog comparison benchmark. Polaris, Nessie, Unity, Glue on identical workload with catalog-enforced RBAC scenarios. Anchors RBAC and ecosystem-maturity scores at Tier A for Formats+Catalogs across all three archetypes.
Cluster / concurrent OLAP engine regime — federated-join, AWS-native, and multi-node scenarios. The single-host slice ran 2026-06-10 (engine-join-specialization, Tier B), anchoring raw-perf and join specialization on single-host evidence; the cluster, federated-join, and AWS-native scenarios remain the open gap that anchors federation and at-scale concurrency scores at Tier A for Engines across both archetypes.
~Q1 2027 head-to-head pipeline benchmark with OCSF lossiness measurement. Anchors OCSF fidelity and reduction-ratio scores at Tier A for Pipelines across both archetypes.
Iceberg V4 ships. Refreshes schema-evolution scoring across all Iceberg-based candidates.
Disclosure escalation. Any commercial relationship that develops with a scored vendor triggers a re-disclosure pass before the next public-surface refresh.
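The cadence rule (six months by default, shortened by a trigger) can be sketched as follows. The helper names are illustrative, the trigger date in the usage example is a hypothetical placeholder, and clamping to the month's last day is an assumption about how "six months out" handles month-end dates.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def revalidate_by(scoring_run, trigger_dates=()):
    """Six-month default, shortened to the earliest trigger that falls inside the window."""
    default = add_months(scoring_run, 6)
    upcoming = [d for d in trigger_dates if scoring_run < d < default]
    return min(upcoming, default=default)


run = date(2026, 5, 25)
print(revalidate_by(run))                        # 2026-11-25
print(revalidate_by(run, [date(2026, 9, 30)]))   # 2026-09-30 (hypothetical trigger date)
```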

Current disclosures: Delta+Unity carries a nature: explore flag. There's an in-progress conversation with Databricks about Lakewatch GTM that is not yet a commercial relationship. Lakewatch launched March 24, 2026 in Private Preview as a Claude-powered agentic SIEM on Delta+Unity (Adobe and Dropbox among the named early customers; consumption pricing). The launch is acknowledged here because Databricks has moved from being a data-platform vendor to being a direct SIEM entrant, which sharpens the relevance of the disclosure even though the score itself holds (private-preview status doesn't change the catalog-and-format ordering). It's flagged here for transparency, and it has no impact on scoring under any archetype.

The Splunk product family is genericized across all matrix surfaces (as "schema-on-read SIEM" or a generic SIEM descriptor) per Splunk's EULA Section 1.2(v) / 3(f). Public comparisons reference the category, not the vendor.

The orderings are a starting point. The engagement is where the matrix earns its keep.

A matrix engagement defines your archetype, scores the candidates under it, names the cross-component coupling, and hands back an architecture position with explicit refresh triggers. The deliverable is the position plus the calendar, not a deck that goes stale on contract signature.
