The Lab
Independent benchmarks. Code in the open.
The lab is my independent test environment for security data tools. Each benchmark ships with reproducible methodology, containerized environment, query suite, and analyzed results. The point isn't "we ran a benchmark" — the point is that the workload, the method, the result, and the code are all published, so anyone can re-run the experiment on their own hardware and verify the answer.
Published benchmark
ClickHouse runs the same security workload 145× faster than the dominant schema-on-read SIEM.
Identical workload (10M Zeek conn.log events), identical hardware, identical queries. Methodology and code published; reproducible on commodity hardware.
The headline numbers.
ClickHouse Native (MergeTree engine) completes a five-query analytical suite on 10 million Zeek conn.log events in 0.19 seconds. The schema-on-read SIEM completes the identical suite on the identical events in 27.52 seconds. The ratio is 145×. ZSTD-22 compression on the ClickHouse side reduces 3.27 GB of raw JSON to 399 MB on disk — an 8.2× compression factor. The schema-on-read SIEM's compressed footprint on the same data is roughly 2,385 MB, a 1.4× factor.
The scaling profile is informative. The schema-on-read SIEM runs the suite in 3.47 seconds at 1M events and 27.52 seconds at 10M events: roughly an 8× increase in latency for a 10× increase in data, so latency grows nearly in lockstep with volume. ClickHouse stays sub-second across the same range. The absolute performance gap widens as data volume grows, which is exactly the direction that hurts under per-GB-ingested licensing, where cost already scales linearly with volume.
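For anyone sanity-checking the arithmetic, the headline ratios fall straight out of the figures above; a minimal sketch in Python, using only the numbers published here:

# Headline figures from the published benchmark (see above).
clickhouse_suite_s = 0.19     # five-query suite, 10M events
siem_suite_s = 27.52          # same suite, same events, schema-on-read SIEM
raw_json_mb = 3270            # 3.27 GB of raw JSON
clickhouse_disk_mb = 399      # ZSTD-22 MergeTree footprint on disk
siem_disk_mb = 2385           # schema-on-read SIEM compressed footprint

print(f"speedup:          {siem_suite_s / clickhouse_suite_s:.0f}x")  # ~145x
print(f"CH compression:   {raw_json_mb / clickhouse_disk_mb:.1f}x")   # ~8.2x
print(f"SIEM compression: {raw_json_mb / siem_disk_mb:.1f}x")         # ~1.4x

# Scaling profile of the schema-on-read SIEM, 1M -> 10M events.
print(f"latency growth:   {27.52 / 3.47:.1f}x for 10x data")          # ~7.9x, near-linear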
The workload.
Zeek conn.log is the per-connection network telemetry record produced by the Zeek (formerly Bro) network security monitor — one of the most common high-volume security data formats in production SOCs. The benchmark loads 10 million conn.log events with realistic field distributions and runs the standard analytical workload shape against them: time-bucketed aggregation, top-talker queries, protocol analysis, distinct-host counts, and a cross-source JOIN with simulated SIEM alerts.
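To make the workload shape concrete, here is a sketch of two of the five query shapes as they might look against a flattened ClickHouse conn.log table. The table name (conn) and column names (ts, orig_h, orig_bytes, resp_bytes) are illustrative assumptions, not the published schema:

# Illustrative only: table and column names are assumptions, not the published suite.
TIME_BUCKETED_BYTES = """
    SELECT toStartOfHour(ts) AS hour,
           sum(orig_bytes + resp_bytes) AS total_bytes
    FROM conn
    GROUP BY hour
    ORDER BY hour
"""

TOP_TALKERS = """
    SELECT orig_h,
           count() AS connections,
           sum(orig_bytes) AS bytes_sent
    FROM conn
    GROUP BY orig_h
    ORDER BY bytes_sent DESC
    LIMIT 10
"""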
Five queries, ten iterations each for statistical stability. Hardware: single-node Docker Compose on WSL2, 32 GB RAM, 16 cores. Both engines ran under identical conditions: same row counts, same memory limits, same query suite, same iteration count. No per-tool tuning was applied beyond the documented defaults for either side.
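The published runner is Python; its measurement loop is roughly the following sketch, using the clickhouse-connect client. The host, table, and query text are placeholders, and the real runner drives every engine in the comparison, not just ClickHouse:

import statistics
import time

import clickhouse_connect  # pip install clickhouse-connect


def time_query(client, sql: str, iterations: int = 10) -> dict:
    """Run one query N times; report median / min / max wall-clock latency in seconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        client.query(sql)  # executes and fetches the full result set
        latencies.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(latencies),
        "min_s": min(latencies),
        "max_s": max(latencies),
    }


if __name__ == "__main__":
    client = clickhouse_connect.get_client(host="localhost")   # placeholder host
    print(time_query(client, "SELECT count() FROM conn"))      # placeholder query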
Reproducibility.
The benchmark repository contains: Docker Compose definitions for ClickHouse, the schema-on-read SIEM, and the additional engines tested (Trino, Dremio, StarRocks); data generation scripts that reproduce the Zeek conn.log distribution; the Python query runner; the methodology document; the analysis JSON.
Reproducibility isn't a marketing line. The repository is shared under NDA with engagement prospects and qualifying reviewers; running it on your hardware should land within statistical variance of the published numbers. If it doesn't, the discrepancy is a contribution to my understanding and gets folded into a result update with the new evidence. The reference implementation isn't published publicly because the comparison set includes commercial software whose licensing terms restrict third-party publication of comparative test results — a constraint I respect rather than work around.
Download
The methodology PDF.
Hardware spec, workload definition, query suite, scaling profile, statistical-confidence detail, and the documented caveats for where the result generalizes and where it doesn't. v1.0, 2026-05. Roughly 5 pages, no email gate.
Download methodology (PDF) →
For NDA-gated reference-implementation access (Docker Compose, data generators, query runner), book a discovery call or email jeremy@securitydataworks.com with the subject "Benchmark NDA request".
How the lab runs benchmarks
Four principles. Documented before any tool runs.
Reproducibility before performance.
A benchmark result that can't be re-run isn't a benchmark; it's an opinion. Every published result ships with the methodology document, the containerized environment definition, the data generators, the query suite, and the analyzed output. A practitioner with the same hardware and same data can re-run the experiment and verify the number independently. The reference implementation is shared under NDA with engagement prospects and qualifying reviewers — not published openly, because the comparison set includes commercial software whose licensing terms restrict third-party publication of comparative benchmark results. The methodology, the result, and the reasoning are public; the executable artifact is gated by a one-page NDA.
Identical workload across candidates.
Workload and queries are defined and pinned before any tool is run. No per-tool tuning advantage; the same query suite runs against the same data on every candidate engine. Vendor-recommended configurations are tested as additional rows in the result table — labeled clearly as vendor-recommended — rather than folded silently into the headline. The point is to characterize each tool's behavior on the workload, not to engineer the most flattering possible result for any one of them.
Documented caveats.
Every result ships with what was tested, what wasn't, and which workloads the result generalizes to. The benchmark is single-node; production environments typically run multi-node, and the relative performance shifts on multi-node JOINs in ways the single-node benchmark doesn't capture. The benchmark covers analytical aggregation queries; full-text-search-dominated workloads aren't in the query suite, and the result generalizes less cleanly to those. The benchmark uses one log type (Zeek conn.log); other log shapes (endpoint telemetry, cloud control plane, identity events) carry their own performance characteristics. The caveats are the part that lets a reader know whether the number applies to their environment.
Vendor cooperation invited, not required.
Every vendor whose product appears in a benchmark is invited to review the methodology and propose configuration changes before publication. Vendor-proposed configurations are tested and reported as additional result rows, labeled clearly. The lab doesn't accept funded benchmarks, doesn't allow pre-publication vetoes, and doesn't allow vendors to dictate workload selection — but the methodology review is open, and that openness is part of why the published numbers survive contact with the vendors after release.
External review on annual cadence.
Once a year, an outside practitioner with the relevant standing — security data engineer, OCSF contributor, or analyst with quantitative-benchmark expertise — audits the lab's published results under NDA. They get the same access an engagement prospect gets: full methodology, full reference implementation, the underlying analysis JSON. Their signoff is published on the lab page; their flagged issues drive corrections to the public results. The reviewer is named on this page, on a one-year rotation.
What we changed our mind on
The benchmark headline doesn't carry the cost story alone.
For most of 2025, the operational reading of the benchmark was straightforward: ClickHouse is 145× faster and meaningfully cheaper than the schema-on-read SIEM on equivalent workloads. The first half of that statement is robust. The second half — the cost framing — is more nuanced than the early write-ups admitted, and the page where it gets revised is the lab page itself, not a footnote elsewhere.
ClickHouse is cheap versus per-GB-ingested licensing models. 30–90% cost reduction is documented across multiple production deployments — Huntress, Uptycs, Hunters, Panther — and reproduces in TCO modeling. ClickHouse is comparable to Snowflake or Databricks SQL at sustained TB/day workloads, where the managed pricing converges. ClickHouse is structurally more expensive than Iceberg-on-S3 with a separate query engine — the MergeTree format duplicates data already storable in open formats on S3, the compute and storage scale together rather than independently, and the replication overhead multiplies storage cost.
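A toy illustration of the structural storage point above. Every figure here is a stated assumption for illustration, not a measured result or a quoted price:

# Hypothetical illustration only: every number below is an assumption, not a quote.
daily_raw_gb = 1000           # assumed 1 TB/day raw ingest
retention_days = 90
compression = 8.0             # assumed columnar compression factor
replicas = 3                  # assumed ClickHouse replication factor
s3_price_gb_month = 0.023     # assumed S3 Standard price; varies by region
block_price_gb_month = 0.08   # assumed block-storage price on ClickHouse nodes

stored_gb = daily_raw_gb * retention_days / compression

iceberg_storage = stored_gb * s3_price_gb_month                   # single copy on S3
clickhouse_storage = stored_gb * replicas * block_price_gb_month  # replicated MergeTree

print(f"Iceberg-on-S3 storage: ${iceberg_storage:,.0f}/month")
print(f"ClickHouse storage:    ${clickhouse_storage:,.0f}/month")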
The updated framing the lab uses today: the benchmark validates the performance claim cleanly. The cost claim only carries when the comparison baseline is named — versus the legacy schema-on-read licensing model specifically, not in the abstract. Production deployments that lean ClickHouse-first on hot tier and Iceberg-on-S3 on cold tier are an emerging pattern that captures both the latency advantage and the open-format cost economics. The lab's planned Q4 work on streaming write maturity into Iceberg is partly aimed at characterizing this hybrid shape.
Roadmap
One benchmark per quarter. The next one is announced a quarter ahead.
Q3 2026 — catalog comparison.
Polaris versus Nessie versus Unity Catalog versus AWS Glue versus Hive Metastore. Five catalogs against a uniform workload that exercises governance (RLS, column masking), multi-engine query support (Iceberg via Trino, Dremio, ClickHouse, Athena), schema evolution semantics, and operational behavior under realistic security-data shapes. The output: a head-to-head matrix with the catalog choice mapped to deployment archetypes (isolated dedicated, shared corporate, multi-tenant MSSP). This is the benchmark that backs the catalog scoring on the matrix.
What the comparison is not: a feature-checkbox tour. Catalogs differ in feature surface area, but the load-bearing differences in production are governance enforcement under cross-engine query, RBAC scope semantics under cross-team access, and the operational tax of running each at scale. Those are the dimensions the benchmark exercises.
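A sketch of what a cross-engine governance probe looks like in that spirit: the same masked-column query issued through two engines, checking whether the catalog's masking policy survives the engine boundary. Connection details, the table, the column, and the '@'-based leak check are all illustrative assumptions, not the pinned Q3 suite:

import clickhouse_connect  # pip install clickhouse-connect
import trino               # pip install trino

MASKED_QUERY = "SELECT user_email FROM identity_events LIMIT 5"  # placeholder query

# Same query, two engines; the catalog's masking policy should hold for both.
trino_cur = trino.dbapi.connect(
    host="localhost", port=8080, user="analyst",                 # placeholder connection
    catalog="iceberg", schema="security",
).cursor()
trino_cur.execute(MASKED_QUERY)
trino_values = [row[0] for row in trino_cur.fetchall()]

ch_client = clickhouse_connect.get_client(host="localhost", username="analyst")
ch_values = [row[0] for row in ch_client.query(MASKED_QUERY).result_rows]

for engine, values in (("trino", trino_values), ("clickhouse", ch_values)):
    leaked = [v for v in values if v and "@" in str(v)]  # unmasked emails contain '@'
    print(f"{engine}: {'LEAKED' if leaked else 'masked'}")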
Q4 2026 — candidate slate.
Five candidates are on the shortlist; one will be selected and announced one quarter ahead. The OLAP engine bake-off — ClickHouse, Dremio, StarRocks, Trino on an identical security-data workload — extends the existing benchmark across more engines on the same workload. Kafka-to-Iceberg latency characterization measures how fresh streaming writes into Iceberg actually are in practice, across Iceberg streaming writes, Tabular's commercial offering, RisingWave, and Flink CDC; this is the benchmark that resolves part of the H-IMPL-01 streaming-cost caveat on the research page. A federated-query stack comparison covers Trino versus Dremio versus the dominant SIEM's federated-search mode on cross-source JOINs at production scale.
An OCSF normalization-at-source comparison covers Tenzir versus Vector versus Cribl on identical multi-source workloads — particularly relevant to the AI-generated parser claims tracked in the writing essays. A storage-tier cost-versus-latency curve characterizes S3 Standard versus Infrequent Access versus Glacier versus on-premises MinIO across retention archetypes. Selection closes at the end of Q3, and the chosen Q4 benchmark is announced publicly at that point.
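For the Kafka-to-Iceberg candidate, the freshness question reduces to a measurable lag: how far behind wall-clock time is the newest committed row? A minimal probe sketch, assuming the Iceberg table is reachable through Trino; connection details, the table name, and the event_time column are placeholders:

import time

import trino  # pip install trino

conn = trino.dbapi.connect(
    host="localhost", port=8080, user="lab",     # placeholder connection details
    catalog="iceberg", schema="security",
)
cur = conn.cursor()

# Poll the newest committed event timestamp; report lag against wall clock.
for _ in range(10):
    cur.execute("SELECT to_unixtime(max(event_time)) FROM conn_events")  # placeholder names
    newest = cur.fetchone()[0]
    if newest is not None:
        print(f"freshness lag: {time.time() - newest:.1f}s")
    time.sleep(30)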
Cadence and access
Quarterly. Public methodology, paid synthesis.
One tool-eval report per quarter. The topic is announced one quarter ahead so vendors can review the methodology and prospective clients can request scope adjustments before the workload is pinned. The benchmark itself ships in the quarter it's announced for: the Q3 catalog comparison runs across July, August, and September, with the report landing in late September.
The benchmark methodology, code, and headline numbers are published openly to GitHub. The synthesized report — the recommendation per environment archetype, the workload-shape sensitivity analysis, the TCO modeling layered on top of the raw numbers — is included in paid engagements above $25K. The split is intentional: the benchmark is a public good; the synthesis is the engagement deliverable.
The lab also accepts ad-hoc benchmark proposals from prospective clients during engagement scoping. If the workload archetype on the table doesn't have a published benchmark answering the load-bearing question, the engagement can include a workload-specific run. The methodology rigor is the same as the published benchmarks; the result publishes on the lab page after the engagement concludes.
The benchmark is the receipt. The matrix is the decision.
The research page connects the benchmarks back to anchor hypotheses. The matrix offering applies them to your workload.