Security Data Works

Writing

Essays from a practitioner.

Essays where the analysis is more prescriptive than the research surface, or where the topic doesn't yet have enough evidence to anchor as a tracked hypothesis. The voice is the same; the framing is essayistic. Updated as the work warrants, not on a schedule. These essays are the reasoning behind the Capability Matrix scores, and the head-to-head benchmark evidence they cite lives in the Lab.

Organized by pillar. Read top to bottom, the ordering follows the dependency stack: the foundations (lakehouse formats, catalogs), then the OCSF and Sigma standards, the engines and pipelines that sit on top of them, the detection and migration practices that consume them, and finally the economics and vendor-watch layer that frames the whole stack.

Reading paths

If you read three, read these.

The full essay collection is a lot to land on cold. Each path is a three-essay arc through one question, drawn from across the pillars and meant to be read in order.

Leaving Splunk without breaking detections

The cost case, the migration trap most teams walk into, and what the timeline actually costs.

  1. The cost math: schema-on-read vs schema-on-write
  2. The field-mapping anti-pattern
  3. Hidden costs and timeline reality

Whether you can trust your data

The quietest failures in security data — the parsing layer, and the measurement problem underneath it.

  1. The parsing layer nobody owns
  2. Flattening away your detection logic
  3. Why vendor benchmarks are the only benchmarks

Picking the query engine

Where each engine wins, from petabyte-scale detection down to an analyst's laptop.

  1. ClickHouse at petabyte scale
  2. DuckDB for analyst-driven hunting
  3. Push vs pull query engines

When hunting becomes data science

The path threat hunters are already on, made reproducible.

  1. PEAK and the lakehouse
  2. MLOps tools for threat hunters
  3. Jupyter to MLflow for reproducible hunting

Jump to pillar

Pillar · 12 essays

Lakehouse foundations.

Open table formats and the interop layer beneath them. Iceberg, Delta, V3/V4 features, and the Arrow standards that make engine portability real.

Pillar · 4 essays

Catalogs.

Polaris, Unity, Nessie. Governance reach, RBAC depth, meta-catalogs for asset context, and the lock-in surface where most lakehouse buyers underestimate the risk.

Pillar · 10 essays

OCSF & schema.

Normalization, mapping, and the anti-patterns from migrations gone sideways. Schema-on-read vs schema-on-write, OCSF reverse mapping, flattening detection logic.

Pillar · 3 essays

Sigma & detection portability.

Sigma 2.0 correlations, pySigma backend reality, and Sigma as the fourth foundational standard alongside Iceberg, Arrow, and OCSF.

Pillar · 6 essays

Query engines.

ClickHouse at petabyte scale, DuckDB for analyst hunting, materialized views, push vs pull, dbt as the SQL-transformation layer.

Pillar · 12 essays

Pipelines & streaming.

Cribl, Tenzir, Vector. Kafka, NATS, streaming-database decisions. Where pipeline lock-in moved after the SIEM lock-in eased.

Pillar · 17 essays

Detection & hunting.

Detection-engineering maturity ladder, MLOps for hunters, latency tiers, feature stores, PEAK methodology on a modern data stack.

Pillar · 4 essays

Migration & federation.

Migrating 800 detection rules across seven parallel ingest buses. Hidden costs. The federated rollout playbook. Splunk Federated Search as bridge or lock-in extension.

Pillar · 9 essays

Economics & measurement.

The cost optimization paradox in security data, the storage-media economics under the bill, and the cloud-versus-on-prem case for security telemetry. Why vendor benchmarks are the only benchmarks, and what to do about it.

Pillar · 4 essays

AI, automation & vendor watch.

Emerging analysis, tracked to read direction rather than claimed as core thesis: the NANDA agent-identity question, RAPTOR and the duct-tape era of agentic security, MCP beyond chat, and vendor watch on Databricks Lakewatch and the SDPP cohort. The security-data measurements are the anchor; the broad-AI framing is here to map where the field is heading.

Notifications

Get a note when a new essay or benchmark publishes.

Low-volume. Essays as they ship; quarterly benchmark reports; nothing else. No drip campaigns.

The hypothesis-grounded work (ten anchor hypotheses with evidence tiers, twenty-two contradictions tracked over time, and the method-in-practice essay) lives on the research page. The program POV that connects them is on the thesis page.