Writing

Essays from a practitioner.

Essays where the analysis is more prescriptive than the research surface, or where the topic doesn't yet have enough evidence to anchor as a tracked hypothesis. The voice is the same; the framing is essayistic. Updated as the work warrants, not on a schedule. These essays are the reasoning behind the Capability Matrix scores, and the head-to-head benchmark evidence they cite lives in the Lab.

Organized by pillar. Read top to bottom, the ordering follows the dependency stack: the foundations (lakehouse formats, catalogs), then the OCSF and Sigma standards, the engines and pipelines that sit on top of them, the detection and migration practices that consume them, and finally the economics and vendor-watch layer that frames the whole stack.

Reading paths

If you read three, read these.

The full essay collection is a lot to land on cold. Each path is a three-essay arc through one question, cross-pillar, in order.

Leaving Splunk without breaking detections

The cost case, the migration trap most teams walk into, and what the timeline actually costs.

Whether you can trust your data

The quietest failures in security data — the parsing layer, and the measurement problem underneath it.

Picking the query engine

Where each engine wins, from petabyte-scale detection down to an analyst's laptop.

When hunting becomes data science

The path threat hunters are already on, made reproducible.

Jump to pillar

Lakehouse foundations (12) · Catalogs (4) · OCSF & schema (10) · Sigma & detection portability (3) · Query engines (6) · Pipelines & streaming (12) · Detection & hunting (17) · Migration & federation (4) · Economics & measurement (9) · AI, automation & vendor watch (4)

Pillar · 12 essays

Lakehouse foundations.

Open table formats and the interop layer beneath them. Iceberg, Delta, V3/V4 features, and the Arrow standards that make engine portability real.

Pillar · 4 essays

Catalogs.

Polaris, Unity, Nessie. Governance reach, RBAC depth, meta-catalogs for asset context, and the lock-in surface where most lakehouse buyers underestimate the risk.

Pillar · 10 essays

OCSF & schema.

Normalization, mapping, and the anti-patterns from migrations gone sideways. Schema-on-read vs schema-on-write, OCSF reverse mapping, flattening detection logic.

Pillar · 3 essays

Sigma & detection portability.

Sigma 2.0 correlations, pySigma backend reality, and Sigma as the fourth foundational standard alongside Iceberg, Arrow, and OCSF.

Pillar · 6 essays

Query engines.

ClickHouse at petabyte scale, DuckDB for analyst hunting, materialized views, push vs pull, dbt as the SQL-transformation layer.

Pillar · 12 essays

Pipelines & streaming.

Cribl, Tenzir, Vector. Kafka, NATS, streaming-database decisions. Where pipeline lock-in moved after the SIEM lock-in eased.

Pillar · 17 essays

Detection & hunting.

Detection-engineering maturity ladder, MLOps for hunters, latency tiers, feature stores, PEAK methodology on a modern data stack.

Pillar · 4 essays

Migration & federation.

Migrating 800 detection rules across seven parallel ingest buses. Hidden costs. The federated rollout playbook. Splunk Federated Search as bridge or lock-in extension.

Pillar · 9 essays

Economics & measurement.

The cost optimization paradox in security data, the storage-media economics under the bill, and the cloud-versus-on-prem case for security telemetry. Why vendor benchmarks are the only benchmarks, and what to do about it.

Pillar · 4 essays

AI, automation & vendor watch.

Emerging analysis, tracked for direction rather than claimed as core thesis: the NANDA agent-identity question, RAPTOR and the duct-tape era of agentic security, MCP beyond chat, and vendor watch on Databricks Lakewatch and the SDPP cohort. The security-data measurements are the anchor; the broad-AI framing is here to map where the field is heading.

Notifications

Get a note when a new essay or benchmark publishes.

Low-volume. Essays as they ship; quarterly benchmark reports; nothing else. No drip campaigns.

The hypothesis-grounded work — ten anchor hypotheses with evidence tiers, twenty-two contradictions tracked over time, and the method-in-practice essay — lives on the research page. The program POV that connects them is on thesis.

Lakehouse foundations.

Iceberg V3 changed the security lakehouse thesis.

The encoder is the read lever, not the table format.

Same codec, different sizes.

The write pattern is the architectural decision.

Iceberg vs Delta Lake for security data.

V4 relative paths vs DuckLake's database-metadata.

Iceberg table maintenance at scale.

Deletion vectors and GDPR.

Variant type ends the flattening wars.

Row lineage as the missing CDC primitive.

Arrow and ADBC: a foundational pillar.

Arrow Flight and Flight SQL.

Catalogs.

The catalog became the control plane.

Unity Catalog vs Polaris vs Nessie.

Catalog governance without native support.

Meta-catalogs and asset context in federated environments.

OCSF & schema.

Schema-on-read vs schema-on-write.

LLM-assisted OCSF mapping.

OCSF ontological grounding: D3FEND for federal-ready.

OCSF reverse mapping.

OCSF and operational technology.

The field-mapping anti-pattern.

Flattening away your detection logic.

Context collapse, measured on real attack data.

Six schemas into OCSF: the mapping is the hard part.

From field mappings to the controls layer.

Sigma & detection portability.

Why Sigma won the detection-sharing decade.

Sigma and detection portability.

Sigma 2.0 correlations and the pySigma backend reality.

Query engines.

One engine in front: StarRocks over shared Iceberg.

ClickHouse at petabyte scale.

DuckDB for analyst-driven threat hunting.

Materialized views for security data.

Push vs pull query engines.

dbt for security data.

Pipelines & streaming.

Cribl vs Tenzir vs alternatives.

The pipe layer: what's missing from your AI security platform.

Vector: the data router Datadog open-sourced.

Pipeline lock-in.

The parsing layer nobody owns.

Pipeline-based detection in stream processing.

Observability pipelines and the security overlap.

ETL vs ELT for security data.

Kafka architecture deep-dive.

Kafka to Iceberg: the integration hidden costs.

The streaming database decision.

NATS JetStream: lightweight Kafka alternative, disqualified.

Detection & hunting.

The ground you're already standing on.

The assurance gap no single tool closes.

Detecting the OT you can't parse.

What your data means vs what shape it is.

Catching the mistake that kills a detection.

The query engine returned the wrong answer and didn't tell you.

The better the model, the quieter the wrong answer.

Parquet doesn't hash the way your security tools assume.

Who actually does the hunting.

The tools you can use today.

The detection engineering maturity ladder.

PEAK and the lakehouse.

Three latency tiers: detection, hunting, analysis.

MLOps tools for threat hunters.

Jupyter to MLflow for reproducible threat hunting.

Where detection-as-code notebooks should live.

Feature stores for security.

Migration & federation.

Migration: hidden costs and timeline reality.

Migrating 800 detection rules across seven parallel ingest buses.