Service Offering 3 · Detection strategy

Where detection runs, and how fast it has to be.

The design track makes one decision the rest of the architecture hangs on: do you detect by querying stored data after the fact, by evaluating events inside the streaming pipeline before they land, or by splitting the two approaches across your sources? I don't assume the answer. I classify the SOC profile, recommend query-based, pipeline-based, or hybrid, and then assign each workload to the latency tier it actually needs. As of 2026 the detection tier no longer gets to be slow.

Two detection models

Query-based stores everything and asks later. Pipeline-based decides first.

There are two coherent ways to architect detection, and they make opposite trade-offs. Most platforms end up with one by accident, inheriting whatever the incumbent SIEM imposed, rather than by choosing. The design engagement makes the choice explicit, because it is the decision that determines storage cost, detection latency, and how much of your history you can interrogate later.

Query-based detection (the traditional model)

Ingest all data, query it for threats, and store everything. The advantage is retroactive analysis: because every event is retained, you can write a detection tomorrow and run it against data you collected last quarter. When a new technique is disclosed, you can hunt backwards for it. The cost is high storage, because you are paying to retain everything on the chance that any of it becomes relevant. The latency is minutes to hours, because detection is a batch query running on a schedule against landed data. This is the model the scheduled-query SIEM cadence was built around, and it is still the right model when investigation and retroactive reach are the point.
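A minimal sketch of the query-based model, assuming an in-memory SQLite table stands in for the landed event store. The table layout, the sample events, and the encoded-PowerShell rule are all hypothetical. The point is only that a detection written after the fact can still be run over everything that was retained.

```python
import sqlite3

# Hypothetical event store. In a real deployment this would be the SIEM or the lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, host TEXT, process TEXT, cmdline TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2025-07-01T10:00:00", "ws-01", "powershell.exe", "powershell -enc SQBFAFgA"),
        ("2025-07-02T11:30:00", "ws-02", "notepad.exe", "notepad report.txt"),
        ("2025-08-15T09:12:00", "srv-03", "powershell.exe", "powershell -File backup.ps1"),
    ],
)


def run_detection(conn, since):
    """Run a detection written today against events landed at any earlier time."""
    rows = conn.execute(
        "SELECT ts, host FROM events "
        "WHERE process = 'powershell.exe' AND cmdline LIKE '%-enc %' AND ts >= ? "
        "ORDER BY ts",
        (since,),
    )
    return rows.fetchall()


# Hunt backwards over last quarter's data with a rule that did not exist back then.
hits = run_detection(conn, "2025-07-01T00:00:00")
print(hits)
```

In the scheduled-query cadence the same function would simply be called on a timer with a recent `since` value, which is where the minutes-to-hours latency comes from.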

Pipeline-based detection (the modern model)

Detect inside the pipeline as the data flows, then land the signals plus a sampled slice of the raw data. The advantage is real-time detection and a 10–50× reduction in storage cost, because you are no longer paying full-fidelity storage for every high-volume source. You keep the detections and a sample rather than the firehose. The latency is seconds, because detection happens in a streaming engine rather than a scheduled query. The honest trade-off is that the detection logic has to be defined upfront. You detect what you decided to look for at design time, and you lose retroactive flexibility on the sources you only sampled. If a technique is disclosed next year and the relevant raw data was sampled away, you cannot hunt backwards through data you did not keep. That is a real cost, and the engagement names it source by source rather than waving it off.
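The pipeline-based model can be sketched in a few lines, assuming a plain Python loop stands in for the streaming engine. The rule, the 5% sample rate, the event fields, and the hash-based sampler are illustrative assumptions, not the configuration of any particular engine.

```python
import zlib

SAMPLE_RATE = 0.05  # assumption: retain roughly 5% of raw events


def is_suspicious(event):
    """Hypothetical rule, fixed at design time: outbound traffic to port 4444."""
    return event["dst_port"] == 4444


def keep_in_sample(event, rate=SAMPLE_RATE):
    """Hash the event id so the keep/drop decision is deterministic per event."""
    bucket = zlib.crc32(event["id"].encode("utf-8")) % 10_000
    return bucket < rate * 10_000


def process(stream):
    """Evaluate each event once as it flows; return (signals, retained raw sample)."""
    signals, sample = [], []
    for event in stream:
        if is_suspicious(event):
            signals.append({"rule": "port-4444", "event_id": event["id"]})
        if keep_in_sample(event):
            sample.append(event)
    return signals, sample


# Synthetic stream: every 1000th event matches the rule.
events = [
    {"id": f"evt-{i}", "dst_port": 4444 if i % 1000 == 0 else 443}
    for i in range(10_000)
]
signals, sample = process(events)
print(len(signals), "signals;", len(sample), "of", len(events), "raw events retained")
```

Every event is evaluated, but only the signals and the sampled slice are kept. An event dropped by `keep_in_sample` is gone, so a rule added later can never see it, which is the retroactive-flexibility cost described above.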

This is the natural pairing point with detection-as-code: pipeline-based detection only stays maintainable when the detection content is portable, version-controlled, and testable rather than hand-built inside one engine. That argument lives in /thesis/detectflow, and it is the logical follow-on once the detection model is settled.

Hybrid is usually the answer

High-value sources stored in full. High-volume sources sampled.

In practice the answer is rarely all-query or all-pipeline. The hybrid pattern stores the high-value sources in full (the ones where retroactive reach matters and where the volume is tolerable) and samples the high-volume sources through pipeline-based detection, keeping signals plus a representative slice rather than the whole firehose. The decision is made source by source, not platform-wide, which is why the engagement starts by classifying the SOC profile rather than recommending a model in the abstract. The profile determines which way each source leans.

Investigation-heavy SOC. Leans query-based. The work is reconstructing what happened across long timelines, so retroactive reach across full-fidelity history is worth the storage cost.
Regulatory-retention environment. Leans query-based. The retention requirement already forces you to keep the data; sampling it away would violate the obligation, so the storage is non-negotiable and you may as well make it queryable.
Detection-heavy SOC. Leans pipeline-based. The work is catching things fast and at volume, where seconds-latency streaming detection is the point and the upfront-logic trade-off is acceptable.
Cost-constrained program. Leans pipeline-based. The 10–50× storage reduction on high-volume sources is the deciding factor, and the engagement makes the retroactive-flexibility loss explicit so it becomes a chosen trade-off rather than a surprise.

Most real environments have all four characteristics in different sources, which is why the hybrid split is the common recommendation: regulatory sources stored in full, the noisy high-volume telemetry detected in the pipeline and sampled, and the investigation-critical sources kept queryable regardless of volume. The streaming and lake layers this split runs on are the MOAR components described in /thesis/moar. I won't reproduce that here; the design engagement places your sources into it.
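The source-by-source decision can be sketched as a small rule function. The `Source` fields, the example sources, and the default for low-volume sources are my illustrative assumptions; the real classification weighs more than three booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    regulatory_retention: bool    # a retention obligation applies
    investigation_critical: bool  # retroactive reach matters for this source
    high_volume: bool             # noisy telemetry that dominates storage


def storage_model(src):
    """Return the hybrid-split treatment for one source."""
    # Regulatory and investigation-critical sources stay queryable regardless of volume.
    if src.regulatory_retention or src.investigation_critical:
        return "store-in-full"
    # Remaining high-volume telemetry is detected in the pipeline and sampled.
    if src.high_volume:
        return "pipeline-detect-and-sample"
    # Assumption: low-volume sources are cheap enough to keep in full.
    return "store-in-full"


sources = [
    Source("audit-log", regulatory_retention=True, investigation_critical=False, high_volume=False),
    Source("edr-process-events", regulatory_retention=False, investigation_critical=True, high_volume=True),
    Source("netflow", regulatory_retention=False, investigation_critical=False, high_volume=True),
]
plan = {s.name: storage_model(s) for s in sources}
print(plan)
```

The ordering of the checks encodes the priority stated above: an obligation or an investigative need overrides the volume argument, never the other way round.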

The latency reality as of 2026

The detection tier no longer gets to take five minutes.

The reference architecture was written in October 2025 on the assumption that a 5–15 minute scheduled-query cadence was acceptable for many detection use cases. By April 2026 that assumption no longer holds for the detection workload. Three observations forced the recalibration: CrowdStrike's reported 27-second fastest-recorded adversary breakout time, Mandiant's negative mean-time-to-exploit observation, and the Anthropic Claude Mythos Preview disclosure. Taken together, they indicate that the window between initial access and lateral movement can now be as short as tens of seconds, not minutes. A detection that fires fifteen minutes after the event fires after the adversary has already moved.
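The arithmetic behind that last sentence, as a rough sketch. It assumes the worst case for a scheduled query is an event that lands just after a run and waits one full interval, and it uses the 27-second figure cited above as the budget. The function names are illustrative.

```python
BREAKOUT_BUDGET_S = 27  # fastest-recorded breakout time cited in the text


def worst_case_delay_s(cadence_s, query_runtime_s=0.0):
    """An event landing just after a run waits a full interval, plus the run itself."""
    return cadence_s + query_runtime_s


def fits_budget(cadence_s, budget_s=BREAKOUT_BUDGET_S):
    """True if even the worst-case detection delay stays inside the budget."""
    return worst_case_delay_s(cadence_s) <= budget_s


for label, cadence_s in [("5-minute schedule", 300), ("15-minute schedule", 900), ("1-second stream", 1)]:
    print(label, worst_case_delay_s(cadence_s), fits_budget(cadence_s))
```

Both ends of the 5–15 minute cadence miss the budget by an order of magnitude, so tuning the schedule does not close the gap for the detection workload.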

That does not mean every query has to be fast. It means the latency requirement is per-tier, and the engagement assigns each workload to the tier it actually needs:

Detection tier, seconds to sub-second. This is the workload the recalibration touches. It runs on streaming: Kafka for transport, with RisingWave, Flink, or Tenzir doing the in-pipeline detection. The 5–15 minute scheduled-query cadence is no longer defensible here.
Hunting tier, batch. Retroactive threat hunting across history. Runs on the lake: Iceberg as the table format, queried through ClickHouse for interactive speed and DuckDB for local and ad-hoc analysis. Minutes-to-hours latency is fine, because the work is exploratory rather than time-critical.
Analysis tier, batch. Reporting, compliance, and longer-horizon analytics. Same Iceberg-plus-ClickHouse-and-DuckDB foundation. The old scheduled cadence remains entirely appropriate here; nothing about the recalibration changes this tier.

The architectural consequence is concrete. Deciding the detection model is also deciding which sources have to traverse the streaming path. Any source carrying a detection that must fire within the adversary's breakout window has to flow through Kafka and a streaming detection engine; it cannot sit behind a scheduled query. Sources whose only consumers are hunting and analysis can land directly in Iceberg and be queried in batch. The design engagement produces that source-to-tier mapping explicitly, so the latency requirement is an engineered property of the architecture rather than an assumption that fails the first time it is tested under an incident.
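A minimal sketch of that source-to-tier mapping, assuming each source is described only by the tiers that consume it. The `SourcePlan` type, the path labels, and the example sources are hypothetical; the rule itself is the one stated above.

```python
from dataclasses import dataclass

VALID_TIERS = frozenset({"detection", "hunting", "analysis"})


@dataclass(frozen=True)
class SourcePlan:
    name: str
    tiers: frozenset  # which tiers consume this source


def ingest_path(plan):
    """Sources feeding the detection tier must traverse the streaming path."""
    unknown = plan.tiers - VALID_TIERS
    if unknown:
        raise ValueError(f"unknown tiers for {plan.name}: {sorted(unknown)}")
    # Any detection consumer forces streaming; hunting and analysis alone can land in batch.
    return "streaming" if "detection" in plan.tiers else "batch"


plans = [
    SourcePlan("identity-auth", frozenset({"detection", "hunting"})),
    SourcePlan("dns-archive", frozenset({"hunting"})),
    SourcePlan("compliance-export", frozenset({"analysis"})),
]
mapping = {p.name: ingest_path(p) for p in plans}
print(mapping)
```

Writing the mapping down as data means a source that gains a detection later shows up as a path change in review, rather than as a latency surprise during an incident.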

The latency claims, the streaming-versus-batch split, and the cost figures behind the 10–50× reduction are the same numbers I publish and defend in the /lab benchmark. The engagement applies them to your sources; the lab is where the underlying measurements are open to scrutiny.
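For intuition only, here is how a retained fraction translates into a reduction factor. This is a back-of-the-envelope sketch, not the cost model behind the /lab figures: it assumes storage cost scales linearly with retained volume, and the parameter names are mine.

```python
def reduction_factor(sample_rate, signal_fraction=0.0):
    """Storage reduction versus keeping everything, under a linear-cost assumption.

    sample_rate: fraction of raw events retained by sampling.
    signal_fraction: extra volume added by stored signals, as a fraction of raw volume.
    """
    retained = sample_rate + signal_fraction
    if not 0 < retained <= 1:
        raise ValueError("retained fraction must be in (0, 1]")
    return 1 / retained


for rate in (0.10, 0.05, 0.02):
    print(f"retain {rate:.0%} of raw -> about {reduction_factor(rate):.0f}x less storage")
```

Under that assumption the 10–50× range corresponds to retaining roughly 10% down to 2% of the raw volume on the sampled sources. Sources stored in full contribute no reduction at all, which is why the figure applies per source rather than to the whole platform.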

A detection model chosen against your SOC, not inherited from your SIEM.

The intro call classifies the SOC profile and sizes the design track, which then settles two things: query-based, pipeline-based, or hybrid, and which sources have to run at seconds latency.
