Architecture foundations

Three latency tiers: detection, hunting, analysis.

Most security architecture mistakes I've watched happen (including some of my own) sit on the boundary between two workload classes that look similar from a distance and behave nothing alike up close. The cleanest way I've found to talk about that boundary is a three-tier latency model: detection, hunting, analysis. It's the frame underneath the DetectFlow thesis, and it became sharper, not looser, after Anthropic's April 2026 Claude Mythos Preview disclosure.

Reading time: about 15 minutes. Evidence tier: A for the three benchmark datapoints (CrowdStrike 2026 Global Threat Report, Mandiant M-Trends 2026, Anthropic Claude Mythos Preview with the AISLE academic rebuttal). The three-tier taxonomy itself is Tier C, a framing I find useful, not a published standard. Practitioner implications of Mythos are still developing; I hedge accordingly.

The short version

Detection latency stopped being a tunable variable.

Through most of 2024 and 2025, I treated detection latency as one of several variables in the cost-benefit tradeoff between a vendor SIEM and a security lakehouse. Five-to-fifteen-minute scheduled-query detection was slower than ideal, but defensible. Most enterprises were running something in that range already, and the rest of the lakehouse story (cost, schema portability, analyst-engine flexibility) was strong enough to carry the architecture.

That framing was defensible in late 2025. It may no longer be defensible now. Three benchmark datapoints landed between October 2025 and April 2026, and together they describe a threat model where five-to-fifteen-minute detection has been overtaken by the speed of the attack:

CrowdStrike's 2026 Global Threat Report reported a fastest recorded adversary breakout time of 27 seconds (the interval from initial compromise to lateral movement, the fastest observed across their 2025 intrusions). A detection system that fires in five to fifteen minutes overshoots that window by a factor of roughly 11 to 33.
Mandiant M-Trends 2026 documented a negative mean time-to-exploit: across the year's observed exploitations, mass exploitation now begins, on average, seven days before the patch is publicly available. The defender's time budget stopped being "patch within X days" and became "detect and contain within seconds to minutes, because the patch isn't going to save you."
Anthropic's Claude Mythos Preview (disclosed April 2026) described an unreleased frontier model that autonomously discovered thousands of zero-day vulnerabilities across major operating systems and browsers, including a working exploit chain for FreeBSD NFS remote code execution. It also reported a 72.4% full-code-execution rate on Firefox's SpiderMonkey engine (181 of 250 trials), though against bugs Mozilla had already patched in Firefox 148; the headline collapses to roughly 4.4% once the two dominant patched bugs are removed. The follow-up AISLE academic rebuttal showed that eight small open-weight models all detected Mythos's flagship FreeBSD exploit in zero-shot API calls, and that a model with 5.1 billion active parameters (the GPT-OSS-120b mixture-of-experts) recovered the core chain of the OpenBSD bug.
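The gap in the first datapoint is easy to make concrete. A minimal sketch, using only figures quoted above; the function name is illustrative and nothing here is new data:

```python
# Figures quoted in the text above; nothing here is new data.
BREAKOUT_SECONDS = 27                 # fastest breakout time cited from the CrowdStrike report
SCHEDULED_CADENCE_MINUTES = (5, 15)   # the scheduled-query detection range under discussion


def overshoot_factor(cadence_minutes: float, window_seconds: float = BREAKOUT_SECONDS) -> float:
    """How many breakout windows fit inside one scheduled-query interval."""
    return cadence_minutes * 60 / window_seconds


for minutes in SCHEDULED_CADENCE_MINUTES:
    print(f"{minutes}-minute cadence: {overshoot_factor(minutes):.1f}x the breakout window")

# The SpiderMonkey headline rate is the same kind of simple ratio.
spidermonkey_rate = 181 / 250
print(f"181 of 250 trials = {spidermonkey_rate:.1%}")
```

This prints 11.1x for a five-minute cadence, 33.3x for a fifteen-minute cadence, and 72.4% for the trial ratio.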

You can argue with any one of these datapoints in isolation. I think arguing with all three at once is harder. Together they describe a threat model where offensive operations run at machine speed, where exploitation precedes patch availability, and where the capability gap between nation-state actors and skilled hobbyists has compressed. The only defensive response that maps onto that threat model is detection that runs on the same timescale as the attack: seconds, not minutes.

What I want to be careful about: Mythos was disclosed in April 2026. The practitioner implications are still being worked out across the industry, and the people writing post-mortems on real Mythos-class incidents haven't published them yet. The breakout-time and time-to-exploit numbers are empirical and stand on their own; the Mythos-class capability data is real but its operational consequences are still developing. I treat the recalibration as a structural argument rather than a confident prediction.

The model

Three tiers, distinguished by who is waiting on the result.

The three tiers are detection, hunting, and analysis, and what separates them isn't query complexity or data volume or which engine runs them, but rather what's waiting on the result and what happens if the result comes back late.

Detection serves alerts that gate response and correlations that determine whether automated containment fires. Latency budget: seconds to sub-second. The thing waiting on the answer is an attacker, and the cost of being late is measured in hours-to-days of incident cleanup.
Hunting serves analyst-initiated retrospective investigation. The hunter has formed a hypothesis ("did this advanced persistent threat technique show up in our environment in the last 90 days?") and is interactively running queries to confirm or reject it. Latency budget: minutes to low tens of minutes. The thing waiting on the answer is an analyst's reading time between queries.
Analysis serves business intelligence dashboards, trend reporting, post-incident forensics, and compliance queries. Latency budget: minutes to hours to daily. The thing waiting on the answer is a Monday-morning executive readout, a quarterly compliance audit, or a forensic case file that has to be airtight before review.
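One way to keep the three definitions honest is to write them down as data. The sketch below is illustrative only: the class and field names are hypothetical, and the numeric budgets are assumptions chosen to sit inside the ranges described above, not values the model prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Tier(Enum):
    DETECTION = 1
    HUNTING = 2
    ANALYSIS = 3


@dataclass(frozen=True)
class TierProfile:
    waiting_party: str       # who or what is blocked until the result arrives
    budget_seconds: float    # assumed upper bound for this sketch, not a standard
    cost_of_lateness: str


PROFILES: Dict[Tier, TierProfile] = {
    Tier.DETECTION: TierProfile("an attacker mid-intrusion", 5, "hours to days of incident cleanup"),
    Tier.HUNTING: TierProfile("an analyst between queries", 30 * 60, "a slower investigation"),
    Tier.ANALYSIS: TierProfile("a scheduled readout or audit", 24 * 3600, "a late report"),
}


def tiers_that_tolerate(latency_seconds: float) -> List[Tier]:
    """Tiers whose assumed budget can absorb the given end-to-end latency."""
    return [tier for tier, profile in PROFILES.items() if latency_seconds <= profile.budget_seconds]


# A five-minute scheduled query fits hunting and analysis, but not detection.
print(tiers_that_tolerate(300))
```

The useful part is the `waiting_party` field: the tier follows from who is waiting, and the latency budget follows from the tier.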

Most of the architecture mistakes I've made and watched happen sit on the boundary between detection and hunting. Running a detection-tier workload on hunting-tier infrastructure produces missed alerts, while running a hunting-tier workload on detection-tier infrastructure produces wasted spend. The boundary between hunting and analysis is more forgiving, since both tolerate minute-to-hour latency, and the cost of confusing them tends to be over-provisioning rather than missed signal.

Tier 1

Detection — seconds to sub-second.

The detection tier serves any workload where the answer to "did we miss this?" carries a multi-hour-to-multi-day cleanup cost. That includes alerts firing on live events, correlation rules that gate automated containment, and any logic where a missed firing means the attacker gets to do something the defender otherwise could have prevented.

The post-Mythos benchmark I use for this tier is "fast enough to fire before the 27-second breakout window closes." In practice that means detection logic that runs on events as they ingest, not on a scheduled query that wakes up every five minutes to scan a table. Sub-second is the goal. A few seconds is acceptable for correlations that need a small lookback window. Anything past roughly 30 seconds is structurally too slow for fast-moving intrusions, even before you add alert-routing and human-response time on top of detection latency.

Architectural patterns that fit this tier

Pipeline-based detection. Tools like Tenzir, Cribl Stream, and Vector that embed detection rules in the ingestion pipeline itself, firing as events flow through, before anything lands in a table.
Streaming SQL with continuous queries. RisingWave, Apache Flink SQL, and ksqlDB running long-lived queries against event streams. The query is registered once and fires on every matching event indefinitely.
Column stores with sub-second query latency on hot data. ClickHouse on a hot table tier, Apache Druid for time-series detection on rolling windows. Both can serve interactive queries on the most recent few hours of data.
Spark 4.1 Real-Time Mode for organizations already running Spark infrastructure. Production references are still emerging, so I'd treat this as a defensible choice rather than a proven one as of early 2026.
Pre-computed feature stores that streaming detection logic queries with sub-millisecond latency. Common in machine-learning detection pipelines where the model needs historical context but can't afford a roundtrip to a warehouse.
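What these patterns share is that the rule runs per event, with only a small lookback held in memory. A minimal sketch in plain Python rather than any of the tools named above; the failed-login rule, the threshold, and the ten-second window are hypothetical, and the code assumes events arrive in timestamp order:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str, str]  # (timestamp_seconds, host, event_type)


def failed_login_burst(
    events: Iterable[Event], threshold: int = 3, lookback_seconds: float = 10.0
) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, host) the moment a host reaches `threshold` failed logins in the window."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for ts, host, event_type in events:
        if event_type != "login_failed":
            continue
        window = recent[host]
        window.append(ts)
        # Drop timestamps that have aged out of the lookback window.
        while window and ts - window[0] > lookback_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield ts, host
            window.clear()  # reset so one burst produces one alert


stream = [
    (0.0, "web-1", "login_failed"),
    (1.0, "web-2", "login_failed"),
    (2.0, "web-1", "login_failed"),
    (2.5, "web-1", "login_ok"),
    (3.0, "web-1", "login_failed"),
    (30.0, "web-2", "login_failed"),
]
alerts = list(failed_login_burst(stream))
print(alerts)  # [(3.0, 'web-1')]
```

The alert is produced while the third matching event is being processed, so detection latency is bounded by processing time rather than by a schedule.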

Patterns that do not belong in this tier

Scheduled queries against an Iceberg or Delta Lake table. Once you account for commit cadence, file compaction, and metadata refresh, new data in a typical deployment isn't visible for five minutes or more, and the query schedule adds its own delay on top.
dbt incremental models with hourly cron triggers. These are excellent for the analysis tier and entirely wrong for detection.
Materialized views with five-to-fifteen-minute refresh cadence. The materialized view is fine. The cadence puts it in the hunting or analysis tier.
Anything whose architecture diagram includes the phrase "near real-time" without naming a number. "Near real-time" almost always means "five minutes," which means hunting tier at best.

What these patterns have in common is that they batch. Batching is the right move for cost-efficient analytical work and a structural mismatch for detection. The post-Mythos floor only forbids batching in the path that gates response; everything else in the stack can still batch happily.
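The cost of a schedule can be put in numbers. The sketch below models only the schedule itself, under stated assumptions: runs start at fixed intervals from t = 0, an event is visible to the first run that starts at or after it lands, and table-visibility lag is ignored (which flatters the scheduled query). The function names are illustrative.

```python
import math


def scheduled_detection_delay(event_time_s: float, interval_s: float, query_runtime_s: float = 0.0) -> float:
    """Seconds between an event landing and a scheduled query reporting it."""
    next_run = math.ceil(event_time_s / interval_s) * interval_s
    return next_run - event_time_s + query_runtime_s


def fraction_caught_within(window_s: float, interval_s: float) -> float:
    """Share of uniformly timed events that a zero-runtime schedule reports inside the window."""
    return min(1.0, window_s / interval_s)


worst_case = scheduled_detection_delay(event_time_s=1, interval_s=300)    # lands just after a run
best_case = scheduled_detection_delay(event_time_s=299, interval_s=300)   # lands just before a run
caught = fraction_caught_within(window_s=27, interval_s=300)
print(worst_case, best_case, f"{caught:.0%}")  # 299.0 1.0 9%
```

Under these assumptions a five-minute schedule reports only about 9% of events inside a 27-second window, and those only by luck of timing.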

Tier 2

Hunting — minutes to tens of minutes.

The hunting tier is the one most existing security-data-commons writing implicitly covers. When I argue for Iceberg over a proprietary format, or compare ClickHouse versus StarRocks versus DuckDB for ad-hoc work, or describe the Iceberg-versus-Delta decision, the implicit workload class is hunting. That's defensible because the hunting tier is where most of the day-to-day analyst work happens, and where the architectural decisions matter most for cost and analyst productivity.

The latency budget for hunting is dominated by the analyst's reading time between queries, not by attacker urgency. A 5-to-30-second query latency feels interactive. A 1-to-5-minute latency on a complex query against 90 days of data is acceptable, because the hunter formulates the next question while the previous one runs. The threat being investigated is already in the past; the architecture question is "how do we let the analyst iterate quickly," not "how do we beat the breakout window."

Architectural patterns that fit this tier

Iceberg tables on object storage queried by Trino, StarRocks, or DuckDB. This is the workhorse pattern, and it's the one most production lakehouse-for-security architectures land on for analyst-facing queries.
Materialized views and pre-computed aggregates for the top 20 hunting patterns. These save repeated computation for queries the team runs constantly and amortize the work across all analysts.
Z-ordered file layouts on common filter columns (source IP, destination IP, user ID). Z-ordering co-locates rows with similar values in the same files, so a query that filters on one of those columns scans far fewer files.
Hidden partitioning so analysts don't need to know the partition column. The analyst writes WHERE event_time > now() - interval '24 hours'; the query engine figures out which partition files to scan. Iceberg supports this; most other table formats require the analyst to know the partitioning scheme.
Notebook-based hunting (Jupyter, Marimo, or similar) querying the same lakehouse tables that the detection tier populates. The hunter gets full reproducibility and the same data the alert fired on, without a separate copy.
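The hidden-partitioning item is the easiest of these to illustrate. The sketch below is a toy model of the idea, not Iceberg's implementation: the table derives a day partition from event_time, and the planner maps the analyst's time filter onto partitions without the analyst naming a partition column. The function names and file names are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Dict, List


def day_partition(event_time: datetime) -> date:
    """Stand-in for a day-granularity partition transform derived from event_time."""
    return event_time.astimezone(timezone.utc).date()


def files_to_scan(files_by_day: Dict[date, List[str]], start: datetime, end: datetime) -> List[str]:
    """Return only the files whose day partition can contain rows in [start, end]."""
    first, last = day_partition(start), day_partition(end)
    return [f for day, files in sorted(files_by_day.items()) if first <= day <= last for f in files]


files_by_day = {
    date(2026, 4, 7): ["a.parquet"],
    date(2026, 4, 8): ["b.parquet"],
    date(2026, 4, 9): ["c.parquet", "d.parquet"],
    date(2026, 4, 10): ["e.parquet"],
}
now = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)

# The analyst's filter mentions only event_time, never the partition.
selected = files_to_scan(files_by_day, start=now - timedelta(hours=24), end=now)
print(selected)  # ['c.parquet', 'd.parquet', 'e.parquet']
```

Two of the five files are skipped without the analyst knowing the table is partitioned by day.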

The mistake I want to flag specifically: the same Iceberg-table-plus-query-engine architecture that is excellent for hunting is not adequate for detection. Both tiers can read the same underlying data (that's the point of having a unified lakehouse), but the path that serves detection has to be the streaming layer, not the scheduled-query layer. Conflating the two is the most common architecture mistake I've watched lakehouse-first security teams make.

Tier 3

Analysis — minutes to hours to daily.

The analysis tier serves dashboards, executive reporting, trend analysis, post-incident forensics that go through formal review, compliance queries that need to be auditable, and anything with a regulatory deadline measured in days or weeks. If the latency requirement is "by Monday morning," this is the tier.

Daily refresh is fine. Hourly is generous. The work is batchable and benefits from being batched. Compute can run off-peak when storage and CPU are cheaper, results can be cached aggressively, and the audit trail of "this report was generated at 02:00 UTC against the data snapshot from 23:59 UTC the prior day" is exactly the property compliance reviewers want.

Architectural patterns that fit this tier

Batch dbt models building wide aggregate tables for business intelligence. dbt (data build tool) handles the dependency graph and lets analysts write SQL transformations rather than orchestrating Spark jobs by hand.
Scheduled Spark jobs computing 30-day baselines and rolling-window statistics for behavioral analytics. These feed the hunting tier as much as they feed the analysis tier; the baseline lives in the analysis tier and the hunter queries against it.
Iceberg tables tiered to cold storage (S3 Glacier Deep Archive or equivalent) for compliance retention. The cost per gigabyte drops by an order of magnitude or more, and the occasional retrieval cost is acceptable for data queried a few times a year.
Business intelligence tools (Superset, Metabase, Looker) reading the wide aggregates. The dashboard query runs against pre-computed numbers, not raw events, so the response time stays interactive even at petabyte scale.
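The baseline item shows the split between batch computation and interactive lookup. A minimal sketch using only the standard library rather than Spark; the entity names and counts are made up for illustration, and a real job would read them from the lakehouse tables:

```python
from statistics import mean, pstdev
from typing import Dict, List


def build_baselines(daily_counts: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Batch step: one mean and standard deviation per entity over the trailing window."""
    return {
        entity: {"mean": mean(counts), "std": pstdev(counts)}
        for entity, counts in daily_counts.items()
    }


def deviation(baselines: Dict[str, Dict[str, float]], entity: str, todays_count: float) -> float:
    """Lookup step: how many standard deviations today's count sits from the baseline."""
    baseline = baselines[entity]
    if baseline["std"] == 0:
        return 0.0 if todays_count == baseline["mean"] else float("inf")
    return (todays_count - baseline["mean"]) / baseline["std"]


# Thirty days of hypothetical per-user login counts.
history = {"alice": [10] * 15 + [12] * 15}
baselines = build_baselines(history)   # runs overnight in the analysis tier
score = deviation(baselines, "alice", todays_count=15)   # cheap lookup for the hunter
print(score)
```

The expensive pass over 30 days of history happens once, off-peak; the hunter's query is a dictionary lookup and one division.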

This tier is the easiest to architect and the most stable. It's also where the biggest cost wins live, because storage is cheap and compute can be batched off-peak. If you're cost-constrained on a security data platform, the analysis tier is where I'd look first for savings before touching either of the latency-sensitive tiers.

Decision rule

Three questions, in order.

The decision about which tier a workload belongs in is almost always determined by who or what is waiting on the result, not by the technical complexity of the query. The framework I use, in order:

1. If this query result is delayed by five minutes, does an attacker get to do something they otherwise couldn't? If yes, it's Tier 1 (detection). The five-minute delay is the entire ballgame. Push the logic into the streaming layer.
2. If this query result is delayed by five minutes, does an analyst's investigation slow down but not break? If yes, it's Tier 2 (hunting). The analyst has reading time to spare; the query can run against an Iceberg table.
3. If this query result is delayed by five hours, does anything bad happen? If no, it's Tier 3 (analysis). Move it to the cheapest infrastructure that can run it overnight.
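Because the questions are asked in order, the rule reduces to a short function. A sketch with hypothetical parameter names; the one case the three questions leave open (no five-minute impact, but harm at five hours) is surfaced as an error rather than guessed at:

```python
from enum import Enum


class Tier(Enum):
    DETECTION = 1
    HUNTING = 2
    ANALYSIS = 3


def classify(attacker_gains_if_5_min_late: bool,
             analyst_slowed_if_5_min_late: bool,
             harm_if_5_hours_late: bool) -> Tier:
    """Apply the three questions in order; the first decisive answer wins."""
    if attacker_gains_if_5_min_late:
        return Tier.DETECTION
    if analyst_slowed_if_5_min_late:
        return Tier.HUNTING
    if not harm_if_5_hours_late:
        return Tier.ANALYSIS
    # The rule as written does not cover this combination.
    raise ValueError("unclassified workload: revisit the first two questions")


# A complex lateral-movement correlation that gates automated containment:
# it may look like hunting, but the first question decides.
print(classify(attacker_gains_if_5_min_late=True,
               analyst_slowed_if_5_min_late=True,
               harm_if_5_hours_late=True))  # Tier.DETECTION
```

Note that query complexity and data volume do not appear as inputs at all, which is the point of the rule.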

Most architecture mistakes I've made happen when I conflate questions one and two. A correlation rule that joins three log sources to detect lateral movement feels like a hunting-tier workload because the query is complex and the data volume is large. But if the answer gates an automated response, or if a missed alert means hours of cleanup, the correlation rule is a Tier 1 workload running in the wrong infrastructure. The right move is to push the correlation logic into the streaming layer (Flink, RisingWave, or a pipeline-based detection tool can express that logic) rather than waiting for the lakehouse to refresh.

The opposite mistake (running a hunting-tier workload on detection-tier infrastructure) is more expensive and less dangerous. You overpay for ClickHouse capacity to run queries that DuckDB-on-Iceberg could have answered in 30 seconds. Wasteful, but the alerts still fire. If I had to pick which mistake to make, I'd make this one, and I'd budget the overspend as the price of finding out which workloads belong in which tier.

What changes

The post-Mythos recalibration in concrete terms.

Pre-Mythos, the implicit story behind a lot of lakehouse-for-security writing (including some of my own) was "Iceberg plus ClickHouse handles everything; streaming is for the few cases where you really need it." That was a defensible story when the binding constraint on detection was analyst-response time, which sat in the minutes-to-tens-of-minutes range and matched the scheduled-query cadence well enough.

Post-Mythos, the story I'd tell instead is: "streaming handles detection; Iceberg plus ClickHouse or DuckDB handles hunting and analysis; the two paths exchange data through Kafka or an equivalent backbone." The components don't change, but the labels on the arrows do, because the streaming layer stops being an optional add-on for high-end use cases and becomes the primary path for the detection tier.

The DetectFlow architecture is the concrete expression of this: Kafka as the fan-out layer, with one path serving detection (streaming SQL or pipeline-based rules firing on the topic) and a parallel path serving hunting and analysis (the same topic written to Iceberg via Spark micro-batch or Flink, queried by DuckDB, ClickHouse, or Trino), so the same data feeds two consumption patterns across two latency tiers.
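The fan-out shape can be shown without any infrastructure. The sketch below is an in-process stand-in, not Kafka client code: `Topic`, `StreamingDetector`, and `MicroBatchWriter` are hypothetical names, and the credential_dump rule is made up for illustration.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class Topic:
    """In-process stand-in for a topic: every subscriber sees every event."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers:
            handler(event)


class StreamingDetector:
    """Detection path: evaluates each event as it arrives."""

    def __init__(self) -> None:
        self.alerts: List[Event] = []

    def __call__(self, event: Event) -> None:
        if event.get("action") == "credential_dump":  # hypothetical rule
            self.alerts.append(event)


class MicroBatchWriter:
    """Hunting and analysis path: buffers events and commits them in batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._buffer: List[Event] = []
        self.committed_batches: List[List[Event]] = []

    def __call__(self, event: Event) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.committed_batches.append(self._buffer)
            self._buffer = []


topic, detector, writer = Topic(), StreamingDetector(), MicroBatchWriter(batch_size=3)
topic.subscribe(detector)
topic.subscribe(writer)

topic.publish({"host": "ws-12", "action": "login"})
topic.publish({"host": "ws-12", "action": "credential_dump"})
# The alert exists before anything has been committed on the table path.
alert_fired_before_commit = len(detector.alerts) == 1 and not writer.committed_batches
topic.publish({"host": "ws-12", "action": "logout"})
print(alert_fired_before_commit, len(writer.committed_batches))  # True 1
```

Both consumers read the same events; only the table path waits for a batch to fill.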

There's a second axis worth naming, because it's easy to conflate with the latency tiers and it isn't the same thing. The latency tiers sort workloads by who is waiting. A separate axis sorts the table state by how realized the metadata is: virtual and ephemeral at one end, materialized files at the other. The two axes are independent in principle, but in practice they line up. Hot streaming detection pairs with virtual, ephemeral metadata (Streambased's Iceberg-over-Kafka, where a Kafka topic is read as an Iceberg table without a separate copy, or DuckLake inlining small commits directly into the catalog). Warm hunting and investigation pair with database-catalog metadata (DuckLake holding table state in a transactional database rather than a pile of manifest files). Cold forensic and audit work pairs with materialized files on object storage (classic Iceberg, with V4's metadata efficiency paying off most where data sits longest).

What holds the two realization ends together is that both sit behind one Iceberg read contract. The read contract is engine-agnostic: hand it a table identifier, get back a schema and a set of scannable bytes, and the engine doesn't need to know whether those bytes were materialized last week or synthesized from a Kafka offset a second ago. Where virtual and materialized actually diverge is the write contract, which is file-write versus SQL transaction versus never-write at all. That's the axis the streaming-ingest economics ride on. Sub-minute commits are exactly where per-commit file footprint turns into a cost floor, because every commit that writes a handful of tiny Parquet files and a fresh manifest pays a fixed metadata tax that a virtual or database-catalog write avoids. The read side can stay uniform while the write side carries the dollars-per-gigabyte difference.
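The uniform-read, divergent-write idea maps naturally onto an interface with two implementations. The sketch below is a conceptual model only: it is not the Iceberg API, and `TableReader`, `MaterializedReader`, and `VirtualReader` are hypothetical names.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Row = Dict[str, object]
Schema = Tuple[str, ...]


class TableReader(Protocol):
    """The read contract: a table identifier in, a schema and scannable rows out."""

    def load(self, table_id: str) -> Tuple[Schema, Iterable[Row]]: ...


class MaterializedReader:
    """Serves rows written out ahead of time (stand-in for files on object storage)."""

    def __init__(self, schema: Schema, tables: Dict[str, List[Row]]) -> None:
        self._schema = schema
        self._tables = tables

    def load(self, table_id: str) -> Tuple[Schema, Iterable[Row]]:
        return self._schema, list(self._tables[table_id])


class VirtualReader:
    """Synthesizes rows at read time from an append-only log (stand-in for a topic)."""

    def __init__(self, schema: Schema, logs: Dict[str, List[Tuple[int, Row]]]) -> None:
        self._schema = schema
        self._logs = logs

    def load(self, table_id: str) -> Tuple[Schema, Iterable[Row]]:
        # Nothing is written; rows are projected from (offset, record) pairs on demand.
        rows = ({col: record[col] for col in self._schema} for _offset, record in self._logs[table_id])
        return self._schema, rows


def count_rows(reader: TableReader, table_id: str) -> int:
    """Engine-side code: it sees only the contract, never the realization."""
    _schema, rows = reader.load(table_id)
    return sum(1 for _ in rows)


schema = ("ts", "host")
records = [{"ts": 1, "host": "a"}, {"ts": 2, "host": "b"}]
materialized = MaterializedReader(schema, {"events": records})
virtual = VirtualReader(schema, {"events": list(enumerate(records))})
print(count_rows(materialized, "events"), count_rows(virtual, "events"))  # 2 2
```

`count_rows` cannot tell the two readers apart, while their constructors (the write side) have nothing in common.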

I should state the null hypothesis plainly, because it's the version of this that would make the polyglot substrate unnecessary complexity. A single materialized Iceberg backend, made efficient enough by V4, might serve all three tiers on its own, and the whole virtual-to-materialized spread would collapse into premature optimization. I lean toward the tiering being real for security data (the commit-cadence economics at sub-minute latency look structural rather than incidental), but I hold that lean loosely, and I'd treat a V4-efficient single backend that serves detection latencies as the result that would falsify it.

I've now measured the write side of that contract, and the per-commit floor is real. On the MOAR reference stack against real MinIO object storage on a single host, I wrote the same 100,000 rows two ways: once as a single batch commit, then as a 100-commit stream standing in for sub-minute detection cadence. Plain Iceberg's metadata footprint went from 8.9 KB to about 4.6 MB across the stream, roughly 515 times larger, because each commit adds its own metadata.json plus a manifest and a manifest-list. Query planning slowed from 8.7 ms to 181 ms (about 21 times) as the planner walked the lengthening manifest list. Ingest stretched from 0.44 s to 16.3 s (about 37 times), and the table accumulated 100 data files where the batch wrote one. DuckLake ran the identical stream with planning flat at roughly 7 ms regardless of cadence, because its metadata lives in the catalog database, and with inlining on the small commits it wrote zero Parquet files. That's a single host on real object storage, so I'd read the shape rather than the absolute milliseconds: the cost that grows with commit cadence is the metadata proliferation, and a database-catalog or inlined write doesn't pay it.
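The ratios in that measurement can be recomputed from the quoted figures. The sketch below adds no data; it assumes decimal units (1 MB = 1000 KB) and uses the rounded "about 4.6 MB", so its metadata ratio comes out near 517 rather than the quoted 515, a gap that presumably reflects the unrounded measurement.

```python
# All inputs are the figures reported in the paragraph above.
BATCH = {"metadata_kb": 8.9, "planning_ms": 8.7, "ingest_s": 0.44, "data_files": 1}
STREAM = {"metadata_kb": 4600.0, "planning_ms": 181.0, "ingest_s": 16.3, "data_files": 100}
COMMITS = 100

# How much each cost grew when one batch commit became a 100-commit stream.
ratios = {name: STREAM[name] / BATCH[name] for name in BATCH}

# Average metadata added per commit across the stream.
metadata_per_commit_kb = STREAM["metadata_kb"] / COMMITS

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
print(f"metadata per commit: {metadata_per_commit_kb:.0f} KB")
```

The per-commit figure is the one to watch: at roughly 46 KB of metadata per commit on average, the overhead scales with commit count, not with row count.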

That's evidence against the null as I stated it. A plain-Iceberg backend pushed to the detection tier's sub-minute commits pays exactly the floor this section describes, so the null only survives if the commit contract itself changes, and both alternatives on the table do change it: DuckLake's catalog-inlined commits keep the metadata out of object storage entirely, and a V4-efficient single-file commit would avoid the manifest proliferation rather than accumulate it. So a single backend might still serve all three tiers, but not one running the classic file-per-commit write contract, and that classic version is the form of the null that would have made the tiering premature optimization.

One maturity caveat I want to be honest about. The promise of a single read contract spanning all three realization tiers is, today, copy-bridged rather than native. DuckLake reaches engines through DuckLake-native clients, not an Iceberg REST endpoint, so the "one read contract" is currently stitched across two metadata systems rather than served by one. And the virtual end of the spectrum (Streambased's Iceberg-over-Kafka) is seed-stage and vendor-published; the performance claims are the vendor's, not numbers I've benchmarked. I'm describing where the architecture appears to be heading, not a configuration I'd hand a team to run in production this quarter.

What this recalibration does not change: Iceberg is still the right choice for the hunting and analysis tiers. The case for vendor neutrality and multi-engine portability is independent of the detection-latency floor, so the lakehouse argument against schema-on-read SIEM pricing still holds. Streaming-based detection on a modern data stack is cheaper, more portable, and faster than scheduled-query detection on a vendor SIEM. And five-to-fifteen-minute latency is still fine for everything except detection: hunting, analysis, baseline computation, dashboards, and compliance all tolerate minute-to-hour latency without losing anything.

Common confusions

Edge cases I haven't fully resolved.

A few things that I haven't worked out cleanly and would welcome pushback on:

Where does the interactive analyst dashboard live?

A SOC analyst pulling up a dashboard during an active investigation has a latency budget that's neither "alert in five seconds" nor "report by Monday." It's something like "ten-to-thirty seconds feels responsive, one-to-two minutes feels slow but workable." Is that a fourth tier, or is it a sub-case of hunting? I currently treat it as hunting, but I'm aware that "interactive triage dashboard for an in-progress incident" has different latency dynamics than "ninety-day retroactive threat hunt," and the model might be cleaner with a fourth bucket.

Where exactly is the detection threshold?

I've been using "five-to-fifteen-minute scheduled query" as the canonical example of what's structurally too slow for detection, because it's concrete and it's where most lakehouse-first teams actually run. The truer threshold is probably closer to 30-to-60 seconds and the boundary is fuzzy. I don't have a clean definition that distinguishes "fast enough to detect" from "too slow to detect" without referring to the specific attack technique being detected.

What about purpose-built sensor hardware?

Some vendors (Vectra, Corelight, Cisco Stealthwatch among them) push detection logic onto network sensors that operate at line rate, before the data ever reaches a streaming layer. Is sensor-local detection a separate tier, or is it just the detection tier implemented in dedicated silicon rather than streaming software? I treat it as Tier 1, but the architectural picture is different enough that the distinction may be worth its own framing.

How do you communicate the tier boundary to leadership?

A board wants to hear "we have AI-powered detection that catches threats in real time," not "we have a three-tier latency taxonomy where the streaming substrate handles detection while the lakehouse handles hunting." The translation between the two is nontrivial, and I don't think I've written a version yet that does justice to both audiences in a single document. This piece is closer to the practitioner end of that spectrum; the board-facing version is shorter, less hedged, and necessarily loses some of the qualifiers that matter to architects.

What this means

The takeaway, with appropriate hedges.

The three-tier latency model is descriptive, not prescriptive. It's a way to label what a workload actually needs so that the architecture conversation can stay focused on the right tradeoff per tier. Detection wants a streaming layer; hunting wants a queryable table format with a flexible engine layer; analysis wants cheap batch compute and durable storage. These aren't one problem in three sizes, but three different problems that happen to share the same underlying event data.

What April 2026 changed for me specifically: detection latency stopped being a tunable variable in the lakehouse-versus-SIEM tradeoff and started being a hard architectural constraint. The five-to-fifteen-minute scheduled-query cadence that most lakehouse-first security teams inherited from SIEM thinking (including, in places, my own writing) is structurally too slow for detection workloads under the new threshold, so the streaming layer becomes essential rather than optional.

The hedge I want to keep explicit: Anthropic disclosed Mythos in April 2026. The capability is real and the AISLE replication establishes that it isn't gated to one frontier lab. The operational implications for defenders (what attackers actually do with this capability at scale, how incident response patterns shift, which detection signatures stop working) are still developing and may look different a year from now than they look in mid-2026. I'd treat the latency recalibration as the structurally right move given the capability data, and I'd revisit it as the operational data accumulates.

The architecture this points toward (Kafka as a fan-out layer, streaming detection on one path, Iceberg-plus-query-engine on a parallel path) is what DetectFlow describes. The three-tier latency model is the frame underneath. The model existed before Mythos; the urgency to draw the tier boundaries clearly may not have.