Technology deep-dive

The detection engineering maturity ladder.

Most active threat hunting programs I encounter operate at HMM2, where they apply procedures somebody else wrote, and the investment argument that gets them to HMM3, where analysts create novel techniques, is the argument I keep watching organizations lose. This essay is the version of that argument I wish I'd had in writing five years ago.

Reading time: about 20 minutes. Evidence tier: A for David Bianco's framework and SANS adoption, B for Expel production metrics and practitioner interviews, with the HMM-distribution figures and upskilling-timeline numbers flagged where they're survey-derived rather than peer-reviewed.

TL;DR

If you only read one thing

David Bianco's Hunting Maturity Model (HMM) is the framework I use to argue for analyst-trust as a force multiplier. HMM0 (initial) → HMM4 (leading).
Most programs sit at HMM2, with analysts applying procedures someone else wrote. The investment argument that gets them to HMM3 (analysts creating novel techniques) is the one I keep watching organizations lose.
HMM2 → HMM3 is the hardest step. It's a people + process change, not a tooling purchase. Programs that pitch HMM3 as "buy this product and you'll be there" reliably fail to graduate.
Why HMM3 matters beyond hunting: detection content authored at HMM3 survives platform migrations. Content authored at HMM2 (vendor-supplied rules with no theory of operation) doesn't. Every migration rewrites it from scratch.
What the investment actually looks like: protected upskilling time (roughly 20-30% of analyst capacity, or about one day a week), a peer-review process for detection rules, and a feedback loop from incidents back to content. Not a tool budget.

Where the ladder comes from

Bianco's Hunting Maturity Model, briefly.

David Bianco published "A Simple Hunting Maturity Model" on the Detect/Respond blog in October 2015. SANS adopted it, practitioners adopted it, and a decade later, when somebody says "we're at HMM2," most security engineers in the room understand roughly what that means. That is rare for a maturity model and worth respecting.

The model has five levels (HMM0 through HMM4), and it does something most vendor maturity models conspicuously do not: instead of grading you on what you bought, it grades you on three things. Those are the breadth of data your team collects, the sophistication of the analysis your analysts can perform unaided, and the degree to which successful investigations get codified into automation. Tools enable that progression, but they don't constitute it.

That framing is why HMM keeps showing up in my consulting conversations. The pillars I work from (source health, flow health, data quality, and analyst-trust as the force multiplier) map cleanly onto Bianco's model. The HMM ladder is, in operational terms, the analyst-trust pillar made measurable.

The five levels

HMM0 through HMM4, in operational terms.

HMM0 — initial

The team reacts to alerts from IDS, antivirus, and the SIEM's correlation rules. Data collection is whatever the alerting tools happen to capture. Hunting, in the active sense, doesn't occur. Bianco does not consider HMM0 organizations to be hunting at all; they're operating a reactive detection program. That's a defensible posture for some businesses, but worth naming so leadership doesn't silently assume the SOC is doing something it isn't.

HMM1 — minimal

The team is still primarily reactive, but it has begun routine data collection beyond the SIEM's retention window. Analysts run IOC lookups against threat-intel feeds and do historical lookback when a new indicator drops. Operationally this often looks like S3 or Azure Blob holding the previous twelve to twenty-four months of logs, with a thin query layer on top. There's no advanced analytics; the capability is "if a known-bad indicator shows up in feeds, we can check whether we've ever seen it."
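A minimal sketch of that capability, assuming the archived logs have already been parsed into dictionaries. The field names (`ts`, `src_ip`, `dest_domain`), the function name, and the sample records are hypothetical and not taken from any particular product:

```python
from typing import Iterable

def historical_lookback(records: Iterable[dict], indicators: set,
                        fields: tuple = ("src_ip", "dest_domain")) -> list:
    """Return every archived record where a watched field matches a known-bad indicator."""
    hits = []
    for record in records:
        # A record matches if any of the watched fields holds an indicator value.
        if any(record.get(field) in indicators for field in fields):
            hits.append(record)
    return hits

# Hypothetical archived events; in practice these would be read from object storage.
archive = [
    {"ts": "2024-03-01T08:15:00Z", "src_ip": "10.0.0.5", "dest_domain": "example.org"},
    {"ts": "2024-07-19T23:41:00Z", "src_ip": "10.0.0.9", "dest_domain": "bad.example.net"},
    {"ts": "2025-01-02T12:00:00Z", "src_ip": "203.0.113.7", "dest_domain": "example.com"},
]

# Hypothetical indicators from a newly published feed entry.
new_indicators = {"bad.example.net", "203.0.113.7"}
matches = historical_lookback(archive, new_indicators)
```

Nothing here is analytically interesting, which is the point: HMM1 is a retention and lookup capability, not an analysis capability.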

HMM2 — procedural

The team applies hunting procedures that other people wrote, whether that's MITRE ATT&CK technique-by-technique playbooks, community Sigma rules, or published least-frequency-analysis patterns ("stack-count parent process for cmd.exe, look at the long tail"). Data collection is broad (endpoints, network, cloud, SaaS audit logs), and the query layer (DuckDB, ClickHouse, the SIEM itself) can answer the questions the playbooks ask.

By the surveys I've seen and by the consensus in my own consulting interviews, this is where the majority of organizations with a named threat-hunting function actually sit. SANS surveys put it somewhere north of half, though I think the real figure may be higher once you exclude organizations that claim HMM3 capability based on a single analyst's side project. Either way, HMM2 is the modal state for hunting programs rather than a deficiency, which is worth saying plainly so a team sitting there doesn't read this essay as an indictment.
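The least-frequency-analysis pattern quoted above ("stack-count parent process for cmd.exe, look at the long tail") can be sketched in a few lines. This is an illustration under assumptions, not a production hunt: the event shape (`parent`, `process`) and the sample data are hypothetical, and a real hunt would run the equivalent aggregation in the query layer.

```python
from collections import Counter

def stack_count(events, child="cmd.exe"):
    """Count the parent processes that spawned `child`, rarest first."""
    counts = Counter(
        e["parent"] for e in events if e["process"].lower() == child
    )
    # Sort ascending by count so the long tail (rare parents) is reviewed first.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))

# Hypothetical process-creation events.
events = (
    [{"parent": "explorer.exe", "process": "cmd.exe"}] * 40
    + [{"parent": "services.exe", "process": "cmd.exe"}] * 12
    + [{"parent": "winword.exe", "process": "cmd.exe"}]
    + [{"parent": "explorer.exe", "process": "notepad.exe"}] * 5
)

tail = stack_count(events)
```

In this made-up data the first entry of `tail` is `winword.exe` with a single occurrence, which is the kind of rare parent the procedure tells the analyst to look at. Applying the procedure is HMM2; deciding which field to stack in the first place is where HMM3 begins.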

HMM3 — innovative

The team creates new hunting procedures rather than only applying existing ones. That requires statistics (outlier detection, correlation analysis), linked-data analysis (entity-relationship and graph queries), data visualization beyond the SIEM's defaults, and at least prototype-grade machine learning. The data layer typically grows to include notebook environments (Jupyter, sometimes Databricks), versioning of the table state that analytical queries run against (Nessie or similar), and experiment tracking (MLflow). Hypotheses get framed, tested against historical data, and either validated into a new procedure or discarded. Published outputs include techniques the broader community didn't have before.

HMM4 — leading

Validated HMM3 hunts get automated into production detections. The pipeline is real: feature engineering from the lakehouse, model training and deployment, precision and recall monitoring, drift detection, A/B testing of detection variants. Analyst attention shifts toward novel adversary behavior because the known patterns are handled. The realistic ceiling for this stage isn't 95 percent automation but something closer to 30 to 40 percent of investigations handled end-to-end, and I'll come back to why.

The hardest step

HMM2 to HMM3 is a skills problem, not a tools problem.

Every other transition in the model is, at least partially, an infrastructure problem you can budget for. HMM0 to HMM1 needs cheap retention beyond the SIEM and a way to query it. HMM1 to HMM2 needs broader log collection and training on community playbooks. HMM3 to HMM4 needs MLOps engineering, which is expensive but well-defined. None of those steps require analysts to acquire a fundamentally new way of thinking.

HMM2 to HMM3 does. An HMM2 analyst is trained to apply procedures, while an HMM3 analyst has to create them. That means statistics they probably didn't learn in their security-focused training, along with feature engineering, model validation, hypothesis generation, and experimental design. It also means the discipline to discard hypotheses that don't pan out, which is the hardest part, because security-analyst training rewards confident answers and HMM3 work rewards calibrated uncertainty.

David Bianco's standing recommendation on this point, given at multiple SANS Threat Hunting Summits, is some variation of "invest in your people." That phrasing makes the recommendation sound softer than it is. The translation that lands with finance is: HMM3 capability requires either hiring data scientists with security domain interest (scarce, expensive), upskilling existing analysts (slow, uncertain), or both. Practitioner interviews and SANS survey data point to roughly twelve to twenty-four months of dedicated time for an existing analyst to develop working HMM3 capability, assuming twenty to thirty percent of their week is protected for training and experimentation. In most of the estates I've worked, the protected figure is closer to zero, which is why those teams stay stuck at HMM2.

I would put a hedge on those timeline numbers. They come from SANS Threat Hunting Survey data and from practitioner interviews conducted at industry conferences. That is Tier B evidence, not a peer-reviewed longitudinal study. The shape of the claim (skills take longer than infrastructure) holds up. The specific "twelve to twenty-four months" range may compress or extend by a factor of two in either direction depending on the analyst's existing math background, the quality of internal mentorship, and how protected the training time actually is in practice. Treat it as a planning number, not a guarantee.

The anti-pattern

Skipping HMM3 by buying ML-enabled detection.

The most common failure mode I see is an HMM2 organization that wants HMM4 automation and tries to buy its way past HMM3. A vendor sells an "ML-driven detection platform" that ships with pre-trained models. Those models generate false positives at a rate that's unworkable for the SOC's analyst capacity, because they were trained on somebody else's environment rather than on this organization's baselines. The SOC has no internal capability to retune them, because that retuning is the HMM3 capability that didn't get developed. After three to six months the platform gets disabled or relegated to "informational only" status, and the team returns to HMM2 procedures.

This pattern repeats often enough that it deserves a name. I think of it as the "imported HMM4" problem. The infrastructure looks like HMM4 (there's ML in the detection pipeline, there's automation wrapped around it), but the organization's actual capability sits at HMM2, and the gap shows up as unmaintained models drifting toward irrelevance.

The correct sequencing is the boring one. Develop HMM3 capability internally: analysts who can frame and validate hypotheses, who can read a confusion matrix without flinching, who can decide whether a precision drop from 92 to 87 percent is acceptable for a given detection. Then, and only then, invest in the automation pipeline that turns validated hunts into production detections. The order matters because the validation skill is the thing that keeps HMM4 working over time, and buying that skill from a vendor gets you the artifact while leaving the underlying capability undeveloped.
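To make the 92-to-87 question concrete, here is a small worked calculation. It assumes only the standard definition of precision (true positives divided by all alerts fired); the function name is mine, and the two precision values are the ones from the paragraph above.

```python
def false_positives_per_true_positive(precision: float) -> float:
    """How many false alerts an analyst triages for each real one at a given precision."""
    return (1.0 - precision) / precision

before = false_positives_per_true_positive(0.92)   # about 0.087
after = false_positives_per_true_positive(0.87)    # about 0.149

# Relative growth in wasted triage effort per real detection.
relative_increase = after / before - 1.0            # about 0.72

# The same change expressed per 100 alerts fired.
fp_per_100_before = round((1 - 0.92) * 100)          # 8
fp_per_100_after = round((1 - 0.87) * 100)           # 13
```

A five-point precision drop sounds small, but it means about 72 percent more false positives for every true positive, or 13 wasted triages per 100 alerts instead of 8. Whether that is acceptable depends on the detection's volume and on what a miss costs, and that judgment is the HMM3 skill.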

Why the ceiling sits where it does

HMM4 automation tops out near 30 to 40 percent.

Expel's MDR practice publishes the most useful production reference point I'm aware of. Their "Ruxie" automation bot handles roughly 30 to 40 percent of investigations end-to-end, and Expel reports resolving critical incidents in about 17 minutes. Both Gartner and Forrester recognize Expel in their current MDR market coverage. The 30 to 40 percent figure sticks with people because it isn't a marketing claim; it's the honest description of what a well-run HMM4 program looks like after several years of investment.

The ceiling sits there, rather than at 80 or 95 percent, because HMM4 automation is built around human workflows: the human stays in the loop on every investigation that isn't routine, and the handoff overhead plus the never-zero population of novel cases caps the gains. The emerging agent-native architectures that some vendor and academic framings project toward the high 90s are still prototypes rather than production deployments, so 30 to 40 percent is the number to budget against for 2026, and a SOC that hits it has done difficult work. I lay out the full case for the ceiling, and why the agent-native numbers don't yet hold up, in the agentic-SOC reality piece.

The practical implication is that you shouldn't pitch HMM4 as "automate the SOC" but as "free analyst attention for the novel cases that matter." The first framing oversells and erodes credibility once the 30 to 40 percent number lands. The second matches what the production data supports and aligns with the business value, which is better hunting on the hard cases and faster response on the known ones.

What infrastructure supports each level

The data stack enables maturity. It doesn't deliver it.

I want to be careful here because the temptation, when writing an essay like this, is to pivot into a tooling sales pitch. The infrastructure section exists because detection engineers need to argue for budget, and the budget argument has to name specific capabilities. But the relationship between infrastructure and maturity runs one direction only. Better infrastructure enables higher maturity if the analyst skills exist; it cannot substitute for those skills.

HMM2 minimum

Broad data collection across endpoints, network, cloud, and SaaS. One to two years of retention on a lakehouse table format (Apache Iceberg is what I default to, for reasons I've written about elsewhere). A query engine that doesn't force the analyst into the SIEM's dialect; DuckDB and ClickHouse both work, depending on workload shape. Threat intelligence integration. The ability to apply community detection rules (Sigma) and MITRE ATT&CK-derived hunting procedures without quarterly license negotiations to add a new log source.

HMM3 additions

Notebook environments wired to the lakehouse. Jupyter against DuckDB is the cheapest starting point I know of, and it scales further than most teams expect. Versioning of the table state that queries run against, so experiments are reproducible (Nessie if you've already adopted Iceberg). Experiment tracking (MLflow) for the ML prototypes that HMM3 work generates. And, above all, time, because the dollar cost of the platform is comparatively small while the protected-analyst-time cost is the line item that gets cut first even though it matters most.

HMM4 additions

A real MLOps pipeline. Model deployment infrastructure (Kubernetes plus a serving layer, or a managed equivalent). Continuous monitoring of precision, recall, and detection drift. Feedback loops where analyst dispositions update training data. A/B testing infrastructure for detection variants. The engineering investment is substantial (typically two to three dedicated MLOps engineers, plus the infrastructure they run on), and that's before you count the data-engineering work to keep the feature pipelines stable. Eighteen to twenty-four months from "we have HMM3 capability" to "we have a production HMM4 pipeline" is the timeline I see most often, and it's still aggressive.
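The monitoring and feedback-loop pieces are easier to reason about with a toy version in hand. The sketch below tracks rolling precision from analyst dispositions and flags a detection once precision falls more than a tolerance below its validated baseline. The class name, the window size, and the thresholds are all hypothetical choices for illustration; a production pipeline would also track recall and input drift, which this does not attempt.

```python
from collections import deque
from typing import Optional

class PrecisionMonitor:
    """Rolling precision for one detection, fed by analyst dispositions."""

    def __init__(self, baseline: float, tolerance: float, window: int):
        self.baseline = baseline      # precision measured when the hunt was validated
        self.tolerance = tolerance    # how far below baseline is still acceptable
        self.dispositions = deque(maxlen=window)  # most recent true/false-positive calls

    def record(self, is_true_positive: bool) -> None:
        self.dispositions.append(is_true_positive)

    def precision(self) -> Optional[float]:
        if not self.dispositions:
            return None
        return sum(self.dispositions) / len(self.dispositions)

    def drifted(self) -> bool:
        # Only judge once the window is full, to avoid alarms on tiny samples.
        if len(self.dispositions) < self.dispositions.maxlen:
            return False
        return self.precision() < self.baseline - self.tolerance

monitor = PrecisionMonitor(baseline=0.92, tolerance=0.05, window=20)

# Hypothetical first batch: 18 true positives out of 20 alerts (precision 0.90).
for outcome in [True] * 18 + [False] * 2:
    monitor.record(outcome)
first_check = monitor.drifted()

# Hypothetical second batch: 16 true positives out of 20 alerts (precision 0.80).
for outcome in [True] * 16 + [False] * 4:
    monitor.record(outcome)
second_check = monitor.drifted()
```

With these made-up numbers the first check passes and the second one flags drift. The code is the easy part. Someone still has to choose the baseline, the tolerance, and the response when the flag goes up, and those choices need HMM3 judgment.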

A worked example

What HMM2 to HMM3 actually looks like in motion.

The abstract version of the HMM2-to-HMM3 transition is easy to nod along to and hard to plan against. A more concrete version helps. Take a SOC running broad EDR, network, and cloud audit-log collection into an Iceberg lakehouse, with DuckDB available for analyst queries and ClickHouse fronting the production dashboards. The team applies MITRE ATT&CK procedures and community Sigma rules competently. They are unambiguously HMM2.

An HMM3 transition for that team starts with a hypothesis the team generates themselves. Something like: "When an attacker compromises a service account, the credential's usage pattern shifts in measurable ways within the first twenty-four hours: different source IP distribution, different time-of-day distribution, different downstream API call mix." That hypothesis is grounded in security intuition but not derived from a published playbook. The team has to validate it against their own data, and that validation is the HMM3 skill being exercised.

Validation looks like this. Pull six months of service-account authentication events from the lakehouse into a notebook. Engineer features for each account-day, such as source-IP entropy, the Wasserstein distance between that day's hour-of-day distribution and the previous week's, and the Jaccard similarity between that day's API call mix and the same baseline. Label the small set of known-compromised account-days from past incident reports. Build a simple model (logistic regression is a reasonable starting point; the team should resist the urge to reach for gradient boosting until they understand why the simple model fails). Compute precision and recall against held-out data. Decide, with calibrated reasoning, whether the model is worth promoting to a hunt procedure other analysts can run.
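The three features named above are small enough to write by hand, which is a useful exercise before reaching for a library. This is a sketch under assumptions: the function names and sample values are hypothetical, the Wasserstein calculation treats hour-of-day as a straight line from 0 to 23 (so it ignores the wrap-around at midnight), and a notebook would more likely use SciPy or the query engine for the same calculations.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def hour_histogram(hours):
    """Normalized 24-bin histogram of event hours (0-23)."""
    if not hours:
        raise ValueError("need at least one event")
    counts = Counter(hours)
    return [counts.get(h, 0) / len(hours) for h in range(24)]

def wasserstein_1d(p, q):
    """Wasserstein-1 distance between two histograms over the same unit-width bins.

    Computed as the sum of absolute differences between the two running CDFs.
    Simplification: hours are treated as linear, not circular.
    """
    distance, cdf_gap = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        cdf_gap += p_i - q_i
        distance += abs(cdf_gap)
    return distance

def jaccard(a, b):
    """Jaccard similarity of two collections, compared as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical account-day versus the account's previous-week baseline.
today_ips = ["198.51.100.4", "198.51.100.9", "203.0.113.20", "203.0.113.77"]
today_hours = [3, 3, 3, 3]
baseline_hours = [9, 9, 9, 9, 9, 9]
today_calls = ["GetObject", "ListBuckets"]
baseline_calls = ["GetObject", "PutObject", "DeleteObject"]

features = {
    "src_ip_entropy": shannon_entropy(today_ips),
    "hour_wasserstein": wasserstein_1d(
        hour_histogram(today_hours), hour_histogram(baseline_hours)
    ),
    "api_jaccard": jaccard(today_calls, baseline_calls),
}
```

For this made-up account-day the features come out as 2.0 bits of source-IP entropy, a Wasserstein distance of 6.0 (all activity moved six hours earlier), and a Jaccard similarity of 0.25. Whether values like these separate compromised account-days from ordinary ones is exactly what the labeled data has to answer.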

That sequence of steps (frame, engineer, train, evaluate, decide) is the HMM3 daily-practice skill. It isn't exotic; data scientists in other industries do it before lunch. It's rare in security operations because traditional security training paths don't include it, and because the organizational reward structures in most SOCs penalize the experimentation overhead it requires. Twenty hours spent on a hypothesis that doesn't pan out reads, on the weekly metrics, as twenty hours of unproductive analyst time. Until that reward structure changes, HMM3 work tends to happen despite the organization rather than because of it.
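The evaluate and decide steps can be sketched without any ML library. The held-out labels, the model's predictions, and the promotion thresholds below are all hypothetical; the thresholds in particular are a team decision and not something the model can supply.

```python
def precision_recall(labels, predictions):
    """Precision and recall for binary labels and predictions (1 = compromised)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def worth_promoting(precision, recall, min_precision=0.8, min_recall=0.5):
    """Decision rule with hypothetical thresholds agreed by the team in advance."""
    return precision >= min_precision and recall >= min_recall

# Hypothetical held-out account-days: 4 known compromises, 6 benign.
held_out_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(held_out_labels, model_predictions)
decision = worth_promoting(precision, recall)
```

Here precision and recall both come out at 0.75, so the hypothetical rule says not to promote yet. A held-out set this small gives very wide uncertainty, which is why the write-up of the result matters as much as the number.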

The teams I've seen make the transition successfully do two things consistently. They set aside explicit experimentation time (typically one day per analyst per week) and they create a lightweight write-up format for failed experiments. The failed-experiment write-up matters because it makes the experimentation work visible to leadership as work, rather than invisible as absence of a deliverable. Once leadership can see the experimentation loop running, the budget conversation changes shape.

How to argue for the investment

The pitch that lands with finance.

Detection engineers who need to argue for HMM3 investment lose more often than they should because they pitch it as "more training, more tools." Finance hears that as cost without a tied outcome. The version that lands frames HMM3 capability as the prerequisite for any return on the automation investments the organization is already planning to make.

The argument has three moves. First: the SOC is already drowning in alerts, and the proposed remediation (more SOAR, more automation, eventually some kind of ML-driven detection) is the standard one. Second: that remediation, applied to an HMM2 organization, produces the imported-HMM4 failure mode: high false positives, unmaintainable models, eventual abandonment. The expensive tools sit in a partial-deployment state and the alert volume doesn't decrease. Third: investing in HMM3 capability first is what makes the subsequent automation investments work. It reads less as an additional cost than as the precondition for the existing cost to produce a return.

That framing turns a "we want more training budget" conversation into a "we want to de-risk the automation program you already approved" conversation. The numbers attached to it should be conservative, so name twelve to twenty-four months for analyst upskilling, a thirty to forty percent automation ceiling at HMM4 maturity, and a meaningful share of MLOps engineering work failing if it lands ahead of HMM3 capability. Promise the realistic outcomes and over-deliver if you can, but don't lead with the ninety-five percent automation slide that the vendors are still selling.

One additional move that helps in regulated industries: HMM3 capability is the thing that lets the SOC explain its detections. If your auditors or your regulators want to know why a given alert fired, an organization with HMM3 analysts can answer that question. An organization with imported HMM4 tooling cannot, because nobody on the team can explain what the model learned or why it scored an event the way it did. Detection explainability is becoming a real regulatory expectation in finance and healthcare, and HMM3 capability is what makes it possible.

Self-assessment

Where is your team, honestly?

Self-assessment against the HMM ladder is more useful when the questions are uncomfortable. The ones I ask in scoping conversations:

When the team runs a hunt this month, where did the hunting procedure come from? If the honest answer is "MITRE ATT&CK, a vendor blog, or a community Sigma rule," that's HMM2. If the answer is "an analyst on this team developed it and validated it against our environment," that's HMM3.
Can the team articulate, in writing, the difference between a true positive and a true positive that generalizes? HMM3 work depends on that distinction; HMM2 work doesn't require it.
What percentage of analyst time, protected and on the calendar, is allocated to experimentation that may not produce a deliverable? If it's zero, the team is structurally HMM2 regardless of what the tooling looks like.
When a vendor demo shows ML-driven detection, can the team articulate what they'd need to see to trust the model in their environment? HMM2 teams can identify the marketing claim. HMM3 teams can name the validation experiment they'd run.
If a detection in production starts firing twice as often as it did last month, can the team tell you in an afternoon whether the change is signal or noise? That's the daily-operations test of HMM3 versus HMM2.

Few if any of those questions appear in the vendor maturity assessments I've seen, and they're the ones that distinguish organizations whose maturity matches their stated level from organizations whose maturity matches their tooling.
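The last question in that list has a first step that fits in an afternoon: checking whether a doubling is larger than chance alone would produce. The sketch below is one way to do it, under assumptions I should state plainly. It assumes the two months are equal-length observation windows and that firings arrive roughly like a Poisson process, in which case the current count, given the combined total, follows a Binomial distribution with probability 0.5 when the true rate is unchanged. The function name is hypothetical.

```python
from math import comb

def rate_increase_p_value(previous: int, current: int) -> float:
    """One-sided exact p-value for 'the firing rate did not increase'.

    Under the null (same rate, equal windows), current ~ Binomial(previous + current, 0.5).
    Returns P(X >= current).
    """
    n = previous + current
    tail = sum(comb(n, k) for k in range(current, n + 1))
    return tail / 2 ** n

# A rare detection going from 3 firings to 6.
small = rate_increase_p_value(3, 6)

# A busy detection going from 300 firings to 600.
large = rate_increase_p_value(300, 600)
```

Both cases are "twice as often," yet the first gives a p-value of about 0.25, entirely compatible with noise, while the second is vanishingly small. A real rate change still isn't the same thing as attacker activity. It could be a new log source or a software rollout, and telling those apart is the rest of the afternoon.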

Conclusion

Skills over tools, with budget attached.

The Hunting Maturity Model has lasted a decade because it refuses to confuse what you bought with what your team can do. The investment argument that derives from it is consequently inconvenient. The dollar line items that move maturity forward are the protected analyst time and the upskilling path, not the platform license, and those line items are the first to get cut when budgets tighten.

The argument I keep making to detection engineers is that the HMM ladder is the analyst-trust pillar made operational. A SOC that can articulate where it sits on the ladder, with honest evidence, has something rare: calibrated self-knowledge about its own detection capability. That self-knowledge is what makes the next investment work, whether the investment is more data, more analysts, or more automation.

The version of this conversation that doesn't work is the one that promises HMM4 automation by Q4 and skips the twelve to twenty-four months of HMM3 capability development. The version that does work names the realistic timeline, names the 30 to 40 percent automation ceiling, and frames the analyst investment as the precondition for the automation investment to produce a return. That's the argument worth bringing to the budget meeting once the supporting numbers are in hand.