Project 2 · DetectFlow
Detection at scale, without the operational debt.
Most detection programs cap out at hundreds of rules — not because the threats stopped multiplying, but because the operational debt of maintaining the rules ate the team alive. DetectFlow is the discipline that lets a SOC carry thousands. Detection-as-code, CI/CD pipelines for detection content, telemetry feedback loops, automated regression testing. Each rule is a versioned, tested, deployable artifact whose performance and false-positive rate are continuously measured.
The debt problem
The detection backlog grows; the team doesn't.
Every SOC of meaningful size hits the same wall. Year one of the program: 50 detections, freshly tuned, owned by an engineer who knows each one's quirks. Year three: 350 detections, half of them somebody-else-wrote-it, false-positive rates drifting because telemetry shapes shifted but the suppressions didn't follow. Year five: an 800-detection backlog, and analysts spending more time tuning existing rules than hunting for what those rules don't catch. Detection engineering velocity has collapsed; the team is in maintenance mode permanently.
The mechanism is structural. Detection content is treated as artisanal — each rule is a craft item with an implicit owner, an implicit test (somebody ran it once and it caught the simulated attack), and an implicit suppression list (the suppressions that accumulate over time as false positives surface in production). When the implicit test breaks because the underlying telemetry changed, nobody notices for weeks. When the implicit owner leaves, the rule becomes nobody's responsibility. The maintenance cost compounds with rule count, but the team headcount doesn't.
The fix isn't more rule writers. It's treating detection content as software — versioned, tested, deployed through CI/CD, monitored with telemetry feedback loops, owned by a system rather than an individual. That's DetectFlow.
Detection-as-code
Each rule is a versioned, tested, deployable artifact.
Versioning.
Detection content lives in source control alongside the rest of the security engineering codebase. Each rule is a file with a defined schema — query language, MITRE ATT&CK technique mapping, suppression logic, expected false-positive rate, owner, last review date. Pull requests carry tests; merges deploy automatically. Rollbacks are git-revert. The concepts are mundane in any other discipline; their absence in detection engineering is the surprise.
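A minimal sketch of what one rule file's metadata might look like, expressed here as a Python dataclass rather than the on-disk format; the field names mirror the schema described above but are illustrative, not a fixed DetectFlow schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DetectionRule:
    """One versioned detection rule, stored as a file in source control."""
    rule_id: str                        # stable identifier, e.g. "auth-bruteforce-001"
    query: str                          # detection logic in the platform's query language
    query_language: str                 # e.g. "sql", "spl", "kql", "yara-l"
    attack_techniques: list[str]        # MITRE ATT&CK technique IDs, e.g. ["T1110"]
    expected_fp_rate: float             # committed false-positive threshold
    owner: str                          # a team or role, not an individual
    last_review: date | None = None     # date of the last scheduled review
    suppressions: list[str] = field(default_factory=list)  # explicit, versioned suppression logic
```

Because the rule is just a structured file, a pull request that changes a suppression or the committed threshold is diffable and reviewable like any other code change.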
Testing.
Every rule ships with two test types. Positive tests — synthetic events or replayed historical telemetry that the rule should fire on, asserted on every commit. Negative tests — events known to be benign that the rule should not fire on, asserted on every commit. Both run in CI before merge; failures block. The discipline catches regressions early — when the underlying telemetry shape shifts and a rule stops matching, the positive-test suite breaks loudly before the rule silently degrades in production; when a rule starts matching what it shouldn't, the negative tests catch it just as early.
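A sketch of the per-rule test pair, written as plain pytest-style assertions; load_rule, load_events, and run_rule are hypothetical harness helpers standing in for whatever replays synthetic or historical telemetry against a rule.

```python
# test_auth_bruteforce_001.py -- a sketch; load_rule(), load_events(), and
# run_rule() are hypothetical harness helpers, not a real library API.
from detection_harness import load_rule, load_events, run_rule

RULE = load_rule("rules/auth_bruteforce_001.yml")

def test_positive_fires_on_replayed_attack():
    """Positive test: replayed attack telemetry must trigger the rule."""
    events = load_events("fixtures/bruteforce_replay.jsonl")
    assert run_rule(RULE, events).fired, "rule missed known-bad telemetry"

def test_negative_stays_quiet_on_benign_traffic():
    """Negative test: known-benign telemetry must not trigger the rule."""
    events = load_events("fixtures/benign_auth_sample.jsonl")
    assert not run_rule(RULE, events).fired, "rule fired on known-benign telemetry"
```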
Deployment.
Approved rules deploy through a CI/CD pipeline. Canary stage runs the rule against production telemetry with alerting suppressed for 24–48 hours, surfacing the actual false-positive rate before any analyst sees the alerts. Promotion to the active stage requires the canary metrics to satisfy the rule's defined false-positive threshold. Demotion is automatic if the false-positive rate exceeds the threshold for 72 consecutive hours. The rule lifecycle is governed by metrics, not by tribal memory.
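A minimal sketch of the promotion and demotion decisions, assuming the pipeline collects an hourly false-positive-rate series per rule; the function names are illustrative, and the windows mirror the thresholds described above.

```python
from statistics import mean

def should_promote(canary_fp_rates: list[float], threshold: float) -> bool:
    """Promote a canary rule only if its observed false-positive rate stayed
    at or under the committed threshold across the canary window
    (e.g. 24-48 hours of hourly samples with alerting suppressed)."""
    return len(canary_fp_rates) > 0 and mean(canary_fp_rates) <= threshold

def should_demote(hourly_fp_rates: list[float], threshold: float,
                  window_hours: int = 72) -> bool:
    """Demote an active rule automatically once its false-positive rate has
    exceeded the threshold for `window_hours` consecutive hours."""
    recent = hourly_fp_rates[-window_hours:]
    return len(recent) == window_hours and all(rate > threshold for rate in recent)
```

Keeping the decision a pure function over the metric series keeps it testable in the same CI pipeline as the rules themselves.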
Telemetry feedback.
Every rule's production performance is measured continuously: alert volume, true-positive rate (validated via incident retros), false-positive rate (analyst-marked dispositions), suppression-pattern drift, coverage overlap with other rules. The data shows up in dashboards with per-rule trend lines and feeds back into the rule metadata as input to the next review cycle. Rules that have drifted from their committed false-positive thresholds get flagged for re-tuning automatically; rules that haven't fired in months get flagged for coverage review. The detection program runs on metrics, not on the analyst team's collective memory of what each rule was supposed to do.
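A sketch of the flagging pass that feeds the review cycle, assuming per-rule metrics such as observed false-positive rate and last-fired timestamp are already collected; the field names and the 90-day silence window are illustrative.

```python
from datetime import datetime, timedelta

def review_flags(rule_metrics: dict, now: datetime) -> list[str]:
    """Return review flags for one rule based on its production metrics.
    rule_metrics is assumed to carry: observed_fp_rate, committed_fp_rate,
    and last_fired (a datetime, or None if the rule has never fired)."""
    flags = []
    # False-positive drift: observed rate has moved past the committed threshold.
    if rule_metrics["observed_fp_rate"] > rule_metrics["committed_fp_rate"]:
        flags.append("re-tune: false-positive rate above committed threshold")
    # Silence: a rule that has not fired in months needs a coverage review,
    # because either the behavior is absent or the rule silently broke.
    last_fired = rule_metrics.get("last_fired")
    if last_fired is None or now - last_fired > timedelta(days=90):
        flags.append("coverage review: no alerts in 90+ days")
    return flags
```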
Coverage
MITRE ATT&CK and D3FEND, mapped explicitly.
Detection content gets organized by MITRE ATT&CK technique, with D3FEND defensive-technique mapping layered on top. The two frameworks together produce the coverage heatmap — which techniques the program actually detects, which it nominally covers but with weak rules, which it doesn't cover at all. The heatmap drives the prioritization conversation in incident retros and detection roadmap reviews; it replaces the more common "we have 350 detections" headline that treats every detection as equivalent in value.
The coverage view also exposes overlap. Five rules detecting the same technique with subtly different logic isn't five-times-coverage; it's a maintenance multiplier with marginal extra detection. The framework's coverage analysis surfaces these overlaps and forces the consolidation conversation — what's actually load-bearing, what's deprecated, what's the rationalized rule that replaces three. Without that analysis, the rule count grows without the coverage growing.
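A sketch of how the coverage and overlap views fall out of the same rule metadata, reusing the hypothetical DetectionRule schema sketched earlier; the three-rule overlap threshold is illustrative.

```python
from collections import defaultdict

def coverage_by_technique(rules: list[DetectionRule]) -> dict[str, list[str]]:
    """Map each MITRE ATT&CK technique ID to the rules that claim to detect it."""
    coverage: dict[str, list[str]] = defaultdict(list)
    for rule in rules:
        for technique in rule.attack_techniques:
            coverage[technique].append(rule.rule_id)
    return coverage

def overlap_candidates(rules: list[DetectionRule],
                       max_rules: int = 3) -> dict[str, list[str]]:
    """Techniques covered by more than `max_rules` rules are consolidation
    candidates: likely a maintenance multiplier, not extra detection."""
    return {technique: rule_ids
            for technique, rule_ids in coverage_by_technique(rules).items()
            if len(rule_ids) > max_rules}
```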
Platform-agnostic
Detection content that survives the platform decision.
Detection content written in standard SQL ports cleanly between Trino, ClickHouse, StarRocks, and Dremio — the engines documented on the components page. SPL (Splunk's query language) and KQL (Microsoft Sentinel's) don't port — they're vendor-specific, and content written in them is content the next migration has to rewrite. Chronicle's YARA-L is its own dialect with similar portability problems. The choice of detection-content language is a multi-year commitment; standard SQL is the one that travels.
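For illustration, a detection of the kind that ports cleanly: plain ANSI SQL with hypothetical table and column names, shown as a constant a rule file might reference. Time-window predicates are omitted deliberately, since interval syntax is one of the places these engines' dialects diverge.

```python
# Hypothetical brute-force detection expressed in portable ANSI SQL.
# Table and column names (auth_events, user_name, event_type) are
# illustrative; the statement avoids dialect-specific functions, so it
# should parse unchanged on Trino, ClickHouse, StarRocks, and Dremio.
BRUTEFORCE_QUERY = """
SELECT user_name, count(*) AS failed_attempts
FROM auth_events
WHERE event_type = 'failed_login'
GROUP BY user_name
HAVING count(*) >= 20
"""
```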
The DetectFlow engagement supports content authoring in SPL, KQL, YARA-L, Trino SQL, and Dremio SQL — whichever the existing platform uses. The migration path from SPL or KQL to standard SQL is part of the engagement when the underlying platform is moving (typically alongside a MOAR migration). For shops staying on Splunk or Sentinel, the engagement shapes the existing platform's content rather than forcing a rewrite — the discipline applies regardless of language.
The harder migration question is the SPL-specific patterns that don't have direct SQL analogs — transactional event-grouping logic, certain `eval` and `streamstats` patterns, vendor-specific data model assumptions. The engagement quantifies the rewrite cost early: some content survives the migration cleanly, some requires redesign, and some should stay on the source platform with federated query as the bridge. The portability claim has conditions; the engagement names them up front.
When to pick it
Three patterns that signal DetectFlow is the next investment.
The signals that recur across DetectFlow engagements:
- The detection backlog grows faster than the team can maintain. The classic shape: new threats land in the threat-intelligence feed, the team writes detections for them, the existing detections drift in maintenance mode, and the velocity drops every quarter even though the headcount is stable.
- Analyst time is consumed by tuning rather than hunting. When the senior analysts spend their week chasing false positives in existing rules instead of investigating novel patterns, the program has flipped from offense to defense at the wrong layer.
- Incident retros surface detections that should have fired and didn't. The rule existed; the rule didn't catch the incident. Either the underlying telemetry shifted and the rule silently broke, or the rule's logic missed a variant that an attacker found. Both are failure modes DetectFlow's testing and telemetry feedback specifically catch.
DetectFlow doesn't replace detection engineers; it makes detection engineering scalable. The engagement is structurally complementary to the MOAR work — the data platform that supports detection-content portability is the same data platform that supports the testing and telemetry-feedback infrastructure DetectFlow runs on. The two often ship together when the migration scope warrants it.
The engagement shape
Detection Engineering Modernization — the productized form.
Detection Engineering Modernization ($50K–$120K, 4–8 weeks). The full DetectFlow engagement. Pricing scales with use-case count and platform complexity. The deliverables:
- Maturity assessment. Defined → Managed → Optimized framework, mapped against the existing program. Not aspirational — based on the actual artifacts the program produces today.
- 100–200 use cases organized by MITRE ATT&CK technique. The starting catalog; production environments add to it from there.
- D3FEND defensive-technique mapping. Layered on top of the ATT&CK coverage to produce the dual-framework heatmap.
- Platform-agnostic detection content. Authored in the existing platform's language; portability framework documented for future migrations.
- Coverage gap heatmap. What the program detects, what it nominally covers but with weak rules, what it doesn't cover at all. Drives the prioritization conversation for the next quarter.
- Tuning playbook for false-positive reduction. The patterns that recur across SOCs, encoded as repeatable steps the team can apply without consulting calls.
Strong synergy with MOAR. Frequently sold as a follow-on once the architecture decision has settled — the data platform gets built, then the detection program runs on it. For shops staying on Splunk or Sentinel, the DetectFlow engagement runs against the existing platform without requiring a platform migration first.
Detection content treated as software, not as craft.
MOAR is the data platform. MLOps-hunting is the next layer. DetectFlow is the discipline that makes both operationally tractable.