Methodology
MLOps for threat hunting
Operationalizing Model-Assisted Threat Hunting (M-ATH) from the PEAK framework. A notebook model degrades the moment telemetry shifts or an adversary adapts; MLOps is the engineering discipline — feature store, model registry, orchestrated retraining, drift monitoring — that keeps an algorithmic hunt viable in production.
CI/CD-automated retraining is the threshold where M-ATH stops being a one-off notebook hunt. Google Cloud's MLOps maturity model runs from Level 0 (manual, notebook-driven, no drift defense) through Level 1 (an automated pipeline that retrains continuously on new data) to Level 2 (CI/CD that also builds, tests, and deploys the pipeline itself). Below Level 1, the model is stale before the hunt ends.
The pipeline
- Prepare (PEAK + feature store). Frame the hypothesis; engineer features once in Feast for consistent offline training and online inference.
- Train (M-ATH model). Supervised classification, clustering, time-series, NLP on petabyte telemetry; experiments tracked in MLflow.
- Deploy (orchestrated retraining). Kubeflow / ClearML pipelines retrain and redeploy on schedule or on drift: continuous delivery, not point-in-time.
- Monitor (drift + MLSecOps). Data- and concept-drift detection (W&B); poisoning / evasion guardrails feed back to the retrain loop.
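The Monitor-to-Deploy feedback loop can be illustrated with a minimal sketch of a drift check that decides whether to trigger retraining. The source does not say which drift statistic is used; this example assumes the Population Stability Index (PSI) on a single numeric feature, implemented with the standard library only. The function names `psi` and `should_retrain`, the bin count, and the 0.25 threshold (a commonly cited rule of thumb) are illustrative assumptions, not part of any tool named above.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the reference sample's range (an assumption;
    quantile bins are another common choice).
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            # Clamp so values outside the reference range fall in the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the logarithm below is always defined.
        return [max(c / len(sample), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def should_retrain(reference, current, threshold=0.25):
    """Hypothetical trigger: request retraining when PSI exceeds the threshold."""
    return psi(reference, current) > threshold


if __name__ == "__main__":
    rng = random.Random(7)
    training_window = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    live_window = [rng.gauss(1.5, 1.0) for _ in range(5000)]  # simulated shift
    print("PSI:", round(psi(training_window, live_window), 3))
    print("retrain:", should_retrain(training_window, live_window))
```

In an orchestrated setup, a check like this would run per feature on a schedule, and a positive result would start the retraining pipeline rather than print a flag. PSI only sees changes in input distributions, so it addresses data drift; concept drift generally needs labeled outcomes or analyst feedback to detect.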
What composes, what’s brittle
- PEAK framework. Bianco, Fetterman, Marrone (Splunk SURGe). M-ATH is the algorithmic hunt type alongside hypothesis-driven and baseline.
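- Data vs concept drift. Data drift is a shift in the input distribution; concept drift is a shift in what malicious behavior looks like, usually because the adversary adapted. Both degrade the model silently, surfacing as either a false-positive avalanche or a false-negative blind spot.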
- Why Level 0 fails. A notebook model trained offline and discarded cannot counter non-stationary, adversarial telemetry.
- Tooling. Feast (features), MLflow (registry / Detection-as-Code), Kubeflow & ClearML (orchestration), W&B (drift + reasoning observability).
- MLSecOps. The hunt infra is itself an attack surface: retrain-loop poisoning, evasion inputs, MLflow CVE-2026-2635, the Kubeflow Doki incident. MITRE ATLAS catalogs adversary techniques against ML systems.
- What's hard. Continuous-retraining cost; poisoned retraining loops; the org gap between data science and detection engineering.
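One way to picture a guardrail against poisoned retraining loops is a promotion gate: a retrained candidate replaces the production model only if it does not regress on a trusted holdout set that the retraining loop cannot write to. The sketch below is an assumption about how such a gate might look, not a description of any specific tool. `EvalResult`, `approve_promotion`, and both tolerance values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Metrics measured on a trusted, frozen holdout set (hypothetical schema)."""
    recall: float
    false_positive_rate: float


def approve_promotion(production, candidate,
                      max_recall_drop=0.02, max_fpr_rise=0.01):
    """Return (approved, reasons) for replacing production with candidate.

    A large recall drop on known-bad samples is treated as a possible sign
    that poisoned training data taught the model to ignore real activity.
    """
    reasons = []
    if production.recall - candidate.recall > max_recall_drop:
        reasons.append("recall regressed on the trusted holdout")
    if candidate.false_positive_rate - production.false_positive_rate > max_fpr_rise:
        reasons.append("false-positive rate rose on the trusted holdout")
    return (not reasons, reasons)


if __name__ == "__main__":
    current = EvalResult(recall=0.90, false_positive_rate=0.02)
    retrained = EvalResult(recall=0.60, false_positive_rate=0.02)
    approved, why = approve_promotion(current, retrained)
    print(approved, why)
```

A rejected candidate would typically be routed to a human reviewer along with the training batch that produced it. The gate is only as good as the holdout set, which has to be curated and refreshed outside the automated loop.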
Sources: Splunk PEAK Threat Hunting Framework (Bianco, Fetterman, Marrone — SURGe); "The Threat Hunter's Cookbook" (Fetterman & Marrone); Google Cloud MLOps maturity model (CI/CD for ML); MITRE ATLAS; CVE-2026-2635 (MLflow authentication bypass); PROID compromise-assessment framework (peer-reviewed, PMC).