Practitioner deep-dive

Who actually does the hunting.

There's a kind of work that happens at two in the morning when an alert that should have caught something didn't, and somebody has to go figure out why. You pull the rule, it looks fine. You pull the query, it runs, it returns nothing. So you go a layer down and start reading the field mapping, the thing that decided which piece of the raw log became which field in the normalized event, and after an hour of squinting at it you find that the field the rule keys on was mapped to a path that isn't in the data this rule actually runs against. The shape was right but the meaning was wrong, so nothing errored and nothing turned red. The detection just quietly matched nothing for however long it had been deployed, and you, at two in the morning, are the person who reverse-engineers the broken mapping back to the decision that caused it.

Reading time: about 8 minutes. Evidence tier: B-to-C, identity and observation rather than measurement, with the measured side routed to the sibling essays where it lives.

Figure: Venn diagram of three overlapping roles, Security Analyst (domain-specific analysis), Data Engineer (scalable pipelines, governance, schema evolution), and Data Scientist (behavioral analysis, machine learning, statistics), with the Threat Hunter spanning all three. The analyst-engineer overlap is ingest, validate, parse, normalize; a toolset spine runs beneath: SIEM, Data Engineering, MLOps. Caption: The 2am hunter is standing in the overlap; the work spans analyst, data engineer, and data scientist at once.

A name for the 2am work

That analyst was doing applied ontology.

I want to put a name on what that person was doing, because I think the name changes how you see yourself and what you can reach for. They were doing applied ontology. That word is going to sound academic and a little ridiculous applied to a 2am debugging session, so let me say what it means in plain terms before it scares anyone off the page, because the thing underneath the word is something a hunter already does. An ontology is just an agreed map of what the things in your data actually are and how they connect: a process, a file, a user account, a network connection, and the real relationships between them, written down clearly enough that a machine can check it. Applied ontology is using that map to decide whether a specific mapping is right, whether the field you called a destination really is the kind of thing the schema means by a destination. The analyst tracing a broken mapping back to a crossed meaning is doing exactly that, by hand, under time pressure, without the map written down anywhere. They're reconstructing the map from the wreckage.

You already do this

You already do this, you just don't call it that.

I came to this the slow way, by being the person hunters came to. For years they landed at my desk for query help, and the pattern was always the same: they could hunt fine, and what they couldn't always do was the data part, the work of getting the question to line up with the logs they actually had. I wrote about that in an earlier piece and framed it as threat hunting being data science under a different vocabulary, the hypothesis and the precision number and the write-up all having data-science names nobody in the room used. This is the same observation pushed one layer down, into the part that bit them most often. The data part wasn't only writing the query. It was knowing what the fields meant, and noticing when a field meant something other than what its name promised, and that noticing is data modeling whether or not anyone called it that.

Every decision about which raw token becomes which normalized field is a modeling decision. Whether a given field is the actor or the target, the source or the destination, the process or the file the process ran from, is a modeling decision. When you write a Sigma rule that names Image or dst_ip you are trusting a chain of those decisions that somebody else made, usually whoever wrote the integration, usually without writing down why. So when a hunter reverse-engineers a broken mapping they aren't doing some lesser janitorial version of the real work. They're doing the modeling the integration skipped, after the fact, which is harder than doing it up front because now you're inferring the original intent from a detection that went dark instead of reading it off a map. The skill is real and it's specialized, and most of the people who have it have never been given the words for it or told that a whole field of tooling exists for the thing they keep doing by hand.
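To make the silent failure concrete, here is a minimal sketch in Python under stated assumptions. The events, the dotted paths, and the helpers `resolve`, `normalize`, `rule_matches`, and `coverage` are all hypothetical; none of them come from a real integration or from pySigma. It shows a mapping whose shape is valid but whose path does not exist in the data the rule runs against, next to a coverage count that turns the silence into a number.

```python
from typing import Any, Optional

# Hypothetical raw events from the source this rule actually runs against.
RAW_EVENTS = [
    {"process": {"executable": "C:\\Windows\\System32\\cmd.exe"}, "user": {"name": "alice"}},
    {"process": {"executable": "C:\\Tools\\psexec.exe"}, "user": {"name": "bob"}},
]

# Hypothetical mapping: normalized field -> dotted path in the raw event.
# "Image" points at a path that is well-formed but absent from this data.
FIELD_MAPPING = {
    "Image": "process.image.path",
    "User": "user.name",
}


def resolve(event: dict, dotted_path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any step is missing."""
    current: Any = event
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def normalize(event: dict, mapping: dict) -> dict:
    """Apply the mapping. A missing path yields None rather than an error."""
    return {field: resolve(event, path) for field, path in mapping.items()}


def rule_matches(normalized: dict) -> bool:
    """Toy detection: fire when the process image ends with psexec.exe."""
    image = normalized.get("Image")
    return isinstance(image, str) and image.lower().endswith("psexec.exe")


def coverage(events: list, mapping: dict) -> dict:
    """Fraction of events in which each mapped path resolved to a value."""
    total = len(events)
    return {
        field: sum(resolve(e, path) is not None for e in events) / total
        for field, path in mapping.items()
    }


hits = [e for e in RAW_EVENTS if rule_matches(normalize(e, FIELD_MAPPING))]
field_coverage = coverage(RAW_EVENTS, FIELD_MAPPING)
print(len(hits))        # 0: the rule matches nothing and raises nothing
print(field_coverage)   # {'Image': 0.0, 'User': 1.0}
```

A coverage count like this only catches the path that is missing outright. A field that resolves cleanly to the wrong kind of thing would still score 1.0, and that case is where the modeling judgment comes in.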

That's the recognition I want to offer, and I want to be careful not to turn it into a pep talk, because the work is genuinely hard and pretending otherwise would insult the people who do it. Naming the skill doesn't make the 2am session shorter. What it does is connect you to a body of open tooling and a community that has been working the same problem from the formal side, so the next time you trace a crossed meaning you have somewhere to stand instead of starting cold.

What naming it gets you

The open tools stop looking like academic furniture.

Here's the payoff, because recognition that doesn't hand you anything is just flattery. Once you see the broken-mapping hunt as applied ontology, a stack of open tools stops looking like academic furniture and starts looking like instruments for the job you already do. D3FEND, MITRE's map of defensive techniques and the digital artifacts they act on, is a formal written-down version of part of the map you've been reconstructing by hand: it says what a process and a file and a credential are as things a defense can act on. OCSF gives you the shared shape your detections write against, the agreed field names so content doesn't get rewritten per vendor. Sigma is the portable detection format your rules already live in, and there's open work, pySigma and its OCSF mapping pipeline, on making those rules carry their field mappings correctly across schemas instead of breaking silently at the boundary. And there's a reasoner toolchain, ROBOT and ELK: software that reads a map and your mappings, works out whether they contradict each other, and can in principle catch the crossed meaning before it ships rather than after a breach.

None of that is mine and none of it is closed. It's the open commons, and the reason it matters to the person tracing a broken mapping at 2am is that the broken mapping you found by hand is often a gap somebody else hit too, sitting in one of those projects as an open issue. The map is thin in real places, which I'll be honest about rather than oversell: D3FEND defines what the artifacts are but ships almost none of the assertions that would let a machine catch a crossed mapping automatically, so off the shelf a reasoner has no basis to object when a user gets mapped to a process. That gap is exactly the kind of thing a practitioner who's been bitten by it is well placed to help close, because you've seen the failure in the wild and the academics filling out the ontology mostly haven't.
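That gap can be sketched in a few lines of Python. This is a toy stand-in for a reasoner, not ROBOT or ELK, and the class names, the example mappings, and the `find_contradictions` helper are assumptions made up for illustration rather than D3FEND or OCSF content. It shows that with class definitions alone a checker has nothing to object to, and that a single disjointness assertion is what gives it a basis.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical mappings: field -> (kind the schema expects, kind the integration mapped in).
MAPPINGS: Dict[str, Tuple[str, str]] = {
    "actor.process": ("Process", "UserAccount"),    # crossed meaning
    "dst_endpoint.ip": ("IPAddress", "IPAddress"),  # fine
}


def find_contradictions(
    mappings: Dict[str, Tuple[str, str]],
    disjoint: Set[FrozenSet[str]],
) -> List[Tuple[str, str, str]]:
    """Flag a mapping only when an explicit assertion says the two kinds cannot overlap."""
    problems = []
    for field, (expected, actual) in mappings.items():
        if expected != actual and frozenset((expected, actual)) in disjoint:
            problems.append((field, expected, actual))
    return problems


# Definitions only, no disjointness assertions: nothing to object to.
without_axioms = find_contradictions(MAPPINGS, set())

# One hand-written assertion: a Process is never a UserAccount.
with_axiom = find_contradictions(MAPPINGS, {frozenset(("Process", "UserAccount"))})

print(without_axioms)  # []
print(with_axiom)      # [('actor.process', 'Process', 'UserAccount')]
```

The checker is deliberately silent when two kinds merely differ. Without an assertion that they cannot overlap, a mismatch is not a contradiction, which is the same reason an off-the-shelf reasoner stays quiet.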

The on-ramp

Where to start, and why it belongs upstream.

So here's where to start, and I'd point you at the open upstream projects rather than anything of mine, because that's where your effort goes furthest and because contributing there builds something that outlasts any one company's repo. If you want to read the formal version of the map you've been reconstructing, D3FEND is public and browsable, and reading its artifact definitions against your own field mappings is a useful afternoon that will probably surface a mapping you've always half-suspected was crossed. If you write Sigma, the pySigma project and its OCSF mapping pipeline are where the field-portability problem is being worked, and a hunter who's debugged a rule that broke crossing schemas has direct evidence those maintainers want. If you've found a coarse or crossed mapping in OCSF or D3FEND, both take issues, and a well-described "this field is mapped to a kind of thing it isn't" report from someone who hit it in production is more valuable than another round of theory.

The honest framing of the difficulty matters here too, because the hard part of this work isn't the tooling, it's the judgment. Deciding that two kinds of thing can't be the same is sometimes clear and sometimes contestable. That a process is never a user account is the clear kind. But a credential can be stored in a file, so asserting that a credential is never a file would break a valid mapping and manufacture a false alarm. That adjudication is the real work, and it's the work a practitioner's ground truth improves, because you know which overlaps hold in actual telemetry and which are just sloppy mapping. The community needs that knowledge more than it needs another reasoner.
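The cost of an over-eager assertion can be shown with a small Python sketch. The artifact names, the kinds assigned to them, and the `violations` helper are hypothetical, chosen only to mirror the credential-in-a-file case above. It compares a defensible set of disjointness assertions with one that adds a contestable pair.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical artifacts and the kinds of thing each has been typed as.
ARTIFACTS: Dict[str, Set[str]] = {
    "svc_backup": {"UserAccount"},
    "id_rsa": {"File", "Credential"},        # a private key on disk is legitimately both
    "pid_4412": {"Process", "UserAccount"},  # sloppy mapping, not a real overlap
}


def violations(
    artifacts: Dict[str, Set[str]],
    disjoint_pairs: Set[FrozenSet[str]],
) -> List[Tuple[str, Tuple[str, str]]]:
    """Return (artifact, pair) for every artifact typed with two kinds asserted disjoint."""
    found = []
    for name, kinds in artifacts.items():
        for a, b in combinations(sorted(kinds), 2):
            if frozenset((a, b)) in disjoint_pairs:
                found.append((name, (a, b)))
    return found


defensible = {frozenset(("Process", "UserAccount"))}
overeager = defensible | {frozenset(("Credential", "File"))}

print(violations(ARTIFACTS, defensible))
# [('pid_4412', ('Process', 'UserAccount'))]
print(violations(ARTIFACTS, overeager))
# [('id_rsa', ('Credential', 'File')), ('pid_4412', ('Process', 'UserAccount'))]
```

The second result contains one real finding and one manufactured one. Telling them apart is not something the checker can do; it depends on knowing which overlaps actually hold in telemetry.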

I'm flagging the evidence tier on all of this plainly, because it's the kind of claim that's easy to inflate. The reframe in this essay is Tier B-to-C: it rests on identity and observation, on years of being the person hunters came to and on watching the same broken-mapping hunt play out across teams, not on a controlled measurement. The measured side of this work, the check that actually catches a crossed mapping before it ships, lives in the sibling essays and the tooling they point at: the data-modeling foundation, the deductive gate that catches the mistake, and the tools you can run today. This one is about seeing the work clearly enough to know it's yours. The companion piece, MLOps tools for threat hunters, makes the adjacent case, that the data-science instruments are for the role you already play rather than a different role you'd have to become, and the two essays are the same argument aimed at two layers of the same workflow. A small open demo you can clone and run is up now at security-data-that-works.

The reframe I'd leave you with is small and I think it holds up. The analyst who reverse-engineers a broken field mapping at 2am isn't doing grunt work beneath the real hunting. They're doing applied ontology, reconstructing by hand a map that the open commons has been building from the other direction, and the gap between the two is where a practitioner's hard-won knowledge is worth the most. You were already doing the modeling, so naming it gives you the vocabulary, the open tools, and a community that's been waiting for people who've actually been bitten by the failure to come help fix the map.