Security Data Works

Practitioner deep-dive

The ground you're already standing on.

The worst detection failure I've watched wasn't a rule that fired wrong. It was a rule that never fired at all, and did it quietly. Someone had written a perfectly reasonable detection for a specific kind of authentication event, it passed review, it deployed clean, and it sat there for months matching nothing, because the field it keyed on had been mapped, somewhere upstream in the integration, to a path that didn't exist in the data it was actually running against. No error. No exception. No red text anywhere in the pipeline. The query compiled, the search ran on schedule, it returned zero results every time, and zero results from a detection looks exactly like a quiet network, right up until you go hunting for the alert that should have caught something and find out it was never able to.

Reading time: about 9 minutes. Evidence tier: B, my own groundings and a single corpus, with the limits stated where they bite.

Grounding method · failure modes

[Figure: Loud failure versus silent failure in a field mapping, and how a disjointness axiom converts one into the other. Panels: loud failure (the mapping references a path that isn't there; build goes red, caught); silent failure (the mapping is sound and passes every check but holds the wrong kind of thing; build goes green, wrong); with the artifact types asserted disjoint, the silent-wrong mapping becomes a detectable contradiction and is caught loudly.]
The error worth worrying about is not the one that errors out, since a red build tells you where to look; it's the one that is logically sound, passes every build and schema check, and is still wrong because it encodes an incorrect claim about the world, so the field reads as fully populated while holding the wrong kind of thing. Adding a disjointness axiom about the artifact types is what converts that silent-wrong mapping into a loud-caught one, because the axiom gives the reasoner a contradiction it can detect. Evidence tier B, a first-cut measurement: an OCSF→D3FEND construction found 922 classes (20.9%) that are sound-but-wrong, and the ELK reasoner reproduced the result through ROBOT — the ontology stays consistent with 0 unsatisfiable classes, which is exactly why it survives the build without complaint.

The foundation nobody named

You've been doing data modeling the whole time.

If you've been doing this for a while you've probably got your own version of that story, and I think the reason these failures are so common and so hard to see is that nobody told us we were doing data modeling. We think we're writing detections. We're writing field mappings and normalization rules and parser configs, and underneath all of it we are making decisions about what the things in our data are and how they connect, which is the actual definition of data modeling, and we're mostly making those decisions implicitly, by hand, with no check that any particular decision was right. So I want to make a case I think is worth your time even though it sounds dry on the label: data modeling is the foundation you're already standing on, you've been doing it the whole time without naming it, and once you have the vocabulary for it there's a check you can run that catches exactly the silent failure I just described.

Walk through what happens to a log on its way to becoming a detection. A vendor's appliance emits a record. Some integration parses that record into fields. Those fields get normalized into a shared shape so your content doesn't have to be rewritten per vendor, and if you're on a modern stack that shape is probably OCSF, the open schema that says a network-connection event has these attributes with these names. Your detection is written against that shared shape. At every one of those steps somebody decided that this piece of the raw log means that field in the normalized event, and every one of those decisions is a claim about meaning. Saying a vendor's dest field becomes OCSF's dst_endpoint is a claim that the thing the vendor called a destination is the same kind of thing OCSF means by an endpoint, and that claim can be right or wrong, and most of the time nothing checks which.

That's the part worth slowing down on, because it's where the trouble lives. The shape is one thing and the meaning is another. The shape is "there's a field here called dst_ip and it holds an IP address." The meaning is "this is the address the connection went to, not the one it came from." A mapping can get the shape exactly right and the meaning exactly wrong, and when it does, your tooling won't complain, because schema-conformance only checks the shape. The field is present, it's the right type, it validates. It just means the wrong thing, or it points at a path that isn't there, and the detection downstream inherits the mistake without any signal that a mistake was made.
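
A minimal sketch of that split, under stated assumptions: the raw record, the two mapping functions, and the toy rule are all hypothetical, and the "schema check" is only a presence-and-type test standing in for real schema conformance. It shows a crossed source/destination mapping passing the shape check while the detection downstream quietly stops matching.

```python
import ipaddress

# Hypothetical raw vendor record; field names and addresses are illustrative.
raw = {"src": "10.0.0.5", "dest": "203.0.113.9"}

def map_correct(rec):
    """Mapping with the right meaning: vendor dest becomes dst_ip."""
    return {"src_ip": rec["src"], "dst_ip": rec["dest"]}

def map_swapped(rec):
    """Mapping with the wrong meaning: source and destination are crossed."""
    return {"src_ip": rec["dest"], "dst_ip": rec["src"]}

def shape_ok(event):
    """Shape check only: required fields are present and parse as IP addresses."""
    try:
        for field in ("src_ip", "dst_ip"):
            ipaddress.ip_address(event[field])
    except (KeyError, ValueError):
        return False
    return True

def detection(event):
    """Toy rule: a connection went to a watched external address."""
    return event["dst_ip"] == "203.0.113.9"

good, bad = map_correct(raw), map_swapped(raw)
print("correct mapping:", shape_ok(good), detection(good))
print("swapped mapping:", shape_ok(bad), detection(bad))
```

Both events validate, and only the correctly mapped one fires the rule. Nothing in the shape check can tell the two mappings apart, because the difference between them is entirely in what the fields mean.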

This is what I mean when I say you're already doing data modeling. The choice of which raw token becomes which normalized field is a modeling choice. Whether a given field is the actor or the target, the source or the destination, the process or the file the process ran from, is a modeling choice. You make dozens of them every time you onboard a source, and the quality of your detections rests on those choices being right far more than it rests on the cleverness of the detection logic, because a brilliant rule keyed on a field that means the wrong thing is still a rule that matches nothing.

A little vocabulary

A few words that genuinely earn their place.

I held off on the words on purpose, but a few of them genuinely earn their place, so here they are in plain terms. An ontology is just an agreed map of what the things in your data actually are and how they connect: a process, a file, a user account, and the real relationships between them, written down so a machine can check it. Semantics is what the data means rather than what shape it's in. Grounding is tying your fields to a shared, checkable definition, so instead of "I called this field process and I hope that's right" you've said "this process field is the same kind of thing D3FEND calls a Process," and now a tool can verify it. Disjointness is the part that does the catching: stating on the map that two kinds of thing can't be the same individual, that a process is not a user account and a file is not a URL, even when they're related to each other. A process is executed from a file, but the process and the file aren't the same thing, and writing that down is what gives a checker something to object to.

D3FEND, MITRE's map of defensive techniques and the digital artifacts they act on, is the one real formal ontology in this stack, and it's the natural thing to ground OCSF fields into, because it already defines what a process and a file and a credential are as artifacts a defense can act on. So the move is to ground OCSF's objects into D3FEND's artifacts and then ask a reasoner, which is just the program that works out what follows from what you've told it, whether your mappings hold together. That sounds like a lot of machinery for a field mapping, and I was skeptical it would pay, so I built it and measured it rather than argue about it.

The check that catches it

A reasoner can object where a syntax check can't.

Here's the awkward part I have to be honest about, because it's the whole reason this isn't a solved problem already. D3FEND off the shelf doesn't catch the silent error. It ships only three disjointness pairs in the entire ontology, and none of them are among Process, File, UserAccount, NetworkSession, and NetworkNode, the artifacts your core OCSF objects actually map to. So nothing in D3FEND says an entity that is a process can't also be a user account, which means a reasoner has no basis to object when you map a user to a process, and the wrong mapping sails through. The silent-failure mode isn't only in the integrations and the LLMs that generate mappings; it's baked into the reference ontology itself, because the assertions that would let a machine catch the error were never added.

So the gate I built adds them. It grounds each mapping two independent ways, the OCSF path it maps to and what the source field itself actually means, then asserts a disjointness layer over the handful of artifacts those mappings touch, and asks ELK, the reasoner, whether the two groundings agree. A type-preserving mapping agrees and stays satisfiable, which is the formal way of saying "this could be true." A type-crossing mapping, a user mapped to a process, disagrees under the disjointness layer and becomes unsatisfiable, a class that can't possibly be true, and the reasoner flags it and the build exits non-zero. That's a deductive gate: it catches the mapping mistake that silently kills a detection by checking meaning, not just syntax, with no machine-learning model anywhere in the loop, just logic over definitions.
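
The logic of that gate can be sketched in a few lines, with heavy caveats. This is a toy stand-in, not the gate itself: a set lookup replaces the ELK reasoner, the mapping names and their groundings are hypothetical, and the disjointness layer is simply every pair among the five artifacts named above, which is an assumption made for illustration.

```python
from itertools import combinations

# The five artifacts named in the text. Declaring every pair among them
# disjoint is an illustrative assumption, not D3FEND's shipped axioms.
CORE = ["Process", "File", "UserAccount", "NetworkSession", "NetworkNode"]
DISJOINT = {frozenset(pair) for pair in combinations(CORE, 2)}

# Hypothetical mappings: (name, grounding via target path, grounding via source meaning).
MAPPINGS = [
    ("user -> actor.user", "UserAccount", "UserAccount"),  # type-preserving
    ("user -> process.name", "Process", "UserAccount"),    # type-crossing
    ("cred_store -> file.path", "File", "Credential"),     # overlap, not declared disjoint
]

def unsatisfiable(mappings, disjoint):
    """Return mappings whose two groundings form a declared-disjoint pair."""
    return [name for name, a, b in mappings if frozenset((a, b)) in disjoint]

def gate(mappings, disjoint):
    """Return (exit_code, flagged); a non-zero code means the build should fail."""
    flagged = unsatisfiable(mappings, disjoint)
    return (1 if flagged else 0), flagged

print("no disjointness layer:  ", gate(MAPPINGS, set()))
print("with disjointness layer:", gate(MAPPINGS, DISJOINT))
```

With an empty disjointness set the user-to-process mapping passes, which mirrors the off-the-shelf behavior described above. With the layer added, that one mapping is flagged, while the credential stored in a file is left alone because no axiom says those two can never coincide.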

The numbers are the reason I think this is worth your attention rather than a neat trick. I ran the gate against a real six-schema crosswalk corpus: 925 mapping rows drawn from CIM, UDM, ASIM, ECS, OpenTelemetry, and Zeek into OCSF, with type-crossing corruptions injected the way the silent error happens in the wild. The gate caught 100% of them, 231 of 231 distinct mapping classes, with zero false positives attributable to the disjointness layer. It missed nothing in the error class it targets. Along the way it flagged about eight organic (not injected) coarse mappings in the real hand-built corpus that a human reviewer should look at: an application mapped to a destination host, a principal used as a host, a file path mapped to a process name. That is the gate doing exactly the job you'd want, finding the meaning-crossings a shape check waves through. The first-pass version isolates a single wrong mapping as the one unsatisfiable class in 3.69 seconds over the full D3FEND ontology on a laptop, using half a gigabyte of memory, so this is the kind of check you could put in CI and run on every pull request, not a research artifact that needs a cluster.

I want to be careful about what I'm claiming, because the honest boundaries matter. This is Tier B evidence: my own groundings, my own disjointness adjudication, a single corpus, one reasoning pass. The 100% catch is measured on injected corruptions, not a held-out set of confirmed human errors, because the corpus doesn't ship labeled mistakes; the eight organic flags are the closest thing to real catches and they're plausible coarse mappings, not confirmed-wrong ones. The hard part, and the part nobody wants to do, is the disjointness adjudication itself, deciding which artifacts are genuinely disjoint versus legitimately overlapping, because a credential can be stored in a file, so asserting that a credential is never a file would break a valid mapping and manufacture a false alarm. That judgment held across this corpus with eight artifacts; a much larger surface might hit a wall this one didn't. I'd rather tell you that than oversell it.

The same split between shape and meaning shows up one layer down, in the bytes themselves, and it caught me off guard while I was building the corpus for this. I'd assumed that re-deriving the same data produces the same file, and so the same hash, which is the assumption underneath content-addressed storage, dedup, and chain-of-custody integrity checks. It doesn't hold by default. Write the same rows to Parquet through a parallel query engine like DuckDB and the file's size and its SHA-256 move from run to run, because the engine hands back rows in a different order each time and the encoding is order-sensitive, even though the data is identical down to the row. A hash of the file is a shape check, and it can disagree while the substance is unchanged, so an integrity check can report a change that never touched the data. The fix is the one the open formats already made, comparing logical content rather than raw bytes, which is why Iceberg tracks identity at the manifest level instead of asking you to diff files: the check has to be aimed at what the data means, not the form it happens to take.
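
A small sketch of the difference between hashing bytes and hashing logical content, assuming nothing about Parquet, DuckDB, or Iceberg internals: a line-per-row JSON encoding stands in for an order-sensitive file format, and the rows are invented for illustration.

```python
import hashlib
import json

# Hypothetical rows; the two "runs" below hold identical data in different order.
rows = [
    {"id": 1, "user": "alice", "bytes": 512},
    {"id": 2, "user": "bob", "bytes": 2048},
    {"id": 3, "user": "carol", "bytes": 64},
]

def serialize(rs):
    """Stand-in for an order-sensitive file encoding: one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rs).encode()

def byte_hash(rs):
    """Hash of the encoded bytes: a check on form."""
    return hashlib.sha256(serialize(rs)).hexdigest()

def logical_hash(rs):
    """Hash of the rows as an unordered collection: canonicalize each row, then sort."""
    canon = sorted(json.dumps(r, sort_keys=True) for r in rs)
    return hashlib.sha256("\n".join(canon).encode()).hexdigest()

run_a = rows
run_b = list(reversed(rows))  # same rows, different arrival order

print("byte hashes equal:   ", byte_hash(run_a) == byte_hash(run_b))
print("logical hashes equal:", logical_hash(run_a) == logical_hash(run_b))
```

The byte hashes differ and the logical hashes agree. Sorting canonicalized rows is only one way to define logical identity; it treats the data as a multiset, which is a reasonable choice when row order carries no meaning.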

Try it, contribute back

The pieces are open, and the fixes belong upstream.

None of this is mine to keep, and the point of writing it down is to put it within reach rather than behind a paywall. The pieces underneath are all open and runnable today: D3FEND and OCSF give you the artifacts and the schema, Sigma and the pySigma OCSF pipeline give you portable detections, and ROBOT and ELK are the reasoner toolchain that does the checking, so the gate is logic over open definitions and you can stand it up yourself. The demo is packaged to clone and run with the Java toolchain wrapped so you don't have to fight a JDK version to watch a correct mapping pass and a deliberately wrong one fail, and it's up now at security-data-that-works. The more useful thing alongside it is that most of the gaps I hit are already tracked as open issues in those projects, so if you find a coarse mapping or a missing disjointness pair, the fix belongs upstream in D3FEND or OCSF rather than in your own private patch. The whole point is to send you toward the commons, not to fork it.

The same "make the wrong thing loud" discipline runs one layer down, on the data itself. The Foundation data-health check I'd been running as a notebook is now a standing gate over the live lakehouse, and a subset of it ships in that same repo as a one-command check: the data-quality dimensions (completeness, uniqueness, validity, consistency, timeliness) plus a verifier coda that encodes the failure modes the lab kept finding — no NULL hiding in an exclusion list, timestamps stored as unambiguous epoch-UTC rather than session-local, a cross-engine row count that actually agrees. On clean data it passes; on the deliberately faulted demonstrator the same checks light up every layer and the gate reads NOT READY, which is the whole idea, that the data earns trust by surviving the check rather than by being called trustworthy.

[Figure: a data-health scorecard. Security sources are scored across three layers (source health, flow health, and data-quality dimensions) and rolled to one composite per source, with the weakest sources (DNS resolver, Microsoft 365 audit) standing out in red.]
The data-health check as a deliverable — every source scored across the quality dimensions named above, rolled to one composite (illustrative sample).
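
One of those failure modes, a NULL hiding in an exclusion list, is easy to reproduce. The sketch below is my reading of that check, not code from the repo: it assumes the hazard meant is SQL's three-valued logic, where `NOT IN` against a list containing NULL matches no rows at all. The table, column, and function names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, username TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "alice"), (2, "svc_backup"), (3, "mallory")],
)

def count_not_excluded(exclusions):
    """Count events whose username is not in the (non-empty) exclusion list."""
    placeholders = ", ".join("?" for _ in exclusions)
    sql = f"SELECT COUNT(*) FROM events WHERE username NOT IN ({placeholders})"
    return con.execute(sql, exclusions).fetchone()[0]

def exclusion_list_is_clean(exclusions):
    """Verifier: fail loudly if any entry is NULL (None in Python)."""
    return all(item is not None for item in exclusions)

clean = ["svc_backup"]
dirty = ["svc_backup", None]  # a NULL slipped into the list

print("clean list:", count_not_excluded(clean), exclusion_list_is_clean(clean))
print("dirty list:", count_not_excluded(dirty), exclusion_list_is_clean(dirty))
```

With the clean list the query keeps the two non-excluded users. With the NULL present it keeps none, with no error raised, which is the same silent-zero shape as the detection failure this piece opened with. The verifier turns it into an explicit failure before the query runs.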

There's one more check in that repo, and it's the one that goes after the failure I opened with directly, because a data-quality gate can't. A quality gate only ever inspects the events that did land, so it can confirm the rows it sees are clean and still tell you nothing about a whole class that never arrived at all, and the rule that returned zero on a quiet-looking network was failing for exactly that reason. So the last piece is a flow-layer gate (flow_gate.py in the same repo, measured against the MOAR reference stack) that counts at every hop, reconciling the per-OCSF-class counts the source tap emitted against the counts that reached the model boundary. On a clean pipeline every class reconciles and it reads READY. Mis-map one class, the Authentication class 3002, to the wrong data model the way the silent error actually happens, and the gate reads NOT READY and names the exact gap, that class 3002 had 4,000 events emitted and 0 landed, four thousand events gone between the source and the model with nothing raised along the way, which is precisely the state a detection keyed on that class would inherit when it returns zero on a genuinely loud network. The honest framing is that this is a self-contained demonstrator of the counting discipline rather than a gate wired into a production pipeline, but the counting itself is runnable today, and it's the measurement that catches the silent class-drop the quality gate structurally cannot.
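
The counting discipline itself is small enough to sketch. This is not the contents of flow_gate.py, whose interface I am not reproducing here; the function name, the return shape, and the Network Activity count are assumptions for illustration. Only the class 3002 figures (4,000 emitted, 0 landed) come from the text.

```python
from collections import Counter

def reconcile(emitted, landed):
    """Compare per-class counts at two hops; return (status, gaps).

    gaps maps class_uid -> (emitted_count, landed_count) for every mismatch.
    """
    gaps = {}
    for class_uid in sorted(set(emitted) | set(landed)):
        sent, got = emitted.get(class_uid, 0), landed.get(class_uid, 0)
        if sent != got:
            gaps[class_uid] = (sent, got)
    return ("READY" if not gaps else "NOT READY"), gaps

# 3002 is Authentication and 4001 is Network Activity; the 4001 count is illustrative.
emitted = Counter({3002: 4000, 4001: 12000})
landed_clean = Counter({3002: 4000, 4001: 12000})
landed_dropped = Counter({4001: 12000})  # class 3002 never reached the model boundary

print(reconcile(emitted, landed_clean))
print(reconcile(emitted, landed_dropped))
```

The key design choice is iterating over the union of classes seen at either hop. A check that only walked the landed counts would never visit class 3002, and would report a clean pipeline for the same reason the quality gate does.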

The reframe I'd leave you with is small and I think it's true. The work you've been calling detection-writing was data modeling all along, deciding what the things in your telemetry are and how they connect, and the reason a detection can compile clean and quietly match nothing for months is that one of those modeling decisions was wrong and nothing was set up to catch it. The vocabulary makes the work visible, and the gate makes the wrong decision loud instead of silent, which is the one thing the failure mode I opened with was missing. The ground was always there, and it's worth being able to see it.