Detection engineering · OT / ICS

Detecting the OT you can't parse.

An industrial environment runs dozens of protocols, and most of them will never get a deep parser written for them, because the vendor is defunct, the protocol is proprietary, or the device is too obscure to be worth anyone's quarter. So the question that decides whether you can monitor an OT network is not how good your parsers are. It is how much you can detect from behavior — timing, flow, who talks to whom — before a parser ever runs.

Reading time: about 12 minutes. This is a practitioner argument, not a benchmark — Tier B, grounded in building detection for an OT environment at a regulated utility and in the public protocol and regulatory record, not in a measured run against labeled industrial traffic. I'm deliberate about that boundary below, because the honest version of this claim is about the architecture, not a precision number.

The problem

Parsing doesn't scale to the long tail.

The deep-parsing OT vendors (Nozomi, Claroty, Dragos) are good at the protocols everyone has heard of, and they have earned that: writing a correct, safety-aware parser for Modbus on TCP 502, DNP3 on 20000, S7comm on 102, EtherNet/IP on 44818, BACnet on UDP 47808, or the IEC 61850 stack is real engineering, and on those protocols deep parsing buys you precision that behavior alone can't. But an operating plant is not a tidy list of well-known protocols. It is a substation built across three decades, a vendor that went out of business in 2009 whose RTUs still run the breaker logic, a serial gateway someone wrapped in TCP, a building-automation bus nobody documented. The protocols that carry the highest safety consequence are frequently the ones with the worst parsing coverage, because the obscure and the proprietary are what a parser-first strategy structurally can't reach.

That is a structural mismatch, not a gap you close by writing more parsers. A parser is per-protocol and per-version, and the long tail is unbounded, so a strategy that can only see what it has parsed will always have a part of the network it is blind to — and in OT the blind part is disproportionately the old, weird, safety-critical part. Worse, the threats that matter most in industrial environments are the ones a protocol parser is least equipped to catch: a zero-day in a control protocol, or an adversary living off the land with legitimate engineering commands, both look protocol-valid. The parser confirms the packet is well-formed Modbus; it does not tell you that this engineering workstation has never, in months of baseline, written a register at three in the morning.

The approach

Behavior first, parser second.

The move is the same one the data-engineering world made years ago and the security world is still catching up to: most of what you need to detect lives in metadata — connection records, timing, volume, the shape of who talks to whom — not in the decoded payload. Behavioral and flow analysis over that metadata is a mature, well-understood pattern, and OT is, counterintuitively, the place it works best, because industrial traffic is the most regular traffic in any enterprise. A programmable logic controller is not a laptop; it polls on a fixed cycle, talks to a small and stable set of peers, and moves predictable volumes. Regularity is what makes a baseline tight, and a tight baseline is what makes an anomaly mean something.
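
To make "a tight baseline is what makes an anomaly mean something" concrete, here is a minimal sketch that scores a new inter-arrival gap against a robust baseline (median and median absolute deviation). The function names, the `floor` default, and the sample numbers are illustrative assumptions, not measurements from a real plant.

```python
from statistics import median

def inter_arrival_times(timestamps):
    """Gaps in seconds between consecutive connection start times."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def robust_baseline(values):
    """Return (median, MAD): a baseline that tolerates a few outliers."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, mad

def anomaly_score(value, med, mad, floor=1e-3):
    """Distance from the baseline in MAD units.

    `floor` (a hypothetical default) avoids dividing by zero when a
    device is perfectly regular.
    """
    return abs(value - med) / max(mad, floor)

# Illustrative data: a controller polled about once a second, and a
# laptop whose connections arrive at irregular intervals.
plc_gaps = inter_arrival_times([0.0, 1.00, 2.01, 3.00, 4.00, 5.02, 6.00, 7.00])
laptop_gaps = [0.2, 5.0, 1.1, 30.0, 0.7, 12.0, 2.5]

plc_score = anomaly_score(1.5, *robust_baseline(plc_gaps))
laptop_score = anomaly_score(1.5, *robust_baseline(laptop_gaps))
```

The same 1.5-second gap lands tens of MADs away from the controller's baseline and well under one MAD from the laptop's, which is the whole argument in two numbers: the observation is identical, and only the regularity of the baseline decides whether it is a signal.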

You don't need a full parser to start. You need enough to recognize the protocol and then enough flow features to characterize the device. Lightweight header recognition gets you the first part cheaply: the first handful of bytes and the port are usually enough to say "this is Modbus" or "this is DNP3" or, at Layer 2, to recognize an IEC 61850 GOOSE frame by its EtherType (0x88B8) and a Sampled Values frame by its own (0x88BA), all without decoding a single application field. From there the device classifies itself by how it behaves:

Purdue level	behavioral signature (from flow alone)
Level 0-1 (PLC / RTU)	fast deterministic scans (ms-scale cycles), small steady payloads, 1-5 stable upstream peers, near-zero timing variance
Level 2 (SCADA / HMI)	polling on seconds, mixed payload sizes, tens of connections, human-driven bursts
Level 3+ (enterprise)	high variability, business-hours correlation, large transfers, many protocols
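
The bands in the table can be turned into a rough rule-based placement. The sketch below is one way to do it under stated assumptions: the `DeviceProfile` fields and every numeric threshold are hypothetical stand-ins for values you would tune against your own baseline, not figures from a study.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class DeviceProfile:
    gaps_s: list               # inter-arrival times between connections, seconds
    peer_count: int            # distinct peers seen in the baseline window
    mean_payload_bytes: float  # average payload size per connection

def timing_cv(gaps):
    """Coefficient of variation (stdev / mean); near zero means a fixed cycle."""
    m = mean(gaps)
    return pstdev(gaps) / m if m > 0 else float("inf")

def guess_purdue_level(p: DeviceProfile) -> str:
    cycle = mean(p.gaps_s)
    cv = timing_cv(p.gaps_s)
    # Level 0-1: fast deterministic scans, few stable peers, small payloads.
    # All thresholds here are assumptions for illustration.
    if cycle < 1.0 and cv < 0.1 and p.peer_count <= 5 and p.mean_payload_bytes < 512:
        return "level 0-1"
    # Level 2: polling on seconds, tens of connections, looser timing.
    if cycle < 60.0 and p.peer_count <= 100:
        return "level 2"
    # Everything else: high variability, many peers, large transfers.
    return "level 3+"

plc = DeviceProfile([0.050, 0.051, 0.049, 0.050, 0.050], 3, 64)
hmi = DeviceProfile([2.0, 3.0, 1.5, 5.0, 2.5], 20, 400)
erp = DeviceProfile([30.0, 600.0, 5.0, 1200.0], 400, 250_000)
levels = [guess_purdue_level(d) for d in (plc, hmi, erp)]
```

Hard thresholds are the simplest thing that works as an illustration; a production version would more plausibly learn the bands per site and report a confidence rather than a single label.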

None of that requires understanding what the protocol says. It requires understanding how the device moves, and the regularity of OT makes those bands sharp enough to place a device on the Purdue model from its conn records alone. Deep parsing still earns its place — it adds precision where the protocol is known and the stakes justify the engineering, and on IEC 61850 MMS the open Zeek analyzers now parse most of the application layer. The argument is not parser versus behavior. It is sequence: behavior first, because it covers the whole environment from day one, and parsing second, layered in where it pays.
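
The recognition step that comes before any of this can be sketched in a few lines. The version below checks only the port and the first few header bytes, or the EtherType at Layer 2. The function names and returned labels are hypothetical, and the byte checks are deliberately shallow hints, not validation.

```python
# Ports as named earlier in the text. TCP 102 is ISO-on-TCP, which
# carries both S7comm and IEC 61850 MMS, so the label stays generic.
PORT_HINTS = {
    ("tcp", 502): "modbus",
    ("tcp", 20000): "dnp3",
    ("tcp", 102): "iso-on-tcp",
    ("tcp", 44818): "ethernet/ip",
    ("udp", 47808): "bacnet",
}
ETHERTYPE_HINTS = {0x88B8: "iec61850-goose", 0x88BA: "iec61850-sv"}

def recognize_l2(frame: bytes):
    """Label an Ethernet frame by EtherType, skipping one 802.1Q tag if present."""
    if len(frame) < 14:
        return None
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == 0x8100 and len(frame) >= 18:
        ethertype = int.from_bytes(frame[16:18], "big")
    return ETHERTYPE_HINTS.get(ethertype)

def recognize_l4(proto: str, dst_port: int, payload: bytes):
    """Label a flow from its port plus a cheap check on the first bytes."""
    hint = PORT_HINTS.get((proto, dst_port))
    if hint == "modbus":
        # MBAP header: bytes 2-3 are the protocol identifier, 0 for Modbus.
        ok = len(payload) >= 7 and payload[2:4] == b"\x00\x00"
    elif hint == "dnp3":
        ok = payload[:2] == b"\x05\x64"   # DNP3 link-layer start bytes
    elif hint == "iso-on-tcp":
        ok = payload[:2] == b"\x03\x00"   # TPKT version 3, reserved 0
    elif hint == "bacnet":
        ok = payload[:1] == b"\x81"       # BACnet/IP BVLC type
    else:
        ok = hint is not None             # no cheap magic bytes: port alone
    return hint if ok else None

modbus_request = bytes.fromhex("000100000006010300000002")
goose_frame = bytes(12) + b"\x88\xb8" + bytes(10)
tagged_sv_frame = bytes(12) + b"\x81\x00\x80\x00\x88\xba" + bytes(10)
```

A mismatch between port and header bytes (for example, HTTP on TCP 502) returns `None` here, and that mismatch is itself worth recording as a flow feature.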

What the flow already tells you

Detections you get without decoding a payload.

Once devices are placed and baselined, a surprising amount of malicious behavior shows up as a flow anomaly, and most of it maps onto techniques already catalogued in MITRE D3FEND. The point is that these aren't novel inventions; they are established network-analysis detections recalibrated for the way OT actually behaves. A handful carry most of the weight. Communication that crosses Purdue levels it never crossed in baseline, such as an enterprise host suddenly speaking to a Level-1 controller, is a community-deviation signal that needs no payload. The direction of bytes is diagnostic on its own: an HMI is overwhelmingly a downloader and a field device overwhelmingly an uploader, so an inverted ratio, such as a controller that starts taking in far more than it sends or an HMI that starts pushing bulk data out, is visible purely in volume and direction. Payload size against the expected size for an operation catches covert channels hiding in oversized routine messages, again without knowing the operation's semantics, only its normal envelope.
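
The three flow checks described above can be sketched over connection records alone. The `Conn` tuple loosely mirrors the byte counters a conn-style log carries; the field names, the `margin` and `slack` defaults, and the host names are assumptions for illustration.

```python
from collections import namedtuple

# orig_bytes: sent by the connection originator; resp_bytes: sent by the responder.
Conn = namedtuple("Conn", "src dst orig_bytes resp_bytes")

def new_cross_level_edges(conns, level_of, baseline_edges):
    """Return conns on edges unseen in baseline that cross Purdue levels.

    Hosts with no known level are flagged as well.
    """
    flagged = []
    for c in conns:
        if (c.src, c.dst) in baseline_edges:
            continue
        src_lvl, dst_lvl = level_of.get(c.src), level_of.get(c.dst)
        if src_lvl is None or dst_lvl is None or src_lvl != dst_lvl:
            flagged.append(c)
    return flagged

def sent_fraction(conns, host):
    """Share of a host's total bytes that it sent, across all its connections."""
    sent = recv = 0
    for c in conns:
        if c.src == host:
            sent, recv = sent + c.orig_bytes, recv + c.resp_bytes
        elif c.dst == host:
            sent, recv = sent + c.resp_bytes, recv + c.orig_bytes
    total = sent + recv
    return sent / total if total else None

def direction_inverted(baseline_frac, current_frac, margin=0.25):
    """True when a host flips between net sender and net receiver by at least `margin`."""
    crossed = (baseline_frac - 0.5) * (current_frac - 0.5) < 0
    return crossed and abs(current_frac - baseline_frac) >= margin

def outside_envelope(size, low, high, slack=0.2):
    """True when a payload size falls outside the learned range, with some slack."""
    return size < low * (1 - slack) or size > high * (1 + slack)

levels = {"hmi": 2, "plc": 1, "erp": 4}
baseline = [Conn("hmi", "plc", 12, 200)]
today = [Conn("hmi", "plc", 12, 200), Conn("erp", "plc", 50_000, 100)]

suspicious = new_cross_level_edges(today, levels, {("hmi", "plc")})
flipped = direction_inverted(sent_fraction(baseline, "plc"), sent_fraction(today, "plc"))
```

In this toy data the enterprise host reaching the controller is flagged as a new cross-level edge, and the controller's byte direction flips because it is suddenly receiving far more than it sends. Neither check reads a payload.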

The strongest of these exploits a property unique to OT: the networks are overwhelmingly machine-to-machine, so human activity is rare and therefore loud. An interactive session to a PLC, an engineering workstation reaching a controller outside a maintenance window, a remote-support tool waking up off schedule: in an IT network these drown in noise, but in a machine-dominated OT segment a human-command signal carries far more information per alert. That is why behavioral OT detection tends to run at a lower false-positive rate than naive protocol-metadata alerting, not a higher one. You are alerting on the rare thing, not the common one.
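
One simple way to surface the "rare and therefore loud" events is an hour-of-day baseline per source and destination pair, with declared maintenance windows suppressed. This is a sketch under assumptions: the event shape `(src, dst, unix_ts)`, the use of UTC hours, and the `maintenance_hours` parameter are all hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime, timezone

def _utc_hour(ts):
    return datetime.fromtimestamp(ts, tz=timezone.utc).hour

def build_hour_baseline(events):
    """events: iterable of (src, dst, unix_ts). Returns {(src, dst): hours seen}."""
    seen = defaultdict(set)
    for src, dst, ts in events:
        seen[(src, dst)].add(_utc_hour(ts))
    return seen

def is_loud(event, baseline, maintenance_hours=frozenset()):
    """True when this pair has never been seen talking at this hour.

    Events inside a declared maintenance window are not flagged.
    """
    src, dst, ts = event
    hour = _utc_hour(ts)
    if hour in maintenance_hours:
        return False
    return hour not in baseline.get((src, dst), set())

def at(day, hour):
    """Helper for the example: a UTC timestamp in January 2024."""
    return datetime(2024, 1, day, hour, tzinfo=timezone.utc).timestamp()

baseline = build_hour_baseline([("ews", "plc", at(1, 10)), ("ews", "plc", at(2, 14))])
night_write = ("ews", "plc", at(3, 3))
day_write = ("ews", "plc", at(3, 10))
alerts = [is_loud(night_write, baseline), is_loud(day_write, baseline)]
```

The engineering workstation reaching the controller at 03:00 is flagged and the same session at 10:00 is not. A pair never seen in baseline is flagged at any hour, which is usually the behavior you want on a machine-dominated segment.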

The regulatory now

CIP-015 turns east-west visibility into a mandate.

There is a reason this matters right now and not in the abstract. FERC's Order No. 887 directed NERC to write an internal network security monitoring (INSM) standard, and the result, CIP-015-1, requires monitoring inside the electronic security perimeter for high-impact bulk-electric-system cyber systems and for medium-impact systems with external routable connectivity: east-west, within the trust zone, not just at the boundary. The driver is explicit in the record: perimeter defenses don't catch an adversary already inside moving laterally with valid credentials and legitimate-looking commands, which is precisely the living-off-the-land tradecraft that campaigns against critical infrastructure have made the central concern. INSM is the regulator saying, in effect, that you have to be able to see lateral movement inside the OT network.

A behavioral-first architecture is the fastest honest path to satisfying that, because internal visibility across a whole OT segment is exactly what flow analysis gives you on day one, and it gives it for the proprietary and undocumented devices a parser-first program would still be working toward at the compliance deadline. A utility standing up INSM can have meaningful east-west coverage from conn-shaped telemetry now and add deep parsing where it sharpens the picture, rather than gating its monitoring program on a parser backlog. The standard rewards breadth of visibility, and breadth is the behavioral approach's home turf.

What this is and isn't

The honest boundary on the claim.

I want to be careful here, because OT-detection writing is full of precision numbers that don't survive contact with their source. You'll see classification accuracies quoted to a decimal — "behavioral features classify devices at over 99 percent" — and those figures routinely have no named study or labeled-industrial-traffic ground truth behind them. The one I just quoted is illustrative; it was never measured against real OT captures, and I'm not going to repeat it as if it were. The claim I'm making is about the mechanism and the ordering, not a percentage: that behavioral-first gives you immediate coverage across the long tail where parser-first structurally cannot, that OT's regularity makes flow baselines unusually tight, and that the human-rare property pushes the false-positive rate down rather than up. Those are properties of how OT behaves, and they're what transfers. The exact accuracy of any given classifier is a measurement I'd want to run against ground-truth-labeled OT captures before I'd put a number on it, and the day I do, that number — not a borrowed one — is what goes here.

The approach has real limits, and naming them is the point rather than a disclaimer. Behavioral detection is less precise than deep parsing on a protocol you have parsed — it tells you a device is acting wrong, not which malformed function code did it. It needs enough baseline volume to learn normal, so a freshly instrumented segment is weak until it has watched a few cycles of legitimate operation. And a sufficiently careful adversary who stays inside every learned envelope — right peers, right timing, right volumes — can move under a purely behavioral detector, which is one reason the honest architecture layers parsing back in for the high-consequence protocols rather than treating behavior as the whole answer.

And I have an obvious incentive to want this argument to be true, because behavioral-first OT monitoring is the kind of work I do, so the guardrail has to be the falsifier stated out loud: the thing that would change my mind is a controlled test on labeled OT traffic where a parser-first program detects the same planted attacks at an equal or lower false-positive rate, across the long tail and not just the well-known protocols. I hold the behavioral-first position strongly, but it isn't settled, and that test is what would settle it.
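
The scoring side of that test is small enough to sketch. Assuming each labeled event carries a ground-truth flag and each detector emits a per-event alert flag (both hypothetical inputs), the comparison reduces to a detection rate and a false-positive rate per detector, computed separately for the well-known protocols and for the long tail. The sample data below is made up purely to exercise the function.

```python
def detection_and_fp_rate(is_attack, alerted):
    """Return (detection rate, false-positive rate) for one detector.

    is_attack and alerted are parallel lists of booleans, one entry per event.
    A rate is None when its denominator is empty.
    """
    if len(is_attack) != len(alerted):
        raise ValueError("label and alert lists must be the same length")
    tp = sum(1 for truth, hit in zip(is_attack, alerted) if truth and hit)
    fp = sum(1 for truth, hit in zip(is_attack, alerted) if not truth and hit)
    attacks = sum(1 for truth in is_attack if truth)
    benign = len(is_attack) - attacks
    detection = tp / attacks if attacks else None
    false_positive = fp / benign if benign else None
    return detection, false_positive

# Made-up long-tail slice: two planted attacks among six events.
truth = [True, False, False, True, False, False]
detector_a_alerts = [True, False, True, True, False, False]
detector_b_alerts = [True, False, False, False, False, False]

rates_a = detection_and_fp_rate(truth, detector_a_alerts)
rates_b = detection_and_fp_rate(truth, detector_b_alerts)
```

Reporting the two rates per protocol class, rather than one blended number, is what keeps the test honest about the long tail.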

From the field

This isn't a parser cop-out.

The behavioral-first argument is easy to mistake for an excuse made by someone who can't write parsers, so it's worth being concrete about where it comes from. I built OT detection for a regulated utility's environment, against IEC 62443's segmentation model and the Purdue levels the plant was actually wired to, and I've done the parser-level work too: my contribution to Palo Alto's PAN-OS app for Splunk (PR #294, public on GitHub) was a field-mapping and parsing fix, the unglamorous business of making a vendor's logs say what the schema claims they say. I bring that up because behavioral-first is a deliberate architecture from someone who has done the deep-parsing work, not a shortcut from someone avoiding it. Knowing exactly how expensive and how per-protocol a correct parser is makes the case for not gating your whole monitoring program on having one for everything.

This is the same bridge I keep arguing for across the rest of this work — security's hardest problems are usually data-engineering problems wearing a security costume, and the open, composable patterns the data world already built apply directly once you stop treating security telemetry as special. OT detection is that bridge at its sharpest: the flow record, the baseline, the anomaly score are ordinary data-engineering objects, and carrying them into an industrial network gets you a CIP-015-aligned monitoring program across protocols you will never parse, with deep parsing layered in exactly where it earns its cost. The point isn't that parsing is wrong. It's that you can see the OT you can't parse, and most networks have a lot of it.