Technology deep-dive

Pipeline-based detection in stream processing.

Pipeline-based detection means the detection rule fires while the event is still moving through the stream processor, rather than later, when an analyst or scheduler queries data that has already landed in a SIEM index or a lakehouse table. That single shift (detection at ingestion time rather than at query time) changes both the latency profile and the failure modes. Latency moves from minutes-to-hours down toward sub-second in the happy path. The failure modes move from "did the query run on schedule" to "did the watermark advance correctly, and is the state store sized for the join history I'm asking it to remember."

Reading time: about 22 minutes. Evidence tier: B overall (streaming framework documentation, practitioner blog posts, vendor case studies). Cost numbers in this essay come from public vendor pricing applied to an illustrative 10 TB/day workload. Treat them as a starting hypothesis for your own modeling, not a quoted figure for any specific deployment.

TL;DR

If you only read one thing

Pipeline-based detection fires the rule while the event is still moving through the stream processor, not later, when an analyst or scheduler queries data already landed in a SIEM index or lake table.
Latency drops from minutes-to-hours toward sub-second. The happy-path number is the engine number, not the SOC number: end-to-end latency also includes alert routing, the triage queue, and analyst working hours.
Failure modes change. The question moves from "did the query run on schedule?" to "did the watermark advance correctly, and is the state store sized for the join history I'm asking it to remember?"
Most common production failure: state-store growth when joins remember more history than the workload actually requires. The fix is a TTL on the state store, not more memory.
When pipeline detection isn't right: high-state joins across long windows, detection logic that needs to span multiple late-arriving sources, or programs whose analysts can't read streaming-SQL diagnostics yet. Hybrid (streaming for atomic, batch for stateful) is the common landing zone.

The structural shift

Query-time detection vs stream-time detection.

Traditional SIEM detection follows a four-step loop: ingest raw events into an indexed store, retain them for 90 to 365 days, run correlation rules against that store on a schedule (typically every five to fifteen minutes), and surface alerts. The model treats detection as a recurring query against stored data. You pay for the storage, and you pay again, in compute, every time a rule scans the indexed corpus.

Pipeline-based detection moves the rule one step earlier in the data flow. The detection logic runs inside the stream processor (RisingWave, Apache Flink, Spark Structured Streaming, Kafka Streams, Tenzir's pipeline runtime) before the event lands in the analytical store. The processor emits two things: a small signal stream (the detections themselves) and a sampled or routed copy of the raw events for downstream forensic use. The "scan the whole corpus every five minutes" step is gone, replaced by per-event evaluation as the event arrives.

The latency claim follows from that. A scheduled SIEM rule with a five-minute window has a structural floor: an event that arrives one second after the rule fires waits four minutes and fifty-nine seconds for the next pass. A stream operator evaluating per-event has no such floor; the rule fires when the event arrives. Engine-side firing latency in well-tuned production deployments may land in the sub-100ms range for stateless rules and the low-seconds range for short-window stateful rules. One caveat: "sub-100ms" is the engine’s firing latency, not end-to-end attacker-to-alert latency, which still includes collector lag, network transit, and notification fan-out. I name the engine number because it’s the one streaming vendors publish; the end-to-end number is the one a SOC actually feels.
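The structural floor is simple arithmetic, and a minimal sketch makes it concrete. The function name and the uniform-arrival assumption behind the average are mine, not taken from any SIEM's documentation.

```python
def scheduled_wait_seconds(arrival_offset_s: float, interval_s: float) -> float:
    """Seconds an event waits for the next scheduled rule pass.

    arrival_offset_s is how long after the most recent pass the event arrived.
    """
    if not 0 < arrival_offset_s <= interval_s:
        raise ValueError("arrival offset must fall inside one schedule interval")
    return interval_s - arrival_offset_s


interval_s = 5 * 60  # five-minute schedule, as in the text

# An event landing one second after a pass waits 4 min 59 s for the next one.
worst_case_s = scheduled_wait_seconds(1, interval_s)

# Assumption: arrivals spread uniformly across the interval, so the mean wait
# is half the interval. Real arrival patterns may be burstier.
average_wait_s = interval_s / 2

print(f"worst case: {worst_case_s} s, average: {average_wait_s} s")
```

The average matters as much as the worst case: under this assumption a five-minute schedule adds about two and a half minutes to the typical detection before any engine or routing time is counted.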

For a deeper read on a specific engine built around this pattern, see the RisingWave streaming reference. It’s a PostgreSQL-compatible streaming database that writes continuous-query results to Iceberg without a separate buffering layer, which is one of the cleaner expressions of the pipeline-detection idea in 2026.

Jargon translation

The four streaming concepts a security architect has to know.

Stream processing has its own vocabulary, and the words matter because each one names a place a detection can go wrong. Four terms come up repeatedly when discussing pipeline-based detection, and I translate each one here in security-operational language rather than data-engineering shorthand.

Windowing

A window is a finite slice of an otherwise unbounded event stream. "Five failed logins in ten minutes" is a window-based rule; "ten minutes" is the window length. There are tumbling windows (non-overlapping fixed slices, like a clock ticking), sliding windows (overlapping slices that advance continuously), and session windows (slices defined by inactivity gaps in a user’s activity). For security, the window length is usually the dial that trades detection sensitivity against state size: longer windows catch slower-burn attacks but cost more memory.
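To make the three window types concrete, here is a minimal sketch of how an event timestamp maps onto each. Timestamps are plain integer seconds and the function names are hypothetical; real engines do this assignment internally and their edge-case rules may differ.

```python
from typing import Iterable, List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end) in seconds


def tumbling_window(ts: int, size: int) -> Window:
    """The single non-overlapping window of length `size` that contains ts."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> List[Window]:
    """Every window of length `size`, advancing by `slide`, that contains ts."""
    newest_start = ts - (ts % slide)
    # A window contains ts when start <= ts < start + size, i.e. start > ts - size.
    return [(s, s + size) for s in range(newest_start, ts - size, -slide)]


def session_windows(timestamps: Iterable[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions separated by more than `gap` of inactivity."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


print(tumbling_window(1250, 600))
print(sliding_windows(1250, 600, 300))
print(session_windows([0, 10, 20, 500, 510], gap=60))
```

Note that one event belongs to `size / slide` sliding windows at once, which is one reason sliding windows cost more state than tumbling windows of the same length.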

Watermarks

A watermark is the stream processor’s assertion that "I have probably seen all events with timestamps up to time T." When the watermark crosses the closing edge of a window, the processor emits that window’s result. The tricky part is that events arrive out of order. An EDR agent in a coffee shop on flaky Wi-Fi may deliver a process-execution event ten minutes after it happened. If the watermark advanced too aggressively, that event arrives after its window closed and either gets dropped or gets routed to a "late events" side channel. Tuning watermarks is the part of pipeline detection that most operators underestimate.
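A minimal sketch of the mechanics, assuming a bounded-out-of-orderness policy (watermark = highest event time seen minus a fixed tolerance). The class and field names are hypothetical, and production engines track watermarks per source or partition rather than globally as this toy does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TumblingCounter:
    """Counts events per tumbling window and closes windows on watermark advance."""
    window_s: int
    max_out_of_order_s: int
    watermark: float = float("-inf")
    open_windows: Dict[int, int] = field(default_factory=dict)
    emitted: List[Tuple[int, int]] = field(default_factory=list)  # (window_start, count)
    late: List[int] = field(default_factory=list)                 # side channel

    def on_event(self, event_time: int) -> None:
        start = event_time - event_time % self.window_s
        if start + self.window_s <= self.watermark:
            # The window already closed: route to the late channel, never drop silently.
            self.late.append(event_time)
        else:
            self.open_windows[start] = self.open_windows.get(start, 0) + 1

        # Advance the watermark, then emit every window it has passed.
        self.watermark = max(self.watermark, event_time - self.max_out_of_order_s)
        for s in sorted(self.open_windows):
            if s + self.window_s <= self.watermark:
                self.emitted.append((s, self.open_windows.pop(s)))


counter = TumblingCounter(window_s=60, max_out_of_order_s=30)
for t in [10, 50, 70, 40, 95, 20]:  # 40 is out of order but tolerated; 20 is late
    counter.on_event(t)
print(counter.emitted, counter.late, counter.open_windows)
```

Raising `max_out_of_order_s` would have kept the last event in its window, at the price of holding that window open, and delaying its result, for longer.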

Stateful operators

A stateless operator examines one event in isolation. "Is this DNS query for a domain whose entropy exceeds 3.5?" is stateless. A stateful operator has to remember things across events. "Has this user logged in from a different country in the last twenty-four hours?" requires the processor to remember the last login location per user. That memory lives in a state store (usually RocksDB on local disk for Flink, an object-storage-backed store for RisingWave, or local RocksDB backed by Kafka changelog topics for Kafka Streams). State size grows with cardinality (how many users, how many hosts) and window length (how far back you have to remember), and state explosion is the most common operational failure mode in long-window pipeline detection.

Incremental view maintenance

Incremental view maintenance (IVM) is the technique of keeping a derived view (an aggregation, a join, a materialized query result) up to date by applying only the deltas from new events, rather than re-running the whole query. RisingWave is built around IVM; Materialize is the standalone-IVM database that popularized the term. For security, the practical implication is that you can write a detection as a SQL query against a continuously maintained view, and the engine handles the "keep it current as new events arrive" part automatically. That moves a lot of complexity out of the application layer and into the engine. The trade-off is that IVM engines have specific shapes of query they handle efficiently and shapes they don’t, and learning where the line is takes time.
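The difference between re-running a query and maintaining its result can be sketched in a few lines. This is a toy under stated assumptions: the "view" is a failed-login count per user, and the class name and `retract` flag are hypothetical stand-ins for the insert and delete deltas a real IVM engine propagates.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, bool]  # (user, login_succeeded)


def recompute(events: Iterable[Event]) -> Counter:
    """Query-time style: scan every stored event on each evaluation."""
    return Counter(user for user, ok in events if not ok)


class FailedLoginView:
    """IVM style: keep the result and apply only the delta from each event."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def apply(self, event: Event, retract: bool = False) -> None:
        user, ok = event
        if ok:
            return  # successful logins do not affect this view
        self.counts[user] += -1 if retract else 1
        if self.counts[user] == 0:
            del self.counts[user]  # keep state bounded to users with failures


events = [("alice", False), ("bob", True), ("alice", False), ("carol", False)]
view = FailedLoginView()
for e in events:
    view.apply(e)

print(view.counts == recompute(events))
```

Counts are the easy case because a delta is just plus or minus one. Aggregates without a cheap inverse, or joins that fan out widely, are where the "shapes they don't handle efficiently" start.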

Engine landscape

The four engines worth naming.

Apache Flink

Flink is the long-established choice for stateful stream processing at high throughput. State lives in RocksDB on local disk with periodic checkpoints to object storage (S3 or equivalent). Flink’s strengths are mature exactly-once semantics, fine-grained control over operator state, and production track record at scale (Alibaba, Pinterest, Stripe, and many security-oriented teams run it). The cost is operational: Flink jobs are long-lived, the state backend needs sizing and tuning, and recovery from a failed checkpoint can take minutes on large state. For security, Flink fits well when you have data-engineering capacity and a need for short-to-medium-window stateful detection (sliding windows of one to sixty minutes).

Spark Structured Streaming

Spark Structured Streaming is micro-batch under the hood: the engine collects events into small batches (often hundreds of milliseconds to a few seconds) and runs the same Spark SQL engine across each batch. That micro-batch model means latency floors out higher than Flink or RisingWave, typically in the seconds range rather than sub-second. The advantage is operational continuity with existing Spark deployments: if your lakehouse already runs on Spark for batch ETL, Structured Streaming reuses the same code, the same cluster, and the same operational tooling. For security teams committed to Databricks or another Spark-centric platform, this is often the path of least resistance.

RisingWave

RisingWave is a PostgreSQL-compatible streaming database built around incremental view maintenance. Detection rules are written as SQL against materialized views; the engine keeps those views current as new events arrive. The architectural pitch is "stateless compute, all state in S3": operator state lives in object storage rather than local disk, which decouples compute scaling from state size. Production references named in the RisingWave reference architecture include Palo Alto Networks (Cortex XSIAM) and China Telecom Bestpay. The trade-off, as with any IVM engine, is that certain query shapes (recursive joins, very long windows) don’t map cleanly to incremental maintenance, and you have to know where those edges are. Sub-100ms event-to-firing latency is the published figure; I cite it as Tier B until I have run it against a security workload myself.

Kafka Streams

Kafka Streams is a Java library, not a separate cluster. It runs inside the application process, keeps operator state in local stores (RocksDB by default), and uses Kafka changelog topics to make that state durable. The model is appealing when you already run Kafka heavily and don’t want a second piece of infrastructure for stream processing. The ceiling is real, though: very-high-throughput stateful workloads (hundreds of thousands of events per second with long-window joins) push past what Kafka Streams handles comfortably, and at that point you usually move to Flink. For security, Kafka Streams works well for stateless or short-window stateful detection on workloads up to roughly tens of thousands of events per second.

Detection shapes

What pipeline detection does well, and where it struggles.

Strong fit: stateless and short-window stateful detections

The detections that map cleanest to stream processing are the ones that can be evaluated per event or within a bounded recent window. DGA domain detection (domain entropy plus first-seen check) is stateless. Lateral movement on first-seen internal connections is short-window stateful (remember recent source-destination pairs). Impossible-travel authentication checks are short-window stateful with a per-user state slot. Credential-dumping process-execution detection is largely stateless pattern matching against process names, command lines, and parent-process context, with an optional threat-intel enrichment lookup. Ransomware behavioral detection ("more than 50 files modified with unusual extensions in 60 seconds") is a 60-second tumbling window per host.

For these shapes, pipeline detection earns its keep, because the detection fires in seconds rather than minutes, the signal stream stays small (kilobytes to low megabytes per day per rule), and the rules themselves are often more concise in stream SQL than in their SIEM correlation-language equivalents, since the engine handles windowing and state automatically.

Hard fit: long-window correlation, retroactive investigation, complex ML

Some detection patterns don’t fit stream processing well. Multi-day correlation (“account accessed from three or more countries within seven days”) creates a state explosion problem. The engine has to remember per-user activity for the full window, and at large user-cardinality this gets expensive fast. The honest pattern here is to do the short-window piece in the stream and the long-window piece as a batch query against the lakehouse, where the seven-day join is cheap because the data sits at rest in Iceberg or Delta.

Retroactive investigation (“was this account already compromised thirty days ago, and we just noticed today?”) is fundamentally incompatible with a pure pipeline model. The pipeline didn’t store the raw events thirty days ago; it stored signals and a sample. If you adopt pipeline detection without a complementary raw-data sample (typically 5–20% of events, landed in the lakehouse), you give up retroactive analysis. That trade-off is the single most common reason teams keep at least some raw-event storage even when they shift detection upstream.

Complex machine-learning detection (UEBA models with fifty-plus features computed over ninety-day windows) usually runs in the lakehouse. The stream processor either feeds features into the model or applies a pre-trained lightweight model in-stream for fast scoring. Once ML enters the picture, most of the deployments I've seen end up hybrid rather than pure-stream.

New failure modes

What breaks in pipeline detection that didn't break in query detection.

Late-arriving events

In a query-based SIEM, an event that arrives ten minutes late simply gets indexed when it arrives and shows up in the next scheduled query. In a stream processor with watermarks, a late event may arrive after its window has already closed and emitted a result. The engine has to decide whether to drop it, route it to a side channel for late handling, or trigger window reprocessing. Each of those choices carries its own consequences; the worst is a silent drop, where you lose a detection and have no audit trail. The configuration that catches this is "allowed lateness" plus a late-events dead-letter topic, sized for the worst-case agent backlog you expect (think laptops returning from a long-haul flight or a stadium event where Wi-Fi was saturated).

Watermark tuning

Watermarks are tunable. A conservative watermark waits longer before closing windows (fewer late events get dropped, but detection latency increases). An aggressive watermark closes windows faster (lower latency, but more late events). There’s no universal correct setting; it depends on the source-system characteristics. EDR agents on managed corporate laptops behave very differently from EDR agents on unmanaged contractor devices, and cloud audit logs from AWS CloudTrail have known publication delays measured in minutes that you have to budget for explicitly. That is why the watermark policy ends up being an operational dial you revisit rather than a configuration you set once and forget.

State-store sizing

State size is the operational variable that most often surprises teams new to stateful streaming. A rule like “remember every source IP that has connected to every destination IP in the last twenty-four hours, per host” can hold tens of millions to billions of state entries at enterprise scale. If the state backend is local RocksDB, that’s disk pressure on the Flink task manager. If it’s S3-backed (RisingWave), that’s object-storage cost and read latency. Either way, the size is a function of cardinality times window length, and getting it wrong shows up as either disk exhaustion and memory pressure on the task managers (RocksDB) or runaway storage bills (S3).
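A back-of-envelope estimate is usually enough to catch a rule that will not fit. The sketch below is deliberately crude: every input (host count, new entries per hour, bytes per entry) is a hypothetical placeholder, and it ignores deduplication, compaction, and backend overhead.

```python
def estimate_state_gib(
    keys: int,
    new_entries_per_key_per_hour: float,
    window_hours: float,
    bytes_per_entry: int,
) -> float:
    """Upper-bound state size: cardinality x arrival rate x window length."""
    entries = keys * new_entries_per_key_per_hour * window_hours
    return entries * bytes_per_entry / 2**30


# Hypothetical workload: 50,000 hosts, each adding 100 new
# source-destination pairs per hour, about 100 bytes per stored pair.
day_window = estimate_state_gib(50_000, 100, window_hours=24, bytes_per_entry=100)
hour_window = estimate_state_gib(50_000, 100, window_hours=1, bytes_per_entry=100)

print(f"24 h window: {day_window:.2f} GiB, 1 h TTL: {hour_window:.2f} GiB")
```

Because the estimate is linear in window length, cutting the remembered history from 24 hours to one hour cuts state by the same factor of 24, which is the arithmetic behind preferring a TTL over more memory.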

Checkpoint failures and recovery time

Stream processors recover from failure by restoring state from the most recent checkpoint and replaying events since. The recovery window is bounded by checkpoint frequency and state size. For a Flink job with a hundred gigabytes of state and a one-minute checkpoint interval, recovery may take several minutes, during which detections aren’t firing. That recovery window is a detection gap you have to engineer around: with active-active deployment, with downstream alerting on checkpoint health, or with the explicit accept-the-gap policy that some teams adopt for non-critical detection paths. “Real-time detection” has its own concept of downtime, and it isn’t zero.

Cost model

A 10 TB/day illustrative comparison.

The cost case for pipeline detection is real, but the headline figures published in vendor material need to be taken apart before they’re useful. I run a 10 TB/day illustrative scenario here using public list pricing, enough to show the shape of the trade-off, not the right number for any specific deployment. Negotiated SIEM pricing, retention policy, sampling strategy, and existing infrastructure reuse all move the number.

In the query-based pattern at Splunk-like list pricing, indexed storage for 10 TB/day × 90 days (about 900 TB) runs roughly $2–3M/month at the low end of a $3–10/GB/month range (about $3/GB/month; the high end would roughly triple that), plus $300–500K/month for correlation and ad-hoc query compute. Total $2.5–3.5M/month; three-year TCO around $90–125M before discount.

In the pipeline pattern, stream processing costs $50K–300K/month depending on commercial vs self-managed tooling (Tenzir, Cribl, RisingWave, self-hosted Flink). Storage shrinks because you’re landing signals (typically <1% of raw) plus a 5–20% sample into S3-backed Iceberg at $0.02–0.05/GB/month rather than indexed-SIEM pricing. Total $100K–500K/month; three-year TCO around $4–18M.

That’s roughly a 10× cost reduction at the midpoints of the two ranges in the illustrative case, and anywhere from about 5× to more than 30× depending on which ends you pair. That spread is the basis for the "85–95% savings" claim in vendor material. The number is plausible on this workload shape, not guaranteed, and depends on accepting that you no longer hold 100% of raw events in indexed storage. If your compliance posture requires 100% raw retention in an indexed format, pipeline detection complements rather than replaces the SIEM and the savings are smaller.
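The arithmetic behind these figures is worth running yourself, because the ratio swings widely depending on which ends of the ranges you compare. The sketch below uses only the illustrative monthly totals and prices quoted above, with decimal units (1 TB = 1,000 GB) as an assumption. Swap in your own numbers before drawing conclusions.

```python
GB_PER_TB = 1_000   # assumption: decimal units, as storage price lists usually use
MONTHS = 36         # three-year TCO horizon

indexed_gb = 10 * GB_PER_TB * 90           # 10 TB/day retained for 90 days

# Indexed storage at the two ends of the $3-10/GB/month list range.
storage_at_low_price = indexed_gb * 3.0    # matches the essay's $2-3M/month
storage_at_high_price = indexed_gb * 10.0  # roughly triple that

# A 20% raw sample landed in object storage at $0.05/GB/month.
sample_storage_monthly = indexed_gb * 20 / 100 * 0.05

# Monthly totals quoted in the text (low, high), in USD.
query_monthly = (2_500_000, 3_500_000)
pipeline_monthly = (100_000, 500_000)

query_tco = tuple(m * MONTHS for m in query_monthly)
pipeline_tco = tuple(m * MONTHS for m in pipeline_monthly)

narrowest_ratio = query_tco[0] / pipeline_tco[1]       # cheap SIEM vs costly pipeline
widest_ratio = query_tco[1] / pipeline_tco[0]          # costly SIEM vs cheap pipeline
midpoint_ratio = sum(query_tco) / sum(pipeline_tco)    # midpoint vs midpoint


def savings_pct(ratio: float) -> float:
    """Convert a cost-reduction multiple into percent saved."""
    return (1 - 1 / ratio) * 100


for label, r in [("narrowest", narrowest_ratio),
                 ("midpoint", midpoint_ratio),
                 ("widest", widest_ratio)]:
    print(f"{label}: {r:.1f}x reduction, {savings_pct(r):.0f}% savings")
```

Two things fall out of running it. Sampled object storage is a rounding error next to stream-processing cost in the pipeline pattern. And the savings percentage is far less sensitive than the multiple, which is why vendor material tends to quote the percentage.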

For the costing details and the cross-currents on table format choice for the storage layer underneath, see the Iceberg vs Delta Lake essay. The lakehouse choice is the larger of the two cost levers in this architecture, and pipeline detection compounds it rather than substitutes for it.

Hybrid pattern

Three tiers, not a replacement.

The pattern I recommend in 2026 is hybrid (pipeline detection on top of a lakehouse on top of a small SIEM retention window) rather than pipeline detection as a wholesale SIEM replacement.

Tier 1: stream-processing detection for high-confidence, low-state-cost rules. Known-bad IOCs, signature matches, DGA domains, lateral-movement first-seen connections, ransomware behavioral patterns, PII exposure, geo-fenced policy violations. Signals land in the lakehouse for investigation. This is where sub-second to low-second detection latency actually shows up.

Tier 2: lakehouse batch and incremental queries for multi-event correlation over hours-to-days windows, UEBA model training, and retroactive investigation against sampled raw data. Latency is minutes-to-hours, acceptable because these queries drive threat-hunting and model retraining, not real-time alerts.

Tier 3: short-retention SIEM for the 7–30 day window where analyst-driven investigation is most active. The SIEM stays, but at one-third to one-tenth the original data volume, because high-volume low-value sources (DNS, proxy, firewall connection logs) have been routed past it. This is the lever that turns the SIEM bill from $2–3M/month into $300–500K/month at the illustrative 10 TB/day scale.

The three-tier model is what most of the case studies I’ve seen actually land on, because it preserves analyst workflow and long-tail investigation while still capturing the cost reduction. The pure-pipeline, no-SIEM architecture does exist in startups and in specific HMM2 leapfrog cases, but those are uncommon enough that I treat them as the exception against the three-tier default.

Detection examples

What a pipeline detection actually looks like.

Stateless: DGA domain detection

Domain-generation algorithms produce high-entropy, long, never-seen-before domains as malware callback addresses. The detection is per-event: compute the Shannon entropy of the domain string, check the length, check whether it’s on a known-domain allowlist. No state required beyond the allowlist (which itself is a slowly changing reference table the engine joins against).

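-- Stateless DGA detection as a streaming SQL view.
-- shannon_entropy() is not a standard SQL built-in; it is assumed here to be
-- a user-defined function registered with the engine.
CREATE MATERIALIZED VIEW dga_signals AS
SELECT
  q.event_time,
  q.source_ip,
  q.domain,
  shannon_entropy(q.domain) AS entropy
FROM dns_queries AS q
WHERE shannon_entropy(q.domain) > 3.5
  AND length(q.domain) > 20
  -- NOT EXISTS instead of NOT IN: a NULL in allowlist.domain would make
  -- NOT IN filter out every row and silently disable the detection.
  AND NOT EXISTS (
    SELECT 1 FROM allowlist AS a WHERE a.domain = q.domain
  );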

On RisingWave or a similar IVM engine, this materialized view stays current automatically as new DNS events arrive. Detection latency is essentially the network and engine round-trip: tens to hundreds of milliseconds.
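The view above leans on a `shannon_entropy` function, so it helps to see what such a function might compute. This is a sketch of character-level Shannon entropy in bits per character, assuming that is the intended definition. The predicate name and the sample domain are hypothetical, and the 3.5 and 20 thresholds are carried over from the SQL.

```python
import math
from collections import Counter
from typing import Set


def shannon_entropy(s: str) -> float:
    """Character-level Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def is_dga_candidate(domain: str, allowlist: Set[str]) -> bool:
    """Same three checks as the SQL view: entropy, length, allowlist."""
    return (
        shannon_entropy(domain) > 3.5
        and len(domain) > 20
        and domain not in allowlist
    )


suspicious = "q7x2kd9vbn4zt1mwp8rj.com"  # made-up, random-looking domain
print(is_dga_candidate(suspicious, allowlist=set()))
print(is_dga_candidate("example.com", allowlist=set()))
```

Entropy alone is a blunt instrument: long CDN and cloud hostnames can score high too, which is why the allowlist join matters as much as the threshold.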

Short-window stateful: ransomware behavior

Ransomware shows up in EDR file-modification telemetry as a burst of unusual file activity: many files modified per minute, often renamed to unusual extensions, often with high-entropy content (encrypted). The detection is a 60-second tumbling window per host, with an aggregate on file count and a check on extension-set rarity.

-- Ransomware behavioral detection: 60-second tumbling window per host
CREATE MATERIALIZED VIEW ransomware_signals AS
SELECT
  host_id,
  window_start,
  COUNT(*) AS files_modified,
  COUNT(DISTINCT file_extension) AS distinct_extensions,
  COUNT(*) FILTER (WHERE file_extension IN
    ('.encrypted', '.locked', '.crypto')) AS ransomware_exts
FROM TUMBLE(file_modifications, event_time, INTERVAL '60' SECOND)
GROUP BY host_id, window_start
HAVING COUNT(*) > 50
   AND COUNT(*) FILTER (WHERE file_extension IN
     ('.encrypted', '.locked', '.crypto')) > 0;

State requirement: per host, the running file-modification count and extension set for the current 60-second window. At 10,000 hosts that’s tens of thousands of small state entries, comfortable for any of the engines named above. When the engine emits a result only at window close, detection latency floors at the window length plus emission delay, so call it 60–65 seconds end to end; engines that emit incremental updates can surface a host as soon as its count crosses the threshold inside the window. That’s slower than the DGA example, but it’s the right floor for this detection shape because the pattern itself requires observing accumulated behavior, not a single event.

Long-window stateful: impossible travel

Impossible-travel detection compares each authentication event against the last-known location for the same user, computes geographic distance and elapsed time, and checks whether the implied speed exceeds a threshold (around 1,000 km/h is the canonical "faster than commercial aviation" cut). The state requirement is one slot per user, holding last login location and timestamp. State cardinality equals the user population, which for most enterprises is bounded in the tens of thousands; again, comfortable for any of the engines. The state TTL (typically 24 hours) prevents unbounded growth on inactive users. This pattern sits at the edge of what I’d call short-window stateful; the 24-hour TTL is long by streaming standards but small by SIEM correlation standards, and that gap is roughly where pipeline detection starts to need help from the lakehouse.
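Here is a minimal sketch of that per-user state slot, assuming great-circle (haversine) distance and the 1,000 km/h and 24-hour figures from the text. The class and method names are hypothetical. In a real engine the dictionary would be keyed operator state and the expiry would be the state backend's TTL, not a manual sweep.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Login:
    user: str
    ts: float   # event time, epoch seconds
    lat: float
    lon: float


class ImpossibleTravelDetector:
    def __init__(self, max_kmh: float = 1000.0, ttl_s: float = 24 * 3600) -> None:
        self.max_kmh = max_kmh
        self.ttl_s = ttl_s
        self.last: Dict[str, Login] = {}  # one state slot per user

    def observe(self, login: Login) -> Optional[float]:
        """Return the implied speed in km/h if it exceeds the threshold, else None."""
        prev = self.last.get(login.user)
        if prev is not None and login.ts <= prev.ts:
            return None  # out-of-order event: ignored in this sketch
        self.last[login.user] = login
        if prev is None or login.ts - prev.ts > self.ttl_s:
            return None  # no usable history inside the TTL
        hours = (login.ts - prev.ts) / 3600
        speed = haversine_km(prev.lat, prev.lon, login.lat, login.lon) / hours
        return speed if speed > self.max_kmh else None

    def expire(self, now: float) -> None:
        """Drop slots older than the TTL so inactive users do not accumulate."""
        self.last = {u: l for u, l in self.last.items() if now - l.ts <= self.ttl_s}


detector = ImpossibleTravelDetector()
detector.observe(Login("alice", 0, 51.5074, -0.1278))           # London
alert = detector.observe(Login("alice", 3600, 40.7128, -74.0060))  # New York, 1 h later
print(alert is not None)
```

Dropping out-of-order logins is the simplest possible policy and the wrong one for production; it is exactly the late-event question from the failure-modes section, applied to a single state slot.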

Migration shape

Start with one high-volume source, expand from there.

The migration pattern I’ve seen work is to start with the single highest-volume, lowest-information-density source (almost always DNS query logs) and migrate just that source first. DNS is typically 30–50% of total security log volume with relatively low per-event value, so the cost reduction is large and the analyst-workflow disruption is small. Stand up a stream processor, implement five to ten detections (DGA, malicious-IP blocklists, suspicious TLDs, geo-fencing), route signals to the lakehouse, keep a 5–10% raw sample, and stop indexing the rest. That is roughly three months of work, and it produces measurable savings without forcing any analyst retraining.

Expansion order from there is roughly proxy and firewall connection logs, then EDR process-execution logs, then authentication logs, then cloud audit trails. Multi-event correlation and UEBA move last, because they need the lakehouse layer in place.

Full migrations run 6–12 months for a mid-size deployment and 18–24 months at Fortune-500 scale. The cases that fail are the ones that tried to migrate all sources at once, or underestimated the watermark- and state-tuning operational work in the first three months. The learning curve is real, and pretending it isn’t is the most expensive mistake I see.

Honest gaps

What I won't claim yet.

The gaps in my own evidence base that I’d flag for any architect making a decision based on this essay:

Sub-100ms is the engine number, not the SOC number. Published streaming-engine latency measures the firing-to-result path inside the engine. End-to-end latency from attacker action to analyst notification still includes collector lag, network transit, signal-routing delay, and notification fan-out. The pipeline pattern shrinks the largest component (the scheduled query interval), but the residual end-to-end number may still land in the multi-second range.
Cost numbers are illustrative, not benchmarked. The 10× cost-reduction figure is a list-price model on a 10 TB/day workload. Negotiated SIEM pricing, existing infrastructure reuse, retention-policy specifics, and sampling strategy all move the number. The shape of the trade-off is real; the specific multiple is workload-dependent.
OCSF-shaped event streaming hasn’t been independently benchmarked. The streaming engines I name above publish benchmarks on TPC-H or synthetic workloads, not on OCSF security event shapes specifically. Performance on real EDR or CloudTrail event volumes may differ. Benchmarking against a representative production workload is a real piece of evaluation work that shouldn’t be skipped.
Vendor-published case studies are vendor-published. The Tenzir, Cribl, and RisingWave references I’ve seen are largely Tier B (vendor blog posts, conference talks with customer logos). They’re consistent with the structural argument, but independent third-party validation at production scale (with methodology disclosed) is still relatively thin.
Late-event handling under attack conditions is under-discussed. Watermark tuning and late-event policies get attention in normal-operations posts. What happens under DDoS conditions when collector backlogs spike, or under an active adversary intentionally delaying telemetry, is a harder operational question that I haven’t seen published cleanly.

None of these gaps invalidates the pattern. They’re the questions a security architect should ask before committing, and the questions a vendor selling pipeline-detection tooling should be prepared to answer.

Conclusion

Detection moves upstream; the failure modes move with it.

Pipeline-based detection is the structural answer to the “store everything, query everything repeatedly” cost trap that defined SIEM economics for two decades. The detection moves to the stream processor, the latency drops by an order of magnitude or more for the rules that fit the stream model, and the storage cost drops by a similar factor because the volume that lands in the indexed store collapses.

The honest version of the pitch is that pipeline detection doesn’t make detection simpler. It moves the complexity from query scheduling and storage cost management into watermark tuning, state-store sizing, and late-event handling. The teams that adopt it well are the ones that treat that operational shift as a real piece of work rather than as the free lunch the vendor pitch implies.

The engines that matter in 2026 are Apache Flink (mature stateful processing, operational weight), Spark Structured Streaming (continuity with Spark-centric lakehouses, micro-batch latency floor), RisingWave (PostgreSQL-compatible IVM, stateless-compute architecture, Iceberg-native output), and Kafka Streams (library model, Kafka-resident state, good for short-window stateful work at moderate throughput). None is universally best; the right choice depends on existing infrastructure, team familiarity, and the detection shapes that dominate the workload.

My recommendation: start with one high-volume low-value source (DNS), pick the engine that matches your team’s existing skills, run the hybrid three-tier architecture (pipeline plus lakehouse plus shrunken SIEM), and measure the latency and cost numbers on your own workload before quoting any of the published headline figures externally. The pattern itself holds up, the savings stay workload-dependent, and the operational shift is the part that most evaluations underestimate.