Technology deep-dive
Sigma 2.0 correlations and the pySigma backend reality.
I wrote a foundational piece on Sigma as the fourth open standard in the security lakehouse and deliberately left the correlation question hedged. This essay does the unhedged version. I name the four Sigma 2.0 correlation types, walk through the pySigma backend ecosystem with its community-published maturity states, and describe what conversion quality looks like when you compile real rules to real engines. The correlation extension is improving but lags vendor-native correlation engines materially, and the backend conversion story is "useful per backend, with caveats" rather than "write once, deploy everywhere." I also compiled a rule set to four backends, reported below, to put a first-party measurement under the per-backend claims rather than leaning only on the community maturity signal.
Reading time: about 19 minutes. Evidence tier: B overall (Sigma specification documents, the pySigma plugin directory's published state field, vendor product pages I can verify) with Tier A on the correlation specification itself, and a first-party compilation benchmark (reproducible, pinned versions) added below. Flagged inline.
The starting point
Where Sigma sat before 2.0.
The original Sigma specification (the one Florian Roth and Thomas Patzke published in 2017) handled
atomic detections cleanly and almost nothing else. A Sigma rule described log-source selection criteria,
field-match conditions on a single event, and metadata for ATT&CK tagging. There was a pipe-syntax
"aggregations" affordance in the condition (constructs like | count() by user > 5) that
hinted at multi-event detection, but it was vendor-implementation-defined and not portable across
pySigma targets in any reliable sense.
The Sigma Correlation Rules Specification (current published version 2.1.0, dated August 2025 per the SigmaHQ specification site) replaces the pipe-syntax aggregations with a structured YAML correlation block, defines four correlation types, and gives pySigma backends a defined surface to compile against. The spec is the Tier A artifact in this essay; what follows about backend support is Tier B.
The four correlation types
What Sigma 2.0 actually adds.
The correlation specification defines four types, each with a structured YAML shape, a required
timespan, optional group-by fields, and (for the count types) a
condition comparison operator. The four types are:
1. event_count
Tallies how many times a referenced atomic Sigma rule fires inside a defined time window, grouped by
whatever fields you specify. This is the "five failed logins in sixty seconds" idiom. The YAML references
one or more underlying Sigma rules by ID, sets a timespan (the spec accepts compact strings
like 60s, 5m, 1h, 1d), defines a group-by
field set (typically a user, source IP, or host identifier), and applies a comparison operator from
gt, gte, lt, lte, eq, neq
against a threshold count.
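The timespan strings and comparison operators just described can be sketched in a few lines of Python. This is a minimal illustration, not pySigma's parser: the function names are hypothetical, and only the four unit suffixes named above are handled.

```python
import operator
import re

# Unit suffixes named in the text (s, m, h, d); anything else is out of scope here.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# The six comparison operators listed in the text, mapped onto Python's operator module.
OPERATORS = {
    "gt": operator.gt, "gte": operator.ge,
    "lt": operator.lt, "lte": operator.le,
    "eq": operator.eq, "neq": operator.ne,
}

def parse_timespan(text: str) -> int:
    """Convert a compact timespan such as '60s' or '5m' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported timespan: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]

def check_condition(condition: dict, count: int) -> bool:
    """True when the count satisfies every operator in the condition mapping."""
    return all(OPERATORS[name](count, threshold) for name, threshold in condition.items())
```

With these two helpers, the "five failed logins in sixty seconds" idiom reduces to `parse_timespan("60s")` for the window and `check_condition({"gte": 5}, count)` for the threshold.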
title: Failed-login burst from same source
id: 9d0e8e0a-...-correlation
correlation:
  type: event_count
  rules:
    - failed_login_event
  group-by:
    - SourceIP
  timespan: 60s
  condition:
    gte: 5
2. value_count
Counts distinct values in a specified field within the timespan, grouped by the group-by set. The
canonical use is enumeration detection: one source IP attempting authentication against many distinct
usernames in a short window, or one host resolving an unusually large number of distinct domains. The shape
is the same as event_count with an additional field attribute naming the field
whose distinct values are being counted.
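To make the two count types concrete, here is a small reference evaluator over in-memory events. It is a sketch of the semantics as described above, not what any backend emits. The event shape (dicts with a numeric `ts` in seconds) is hypothetical, the comparison is fixed to `gte`, and treating the window boundary as inclusive is my assumption.

```python
from collections import defaultdict

def event_count_hits(events, group_by, timespan_s, threshold):
    """Groups with at least `threshold` events inside some window of `timespan_s` seconds."""
    stamps_by_group = defaultdict(list)
    for event in events:
        key = tuple(event[name] for name in group_by)
        stamps_by_group[key].append(event["ts"])
    hits = set()
    for key, stamps in stamps_by_group.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Slide the left edge forward until the window spans at most timespan_s.
            while ts - stamps[start] > timespan_s:
                start += 1
            if end - start + 1 >= threshold:
                hits.add(key)
                break
    return hits

def value_count_hits(events, group_by, field, timespan_s, threshold):
    """Groups with at least `threshold` distinct values of `field` inside some window."""
    rows_by_group = defaultdict(list)
    for event in events:
        key = tuple(event[name] for name in group_by)
        rows_by_group[key].append((event["ts"], event[field]))
    hits = set()
    for key, rows in rows_by_group.items():
        rows.sort(key=lambda row: row[0])
        for i, (start_ts, _) in enumerate(rows):
            # Distinct values seen in the window that opens at this event.
            distinct = {value for ts, value in rows[i:] if ts - start_ts <= timespan_s}
            if len(distinct) >= threshold:
                hits.add(key)
                break
    return hits
```

The only difference between the two functions is what gets counted inside the window: events in the first, distinct field values in the second. That mirrors the single extra attribute the spec adds for value_count.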
3. temporal
Fires when all referenced rules match within the same timespan and group-by scope, regardless of order. This expresses "any combination of these N indicators inside this window, attributable to the same entity," useful for kill-chain-style detections where ordering is unknown or noisy. No threshold condition is required; the assertion is set-membership across the rules list.
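The set-membership assertion can be sketched as follows. This is an illustrative evaluator, assuming atomic-rule matches arrive as hypothetical `(ts, rule_id, group_key)` tuples with `ts` in seconds and an inclusive window boundary.

```python
from collections import defaultdict

def temporal_hits(matches, rule_ids, timespan_s):
    """Groups in which every rule in `rule_ids` matched inside one window, in any order."""
    required = set(rule_ids)
    rows_by_group = defaultdict(list)
    for ts, rule_id, group in matches:
        if rule_id in required:
            rows_by_group[group].append((ts, rule_id))
    hits = set()
    for group, rows in rows_by_group.items():
        rows.sort()
        for i, (start_ts, _) in enumerate(rows):
            # Rules seen in the window that opens at this match.
            seen = {rule for ts, rule in rows[i:] if ts - start_ts <= timespan_s}
            if seen == required:
                hits.add(group)
                break
    return hits
```

Note that there is no threshold anywhere in the function: the only test is whether the set of rules seen equals the set of rules required.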
4. temporal_ordered
Same as temporal with the additional constraint that the referenced rule matches must occur in
the listed order. This is the kill-chain progression idiom: initial access, followed by discovery, followed
by lateral movement, attributable to the same entity, inside a defined window. It's the most expressive of
the four types and also, predictably, the one with the most variance in backend support quality.
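The ordering constraint is what makes this type harder to compile, and a reference evaluator shows why: it has to track position in a sequence, not just membership in a set. The sketch below uses the same hypothetical `(ts, rule_id, group_key)` match tuples; how ties on identical timestamps should be ordered is left open here and is my assumption, not something I am reading from the spec.

```python
from collections import defaultdict

def temporal_ordered_hits(matches, rule_ids, timespan_s):
    """Groups in which the rules matched in the listed order inside one window."""
    rows_by_group = defaultdict(list)
    for ts, rule_id, group in matches:
        rows_by_group[group].append((ts, rule_id))
    hits = set()
    for group, rows in rows_by_group.items():
        rows.sort()
        for i, (start_ts, rule) in enumerate(rows):
            if rule != rule_ids[0]:
                continue  # a sequence can only start on the first listed rule
            position = 1  # index of the next rule we are waiting for
            for ts, later_rule in rows[i + 1:]:
                if ts - start_ts > timespan_s:
                    break  # window closed before the sequence completed
                if position < len(rule_ids) and later_rule == rule_ids[position]:
                    position += 1
            if position == len(rule_ids):
                hits.add(group)
                break
    return hits
```

A backend has to express that position-tracking loop in its own query language, which is a plausible reason support for this type varies more than for the other three.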
Two structural notes that matter for portability. Correlation rules reference atomic rules by ID, which means your atomic rule library is the base layer and your correlation rules are the analytic layer above it. Correlations can also be chained (a correlation rule can reference another correlation rule's ID) which gives composability for higher-order detections. That second property is powerful in principle and sparsely supported in practice; chained correlations are where backend coverage starts to fragment.
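Because chained correlations are where coverage fragments, it helps to know which rules in a library are chained before compiling anything. A minimal sketch, assuming the rules have already been loaded into a hypothetical dict keyed by rule ID, where a correlation rule carries a `correlation` mapping with a `rules` list:

```python
def correlation_depth(rule_id, rules, _path=()):
    """0 = atomic rule, 1 = correlation over atomic rules, 2 or more = chained correlation."""
    if rule_id in _path:
        raise ValueError(f"reference cycle through {rule_id!r}")
    correlation = rules[rule_id].get("correlation")
    if correlation is None:
        return 0
    # Depth is one more than the deepest rule this correlation references.
    return 1 + max(
        (correlation_depth(ref, rules, _path + (rule_id,)) for ref in correlation["rules"]),
        default=0,
    )
```

Anything that comes back at depth 2 or higher is a candidate for the manual-review path, whatever the backend's directory state says.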
The spec covers the four idioms that compose most multi-event SOC detections. What it does not cover is statistical baselining ("this account's login rate is three standard deviations above its trailing thirty-day baseline"), graph-shaped entity correlation, or join-heavy multi-source enrichment beyond the field-equality group-by mechanism. Those remain vendor-native territory, and an architect committing to Sigma needs to know where the abstraction ends.
The strongest correlation hedge
Specification parity is not implementation parity.
The most important thing to internalize about the Sigma 2.0 correlation work is that the specification describes a clean, four-type, structured-YAML correlation surface, while the implementation of that surface across pySigma backends is uneven. As of mid-2026 (speaking from the publicly visible state of the SigmaHQ plugin directory and backend issue trackers, which is Tier B evidence) some backends compile all four correlation types, some compile event_count and value_count but not the temporal types, and a handful flag correlation as an open issue rather than a shipped feature. The Grafana Loki backend, for example, has had correlation as an open GitHub issue for the LogQL target. The state changes month to month, so any specific backend-feature claim needs verification against the current pySigma release before you stake an architecture decision on it. I put a first-party measurement under that directory signal below, compiling a rule set to four of these backends and reading what each one actually emits.
The honest claim I am willing to make: vendor-native correlation engines (Splunk Enterprise Security correlation searches, Microsoft Sentinel analytics rules, Google Chronicle multi-event rules) are still materially ahead of the Sigma correlation extension across all four types, when you measure not just "does it compile" but "does it compile efficiently, with predictable performance, with alerting metadata preserved, with the operational tooling SOCs need." The specification gap is closing, the implementation gap is closing more slowly, and the operational gap (production tooling around the detections) is closing slowest of the three.
That is the central hedge of this essay. The version I'd want an architect to carry away: Sigma 2.0 correlation is a useful and rapidly improving capability, and it is not yet a drop-in replacement for vendor-native correlation engines for the highest-value multi-event detections in a production SOC. The planning has to hold both of those things at once.
I want to name the shape of this asymmetry, because it is the same shape that shows up one layer down in the security lakehouse. Sigma's atomic rules port while Sigma's correlations largely don't, which is a read-strong, write-weak split: the atomic layer reads cleanly into nearly every backend, while the higher-semantics layer (the sequenced, stateful, cross-event logic) fails to carry across. The open security catalog has the identical split, because an open table format gives you schema reads that work across most engines, and the security-grade write controls and richer semantics (lineage, the operational guarantees a SOC actually depends on) are the part that doesn't port. It is the same asymmetry across two different standards, so when I argue that open standards buy you portability on the read side and leave the write side unsolved, Sigma 2.0 is the cleanest example of the pattern I have.
First-party measurement
What compiling the rules actually shows.
The backend-support claims above come from the plugin directory's state field, which is the community's maturity signal rather than a measurement, so I compiled a small rule set and looked at what each backend actually emits. Eleven Sigma rules — six single-event detections and five correlation rules covering all four types — compiled to four open backends (Splunk SPL, Elasticsearch ES|QL, Elasticsearch Lucene, and OpenSearch PPL) on pySigma 1.3.3, with the compiler run twice and asserted byte-identical so the result reproduces. The harness and the verbatim queries are public.
The single-event rules ported everywhere, six of six on all four backends, which confirms by measurement the clean-conversion result described under Conversion realities below. The correlation rules are where the four backends separated, and they separated in three different ways.
Elasticsearch Lucene refused all five, raising NotImplementedError: Backend does not support
correlation rules. A filter-only query language cannot express aggregation, so the backend raises
rather than emit a query that means less than the rule intended, and that is the safe way to fail,
because you find out at compile time that the rule did not translate. Splunk SPL and ES|QL sat at the other
end: both preserved the count, distinct-count, and unordered-temporal correlations in full, with the
aggregation, the threshold, and the time window all present in the generated query, and both refused
temporal_ordered, the ordered-sequence type, which neither implements at these versions. So the
read on ES|QL as one of the stronger correlation targets, in the backend matrix below, holds up under
measurement, with the ceiling at ordered sequencing.
OpenSearch PPL is the one that should make you careful, because it translated every correlation type, including the ordered sequence the other two refuse, and that breadth hides a loss: on the count-based correlations, the brute-force and spray detections, it drops the time window. The same brute-force rule compiles two ways:
# Splunk SPL — the five-minute window is in the query
... | bin _time span=5m
| stats count as event_count by _time TargetUserName
| search event_count >= 10
# OpenSearch PPL — same rule, no window
... | stats count() as event_count by TargetUserName
| where event_count >= 10
The SPL query buckets into five-minute windows and counts within each; the PPL query counts across the
whole search range, so a threshold of ten failed logons meant to fire on ten-in-five-minutes instead fires
on ten-ever. The query is syntactically valid, it runs, nothing errors, and it looks like a working
brute-force detection, which is what makes the loss easy to miss. PPL keeps the window on the temporal rules,
where it emits span(@timestamp, 5m), so the drop is specific to its count path rather than a
blanket limitation, which is exactly the per-construct unevenness a single directory state field cannot show
you. Across the five correlation rules that is three brute-force and spray detections compiling to
queries that lost their time-bounding with no warning.
The boundary on this measurement matters: it is what the compiler emits, not what the SIEM executes, so a window absent from the query might be supplied by a dashboard time range or a scheduled-search interval, and a present construct is not proof of correct execution. But the emitted query is the artifact a practitioner copies into a SOC, so a dropped window is one someone has to remember to add back. The checks are disclosed and every verbatim query is recorded in the methodology, and a newer backend release can move any of these cells, so the honest use of the result is to pin the versions and re-run it, which the harness makes a single command.
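A check of this kind is easy to automate. The sketch below is not the published harness; it is a hypothetical illustration of two of its ideas: a text-level check that a compiled correlation query still contains a windowing construct, and a compile-twice reproducibility assertion. The regex patterns are heuristics modeled only on the SPL and PPL snippets quoted above, and `convert` stands in for whatever callable wraps your backend.

```python
import hashlib
import re

# Heuristic markers for a time bucket in an emitted query. The two patterns mirror
# the SPL and PPL snippets quoted above; any other backend needs its own pattern.
WINDOW_PATTERNS = (
    re.compile(r"\bbin\s+_time\s+span\s*=\s*\d+[smhd]"),      # SPL: bin _time span=5m
    re.compile(r"\bspan\s*\(\s*@?\w+\s*,\s*\d+[smhd]\s*\)"),  # PPL: span(@timestamp, 5m)
)

def has_time_window(query: str) -> bool:
    """True when the query text contains one of the known windowing constructs."""
    return any(pattern.search(query) for pattern in WINDOW_PATTERNS)

def is_reproducible(convert, rule_text: str) -> bool:
    """Run a caller-supplied converter twice and compare SHA-256 digests of the output."""
    first = hashlib.sha256(convert(rule_text).encode("utf-8")).hexdigest()
    second = hashlib.sha256(convert(rule_text).encode("utf-8")).hexdigest()
    return first == second
```

Run against the two brute-force queries above, `has_time_window` accepts the SPL output and rejects the PPL output, which turns a silent loss into a failing check. It inspects the emitted text only, so the same boundary applies: it says nothing about what the SIEM executes.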
The backend ecosystem
What pySigma compiles to in 2026.
The pySigma project publishes a plugin directory (pySigma-plugin-directory on the SigmaHQ
GitHub organization) that lists every recognized backend with a community-assigned state. The states are
defined in the directory's README as stable (working state, maintained),
testing (working but unfinished), devel (under development),
broken (dysfunctional but maintained), and orphaned (unmaintained).
Treating that field as the closest thing to a community maturity signal (Tier B, with the obvious caveat
that "stable" doesn't mean "feature-complete for correlations") here is the matrix I'd use to plan a
pilot.
Stable backends I'd pilot first
- Splunk (SPL and tstats). The flagship stable backend. Compiles atomic rules to both plain SPL and tstats data-model queries, with savedsearches.conf output for deployment. It has the most production mileage of any pySigma target. The correlation story here is the closest to parity with the vendor-native equivalent, because SPL has the most expressive query language to compile into.
- Microsoft Kusto (KQL). Marked stable. Covers Microsoft Sentinel, Microsoft 365 Defender (XDR Advanced Hunting), and Azure Data Explorer. The Sentinel ASIM mapping is part of the toolchain. A credible target for atomic-detection portability. Correlation feature support in the KQL backend continues to improve and varies by release; verify against the current package version.
- Elasticsearch (Lucene, ES|QL, EQL). Stable. The ES|QL output is the most strategically interesting because Elastic's piped-query language is closer in shape to Sigma's correlation model than legacy Lucene. The backend has been updated to include ES|QL correlation support, which makes it one of the stronger correlation-extension targets I'm tracking.
- Loki (LogQL). Maintained by Grafana, marked stable for atomic rules. The correlation support has been a tracked open issue on the project's GitHub. Treat the atomic-rule coverage as production-credible and the correlation coverage as not-yet-production for this backend specifically.
- IBM QRadar (AQL). Stable. Compiles to AQL queries. Useful for organizations with QRadar-anchored SOCs that want to author detections in a portable layer.
- Palo Alto Cortex XDR (XQL). Stable. Compiles to XQL queries against Cortex XDR.
- Carbon Black (queries for EDR), SentinelOne (Deep Visibility, PowerQuery), Rapid7 InsightIDR (LEQL). All marked stable in the plugin directory. These EDR-focused backends matter because they extend the Sigma authoring surface into endpoint detection content where the vendor-native query languages are less standardized than SPL or KQL: a useful place for an open IR.
- OpenSearch (Lucene). Stable, with a maintenance status that tracks the Elasticsearch backend. Important for organizations on AWS OpenSearch Service.
- Panther. A pySigma backend named for the Panther detection-as-code platform exists in the plugin directory marked stable, which is a useful signal that the vendor takes the Sigma IR seriously enough to support a compilation target.
- Logpoint. Stable. Logpoint has publicly described their pySigma backend in product marketing.
Testing-state backends: useful but verify
- PowerShell, HAWK.io, Datadog Cloud SIEM, SQLite (and Zircolite), ClickHouse (clicksiem's pysigma-backend-clickhouse, added since my May check), NetWitness, SurrealQL, Golang Expr. All listed in the testing state. "Working but unfinished" per the directory's own definition. For pilot environments these are interesting; for production-critical detection paths I'd verify the specific feature coverage before committing, especially for the correlation extension.
Development-state backends: exploratory only
- Google SecOps (formerly Chronicle): UDM search and YARA-L 2.0. This is the one that surprises people. Chronicle is one of the most commonly cited Sigma targets in vendor materials, but the official pySigma SecOps backend is listed in the devel state in the plugin directory as of mid-2026. That doesn't mean it's unusable; it means the community maturity signal is "under development, not stable." If you're standardizing on Chronicle and Sigma, verify the current state of the backend against your specific rule set before committing to a pure pySigma path. Vendor-supplied or partner-supplied conversion paths may be more mature than the open pySigma backend.
- Trellix Helix, Quickwit, STIX, Azure Log Analytics (ala-socprime), OSSEM pipeline. Development-state. Treat as experimental.
Conspicuously absent
StarRocks, DuckDB, and a dedicated Trino or Presto backend remain absent: no pySigma backends in the
SigmaHQ plugin directory for any of them as of my recheck in June 2026, so the canonical compilation path
an Iceberg-anchored program would want still isn't there. Two narrow things did move since my May check,
though. ClickHouse now has a community backend in the directory, clicksiem's
pysigma-backend-clickhouse (on PyPI at 1.0.0), carrying the testing state; it compiles
atomic rules to ClickHouse SQL, but I haven't found documented Sigma-correlation support, so for a
correlations essay it sits in the same atomic-yes, correlation-unproven place as most testing-state
backends. And there is a single-author pySigma-backend-athena under the SigmaHQ org, not
listed in the plugin directory, that emits Presto/Trino-compatible SQL and, alone among the backends I've
read, transpiles the event_count correlation into a real SQL window function. The author
notes it likely works against any Trino engine, but it is a one-contributor, 1-star repo, only
event_count is implemented (value_count, temporal, and
temporal_ordered raise NotImplementedError), and the Trino compatibility is
untested, so I read it as a proof-of-concept worth watching rather than a backend to standardize on. For
lakehouse-anchored detection the practical workarounds are still (a) compile to the dictquery or sqlite
backends and adapt the SQL output, (b) use the Elasticsearch ES|QL output as a piped-query starting
point, or (c) maintain native SQL for multi-event correlation and use Sigma for the atomic layer. None of
these are clean.
The thinness of mature pySigma backends for the lakehouse engines I'd otherwise recommend is still the most consequential gap in the Sigma portability story for an Iceberg-anchored security architecture, though it is now a narrowing gap rather than an empty one: a testing-state ClickHouse backend and an event_count-only Athena/Presto proof-of-concept are early, partial answers that stop well short of the maintained, full-correlation backends the architecture actually wants. Worth tracking as a 2026–2027 development frontier, and a real reason to pair atomic detections in Sigma with engine-native authoring on the correlation tier for now.
Conversion realities
What works, what's lossy, what needs review.
I want to name the three honest categories of pySigma backend conversion, because the "write once, deploy everywhere" framing collapses all three into a single misleading sentence.
Clean conversions
Field-equality matches, simple AND/OR conditions, wildcard string contains, list memberships, basic NOT clauses. These are the bread-and-butter atomic-rule constructs and they compile to every stable backend without surprise. If your detection portfolio is mostly this shape (a surprisingly large fraction of any real SOC's detections are), Sigma is genuinely "write once, compile to whichever engine."
Lossy conversions — work, with rough edges
Regular-expression matches, case-insensitive comparisons in case-sensitive engines, field-extraction assumptions that differ between schemas, timestamp-window semantics in correlation rules. These compile, but the compiled query may not behave identically to a hand-written rule. A regex that runs efficiently in SPL may run poorly in LogQL. Backend pipelines (the per-environment mapping layer in pySigma) paper over much of this, but building and maintaining the pipeline is real engineering effort that the "write once" framing hides.
A useful outside confirmation of where the write side is thin comes from Tenzir, who are not Sigma detractors. In their public writing on running Sigma against their engine, they concede that Sigma's value-typing is weak. IPs, CIDR ranges, and timestamps get treated string-first rather than as typed values, and the taxonomy and field-mapping layer is underspecified. Their answer is not to replace Sigma but to lean on OCSF for the typed schema and to push the field mapping into pipeline mapping operators, so Sigma is left to express the detection logic on top of a normalized schema. That is the same write-weak diagnosis from a vendor with no incentive to overstate it, because the atomic match logic carries while the typing and mapping semantics do not, so you close the gap with an external schema layer rather than expecting Sigma to carry it.
Lossy enough to require manual review
Chained correlations, temporal ordering with more than two or three sequenced rules, value_count thresholds with high-cardinality group-by sets, detections that rely on vendor-specific lookup tables or enrichment macros. These either don't compile, compile to inefficient queries, or compile to something that runs but doesn't mean what the author intended. For these, treat the pySigma output as a draft, hand-review against the vendor-native target, and run regression tests against historical telemetry before deploying.
A Sigma-anchored detection-as-code program needs three things the marketing framing doesn't mention: a backend pipeline maintained per environment, a regression test harness that runs compiled queries against historical telemetry per backend, and a human review step for correlation rules above a complexity threshold. The CI/CD pattern below assumes all three.
The SigmaHQ rule repo
Bad rules, good defaults — and a real QA pipeline.
The SigmaHQ public rule repository is the largest open detection ruleset in security, with thousands of community-contributed YAML rules organized by log source and tagged against MITRE ATT&CK techniques. It also drives the loudest criticism of Sigma. Community contribution at scale produces a long tail of low-quality rules that an organization can easily import wholesale and regret. The honest version: many community rules are noisy, many encode environment-specific assumptions that don't generalize, and many are stale. That is true. It is also true for every open detection ruleset that has ever existed.
Nasreddine Bencherchali, one of the SigmaHQ maintainers, has written publicly about the project's quality assurance pipeline. Every pull request runs through validators, then a "good-log test" against the SigmaHQ evtx-baseline repository that flags rules generating false positives against known-good telemetry, then a regression test that requires submitters to contribute a sample malicious log validating the rule's usefulness. Rules that fail any gate do not merge. That is genuinely more rigor than I've seen in any other open detection community.
The "bad rules, good defaults" framing I land on: the repo has rules of widely varying quality, and the repo as a whole encodes good detection defaults that an organization can lean on rather than reinvent. The discipline you need on top of the repo is curation: pick the rules that match your environment, your log sources, your tolerance for false positives, not blind import.
The other underappreciated property: ATT&CK technique tagging is consistent and machine-readable. An architect can mechanically compute coverage maps from compiled rule sets, identify under-covered techniques, and drive detection-engineering priority from coverage gaps. That mechanical-mapping property is worth more than the marginal quality of any individual community rule, and it's the foundation of the work I describe in detection maturity.
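The mechanical mapping is short enough to show. A minimal sketch, assuming rules are already parsed into dicts with `title` and `tags` keys and that technique tags follow the `attack.t1110` / `attack.t1110.003` form used in the SigmaHQ repo; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches technique and sub-technique tags; tactic tags are ignored.
TECHNIQUE_TAG = re.compile(r"^attack\.(t\d{4}(?:\.\d{3})?)$", re.IGNORECASE)

def coverage_map(rules):
    """Map ATT&CK technique ID -> sorted titles of the rules tagged with it."""
    coverage = defaultdict(list)
    for rule in rules:
        for tag in rule.get("tags") or []:
            match = TECHNIQUE_TAG.match(tag)
            if match:
                coverage[match.group(1).upper()].append(rule["title"])
    return {technique: sorted(titles) for technique, titles in coverage.items()}

def uncovered(coverage, wanted_techniques):
    """Techniques you care about that no rule in the set is tagged with."""
    return sorted(set(wanted_techniques) - set(coverage))
```

The output of `uncovered` is the detection-engineering backlog in its rawest form. A tag is a claim of coverage rather than proof of it, so I'd treat the map as a prioritization input, not an assurance artifact.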
It is worth being precise about how cross-SIEM coverage actually gets solved in practice today, because the marketing framing implies a translation engine that doesn't exist, and the working pattern is aggregation rather than auto-translation. Michael Haag's Security Detections MCP, which aggregates more than 8,200 community detections, is the example: its method is to gather content per SIEM and index the whole corpus by MITRE ATT&CK technique, not to take one canonical rule and auto-translate it across every backend. Coverage comes from curating a rule per platform and tying them together through the ATT&CK tag, with the technique ID as the join key rather than a compiler. That distinction matters for an architect because per-backend curation plus ATT&CK indexing is the pattern that works at scale right now. Atomic auto-translation through pySigma works for the bread-and-butter shapes, but correlation auto-translation is the part that breaks down, which is exactly why the corpus is organized around aggregation rather than a single portable source rule.
Detection-as-Code with Sigma
The CI/CD pattern that makes Sigma pay off.
The thing that makes a Sigma-anchored program work in production is the CI/CD discipline around it, because Sigma as a YAML format with no pipeline is a YAML repo that drifts out of sync with what's deployed, while Sigma with a real detection-as-code pipeline is a portable detection program that survives engine swaps. The pattern that I see working in production looks like this.
1. YAML in Git, with mandatory metadata
Every Sigma rule lives in Git with mandatory frontmatter: ATT&CK technique IDs, log source, author, status, modification date, false-positive documentation. Pull requests are the only path to change a rule. Code review is mandatory.
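The mandatory-metadata rule is simple to enforce in CI. A minimal sketch, assuming rules are parsed into dicts and that the six requirements map onto the standard Sigma fields `tags`, `logsource`, `author`, `status`, `modified`, and `falsepositives`. The function name and the exact policy are my illustration, not a SigmaHQ tool.

```python
import re

REQUIRED_FIELDS = ("tags", "logsource", "author", "status", "modified", "falsepositives")
TECHNIQUE_TAG = re.compile(r"attack\.t\d{4}", re.IGNORECASE)

def missing_metadata(rule: dict) -> list:
    """Names of required fields that are absent or empty, plus a technique-tag check."""
    missing = [name for name in REQUIRED_FIELDS if not rule.get(name)]
    tags = rule.get("tags") or []
    # Tags may be present yet carry only tactic tags; require at least one technique ID.
    if tags and not any(TECHNIQUE_TAG.match(tag) for tag in tags):
        missing.append("tags (no ATT&CK technique ID)")
    return missing
```

A pull-request check would fail whenever the returned list is non-empty and print it as the review comment.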
2. Lint on every PR
The Sigma CLI ships a linter that validates YAML structure and rule conventions. Add it as a required CI check. Use the SigmaHQ rule-convention document as the baseline; extend with organization-specific requirements (mandatory false-positive section, mandatory data-source tagging in your local taxonomy).
3. Compile to each backend on every PR
A pySigma compilation job runs against each target (Splunk SPL, KQL, ES|QL, whichever you deploy) and fails the build if any rule fails to compile. The compilation output is preserved as a build artifact, giving you a per-backend audit trail of exactly what query was deployed for each rule version.
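The shape of that job is a nested loop with one important property: a conversion error is recorded and fails the build, never skipped. In the sketch below the `backends` mapping holds caller-supplied callables that take rule text and return a query. They stand in for whatever wrapper you write around each pySigma backend, and nothing here is pySigma's own API.

```python
def compile_all(rules, backends):
    """Compile every rule to every backend; return (artifacts, failures)."""
    artifacts, failures = {}, []
    for rule_id, rule_text in rules.items():
        for backend_name, convert in backends.items():
            try:
                artifacts[(rule_id, backend_name)] = convert(rule_text)
            except Exception as exc:
                # Record which rule failed on which backend, and why.
                failures.append((rule_id, backend_name, f"{type(exc).__name__}: {exc}"))
    return artifacts, failures
```

The CI step exits non-zero when `failures` is non-empty and uploads `artifacts` as the build artifact. A backend that refuses loudly, like the Lucene NotImplementedError described earlier, shows up in `failures` with its message intact.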
4. Regression-test against historical telemetry
For each compiled rule, run against a fixed window of historical telemetry per backend and capture two metrics: hit count (vs the previous rule version, to flag unexpected drift) and known-good false-positive rate (vs a curated baseline corpus). Fail the build on regressions beyond thresholds you define.
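The two metrics translate into one small gate. This is a sketch under assumptions: the inputs are counts you have already collected by running the compiled query, and the default thresholds (50% hit-count drift, zero tolerance on the known-good corpus) are placeholders for the values you define, not recommendations.

```python
def check_regression(rule_id, hits, previous_hits, baseline_hits, baseline_events,
                     max_drift=0.5, max_fp_rate=0.0):
    """Return a list of human-readable problems; an empty list means the rule passes."""
    problems = []
    # Relative change in hit count against the previous rule version.
    drift = abs(hits - previous_hits) / max(previous_hits, 1)
    if drift > max_drift:
        problems.append(f"{rule_id}: hit count drifted {previous_hits} -> {hits}")
    # Share of the curated known-good corpus that the rule fires on.
    fp_rate = baseline_hits / baseline_events if baseline_events else 0.0
    if fp_rate > max_fp_rate:
        problems.append(f"{rule_id}: {baseline_hits} hits on known-good baseline")
    return problems
```

Run it once per rule per backend. The same rule can pass on one engine and drift on another, which is the per-backend variance this step exists to catch.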
5. Deploy via backend-specific automation
On merge, deployment jobs push compiled artifacts to each target: savedsearches.conf to Splunk, ARM template to Sentinel, REST API to Chronicle or Panther, file-based deployment to a Loki ruler. The artifact filename or rule ID embeds the Git commit hash so production deployments are traceable back to the YAML source.
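Embedding the commit hash is a naming convention more than a feature, and one way to do it looks like this. The double-underscore layout and the 12-character truncation are arbitrary choices for illustration.

```python
import re

def artifact_name(rule_id: str, backend: str, commit: str, extension: str) -> str:
    """Build a deployable artifact name that carries the Git commit it came from."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit):
        raise ValueError(f"not a Git commit hash: {commit!r}")
    return f"{rule_id}__{backend}__{commit[:12]}.{extension}"

def commit_from_artifact(name: str) -> str:
    """Recover the commit hash from a name produced by artifact_name."""
    stem = name.rsplit(".", 1)[0]
    return stem.rsplit("__", 1)[1]
```

Given an alert in production, `commit_from_artifact` gets you back to the exact YAML revision that produced the deployed query.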
That pattern is what makes the Sigma investment compound. Without it, the YAML becomes documentation of what the SOC intended to deploy, not what's actually deployed. With it, source of truth and deployed artifact stay in sync, and a SIEM swap is a matter of adding a new compilation target rather than rewriting hundreds of rules. This is the operational pattern at DetectFlow, where Sigma supplies the format and DetectFlow supplies the discipline that keeps it in sync.
The trade-off
Vendor-native versus Sigma — when to choose which.
I keep landing on a hybrid recommendation in this essay because the honest answer is that the choice is per-rule, not per-program. Here is the heuristic I use.
Author in Sigma when
- The rule is atomic, a single-event match against known indicators or fingerprints.
- The rule is going to be deployed to more than one engine (a primary SIEM plus a long-tail archive, a cloud SIEM plus an on-prem SIEM, an EDR query language plus a SIEM correlation).
- The rule is a candidate for community sharing or community sourcing, because your organization wants to contribute it upstream, or you're sourcing it from SigmaHQ in the first place.
- The correlation pattern is event_count or value_count and the target backend has stable correlation support for the type you need.
- The detection lifecycle expects engine churn, whether a SIEM migration in the next eighteen months, a planned lakehouse adoption, or a dual-vendor procurement cycle.
Keep vendor-native when
- The detection relies on statistical baselining, anomaly scoring, or graph-shaped correlation that the Sigma correlation extension does not express.
- The detection uses vendor-specific enrichment, lookup macros, or threat-intel join patterns that the IR doesn't have a clean idiom for.
- The detection is a high-value temporal_ordered correlation with three or more sequenced rules and the target backend's coverage of that idiom is testing-state or worse.
- The detection lives entirely inside one engine and the engine isn't moving in the foreseeable planning horizon.
- The performance characteristics of the compiled Sigma output don't meet the SOC's response-time requirements, and the vendor-native equivalent runs materially faster.
The choice is rarely "go all-in on Sigma" or "stay on vendor-native," because the two will coexist in most production SOCs for the foreseeable future, and the boundary between them moves over time as the correlation extension matures and as backend coverage firms up.
Anvilogic and Panther (the two commercial products I named in the foundational piece) both lean into this hybrid framing in their public product documentation. Anvilogic exposes a Sigma-compatible authoring surface alongside vendor-native authoring paths and compiles to multiple SIEM backends. Panther maintains its own pySigma backend in the SigmaHQ plugin directory (stable state) alongside Python-based rule authoring. Neither claims pure Sigma implementation. Tier B evidence: I've read the vendor materials and verified the Panther backend in the plugin directory; I have not run either product in production.
What I'm watching
2026 development frontiers.
Three threads will move the Sigma correlation and backend story through 2026 and into 2027.
Lakehouse-engine backend maturity. A real, maintained pySigma backend for StarRocks, Trino, or DuckDB would close the largest portability gap in the Sigma story for Iceberg-anchored architectures, and ClickHouse is one step further along — it now carries a testing-state community backend in the directory rather than nothing. None of them is a stable, full-correlation backend yet, and the single-author Athena proof-of-concept that transpiles event_count into a SQL window function shows both the appetite and the distance left: it covers one of the four correlation types and isn't in the plugin directory. A community contribution or vendor sponsorship that lands a maintained, full-correlation backend for one of these engines in 2026 changes the calculus materially. This is the single development I'd most like to see, and the one I'd weight most heavily in any architectural decision about which detection-engine substrate to standardize on.
Correlation-extension implementation parity. The specification is done; the implementation is uneven across backends. The honest version: I am tracking the per-backend correlation coverage as a quarterly review item, not as a settled question. If the major stable backends (Splunk, KQL, ES|QL) converge on full event_count, value_count, temporal, and temporal_ordered support with predictable performance characteristics, the "vendor-native correlation engines are still ahead" caveat in this essay weakens. If they don't, the caveat persists.
SecOps backend graduation. The Google SecOps (Chronicle) backend graduating from devel to stable would matter for any organization on Chronicle. It would also be a useful signal about Google's investment in the Sigma IR as a first-class authoring surface for Chronicle. Track this as a 2026 watch item.
On the LLM-assisted authoring angle: there is real work happening on using language models to translate vendor-native rules into Sigma and to generate Sigma rules from threat-intel descriptions. I am not yet comfortable endorsing any specific tool, because quality varies widely and the evaluation methodology I'd trust doesn't exist yet, but it's worth tracking alongside the work in LLM-assisted OCSF mapping.
What to do this quarter
A ninety-day plan for an architect evaluating Sigma 2.0.
Four moves, ordered from cheapest to highest-investment, that test whether Sigma 2.0 plus pySigma fits your environment without committing to a wholesale migration.
- Audit your detection portfolio by shape. Categorize every active detection as atomic, event_count, value_count, temporal, temporal_ordered, statistical-baselining, or graph-correlation. The first five are Sigma's surface; the last two are not. Bucket sizes tell you the theoretical Sigma coverage before you compile a single rule. If 70% of your detections are atomic or event_count, Sigma is a high-leverage adoption. If 70% are statistical-baselining, it is not.
- Pilot pySigma on ten atomic rules across two backends. Pick two engines you already maintain duplicate detections in (commonly Splunk plus one other). Pick ten stable, important atomic rules. Author them in Sigma. Compile to both. Compare false-positive and false-negative parity against your hand-maintained versions over a two-week window. The signal is conversion quality on your rules, not on demo rules.
- Test one event_count correlation per backend. Author a single event_count correlation you can validate against ground truth, such as a known failed-login burst or a known enumeration attempt. Compile, run, and compare against the vendor-native equivalent on the same telemetry window. That is the smallest test that tells you whether the correlation extension is production-credible for your backend version.
- Stand up the CI/CD pipeline before scaling rule count. The biggest mistake I see is authoring a hundred Sigma rules before having lint, regression test, and per-backend compilation in CI. The pipeline is the work that makes Sigma pay off; the rules are the easy part. Build against the ten pilot rules first, then scale. Pair with the DetectFlow pattern.
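The audit in the first move reduces to a bucket count and one ratio. The following is a minimal sketch, assuming you can export each active detection with a shape label; the detection names and the sample portfolio are hypothetical, and the function is illustrative rather than part of any Sigma tooling.

```python
from collections import Counter

# The first five shapes are Sigma's surface per the audit step; the last two are not.
SIGMA_SHAPES = {"atomic", "event_count", "value_count", "temporal", "temporal_ordered"}
NON_SIGMA_SHAPES = {"statistical-baselining", "graph-correlation"}


def audit_portfolio(detections):
    """Return (bucket counts, theoretical Sigma coverage as a fraction).

    `detections` maps detection name -> shape label. Unknown labels raise so a
    typo cannot silently shrink a bucket.
    """
    unknown = set(detections.values()) - SIGMA_SHAPES - NON_SIGMA_SHAPES
    if unknown:
        raise ValueError(f"unrecognized shapes: {sorted(unknown)}")
    buckets = Counter(detections.values())
    total = sum(buckets.values())
    in_scope = sum(n for shape, n in buckets.items() if shape in SIGMA_SHAPES)
    coverage = in_scope / total if total else 0.0
    return buckets, coverage


# Hypothetical portfolio; names and labels are illustrative only.
portfolio = {
    "susp_powershell_encoded": "atomic",
    "failed_login_burst": "event_count",
    "distinct_hosts_per_user": "value_count",
    "rare_process_baseline": "statistical-baselining",
}

buckets, coverage = audit_portfolio(portfolio)
print(dict(buckets))
print(f"theoretical Sigma coverage: {coverage:.0%}")
```

The ratio is an upper bound: it says which detections Sigma could express, not which ones your backend version will compile correctly, which is what the pilot and correlation tests measure.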
That sequence respects the load-bearing hedge: Sigma 2.0 is improving, the backend ecosystem is uneven and worth verifying per release, and the correlation extension lags vendor-native engines on the hardest detections. None of these is a reason to skip Sigma; they are reasons to size the adoption work honestly before the rule count compounds into a maintenance burden.
Closing
Specification done, implementation uneven, the gap closing.
The Sigma 2.0 correlation specification (event_count, value_count, temporal, temporal_ordered) is a substantive expansion of what the IR can express. The pySigma backend ecosystem has more than two dozen targets, with the SigmaHQ plugin directory's state field giving an honest community view of which are production-credible and which are exploratory.
The honest summary for an architect making decisions this quarter: author atomic detections in Sigma and pilot the event_count and value_count idioms against your stable backends. Keep statistical baselining and graph correlation in vendor-native form for now. Build the CI/CD pipeline before you scale rule count. Verify backend feature claims (including vendor claims of "native Sigma support") against the current pySigma plugin directory and against compilation output on your own rules. Track the lakehouse-engine backend gap as a 2026 watch item.
Florian Roth and Thomas Patzke shipped a format in 2017 that has become the de facto open detection IR. The correlation extension is the most consequential expansion since the initial release. The work is not finished and the implementation is uneven, but the direction is right, and the adoption pattern that pays off is the hybrid one: portable IR for the atomic layer, vendor-native for the correlation tier where it matters most, both coexisting under a single detection-as-code pipeline.