Technology deep-dive

Sigma and detection portability: the fourth foundational standard.

Sigma is one of four foundational open standards I name when I describe the security lakehouse, alongside Apache Arrow for in-memory representation, Apache Iceberg for table format, and OCSF for event schema. The other three each have multiple depth pieces on this site, while Sigma had none, so this essay closes that gap and tries to do so honestly. Sigma's atomic-detection portability story is real and useful, the correlation-maturity story is not yet at parity with vendor-native engines, and that difference matters for how an architect should pilot it this quarter.

Reading time: about 19 minutes. Evidence tier: B overall (project documentation, practitioner accounts, vendor product pages I can verify), with Tier A on Sigma's own specification and Tier D on specific 2026 roadmap claims I have not independently confirmed. Flagged inline.

What Sigma is

An open YAML detection format with an intermediate representation.

Sigma is an open, engine-agnostic detection rule format. Detections are written in YAML (a structured description of log sources, field matches, conditions, and metadata) and then converted by a backend compiler into the query language of a target detection engine. Florian Roth and Thomas Patzke originated the format in 2017 out of the German SOC community, and it has been actively developed since. The Sigma project at SigmaHQ on GitHub maintains both the specification and a public rule repository with thousands of community-contributed detections.

The thing that matters architecturally is the intermediate representation. A Sigma rule is not directly executable. It compiles, via the pySigma toolchain, into Splunk SPL, Microsoft Sentinel KQL, Elastic ES|QL or EQL, Google Chronicle YARA-L, lakehouse SQL, and around thirty other backends depending on which pySigma plugins you install. The single source of truth (the YAML rule) converts to whatever dialect the target engine speaks.
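To make the intermediate-representation idea concrete, here is a minimal sketch in plain Python. It is a toy and not pySigma: the rule is a dict rather than parsed YAML, the two renderers (`to_spl_like`, `to_sql_like`) and the table name `process_events` are hypothetical, and escaping and quoting are left out entirely. It shows only the shape of the argument, one rule rendered into two dialects.

```python
# Toy illustration only: this is NOT pySigma. One rule (the intermediate
# representation) is rendered into two simplified query dialects.

RULE = {
    "title": "Whoami execution",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {"Image|endswith": "\\whoami.exe"},
        "condition": "selection",
    },
}


def split_field(key):
    """Split 'Field|modifier' into (field, modifier); default modifier is 'equals'."""
    field, _, modifier = key.partition("|")
    return field, modifier or "equals"


def to_spl_like(rule):
    """Render the selection as an SPL-flavoured search string (no escaping)."""
    terms = []
    for key, value in rule["detection"]["selection"].items():
        field, modifier = split_field(key)
        pattern = {
            "equals": value,
            "endswith": "*" + value,
            "startswith": value + "*",
            "contains": "*" + value + "*",
        }[modifier]
        terms.append(f'{field}="{pattern}"')
    return " ".join(terms)


def to_sql_like(rule, table="process_events"):
    """Render the selection as a SQL-flavoured query (hypothetical table name)."""
    clauses = []
    for key, value in rule["detection"]["selection"].items():
        field, modifier = split_field(key)
        if modifier == "equals":
            clauses.append(f"{field} = '{value}'")
        else:
            pattern = {
                "endswith": "%" + value,
                "startswith": value + "%",
                "contains": "%" + value + "%",
            }[modifier]
            clauses.append(f"{field} LIKE '{pattern}'")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)
```

Calling `to_spl_like(RULE)` and `to_sql_like(RULE)` on the same dict yields two different query strings from one source. A real backend also has to handle escaping, field mapping, and condition expressions, which is where the per-backend validation work comes from.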

That is the structural argument. Detection content written once should not need to be re-authored every time a SOC swaps SIEMs or adds a second analytical engine. The Sigma specification itself is the only Tier A piece of evidence I cite without qualifier. The spec exists, is published, and is the authoritative description of the format. Everything else in this essay is Tier B or below.

The portability argument

Write once, convert and validate per backend.

The marketing tagline for Sigma is "write once, deploy everywhere," which is sloppy phrasing I won't defend, because the engineering reality is closer to "write once, convert per backend, validate per environment, and accept that some constructs will not round-trip cleanly." That's still a useful property, though it's a weaker one than cross-platform binary compatibility, and an architect evaluating Sigma for a deployment should expect the weaker version.

The cost Sigma is trying to eliminate is the rewrite tax. Most SOCs I look at today maintain the same detection logic in three places: a username-enumeration rule lives once in SPL for Splunk Enterprise Security, once in KQL for Sentinel as the SOAR/cloud target, and once in YARA-L or Chronicle SQL for the long-tail archive. That means three rule files and three change-management workflows, plus three sets of unit tests if you're disciplined and zero if you're not. When a field renames or a log source format shifts, the synchronization cost is per engine, per rule, per change. That cost is what compounds into the "detection backlog" most SOC managers describe, since the hard part isn't authoring the detections but keeping the same detection consistent across engines.

Sigma's promise is that the YAML rule becomes the source of truth and the engine-specific dialects become compiled artifacts. The synchronization cost drops to one rule per detection, and the conversion to each backend is automated through pySigma. The CI/CD pipeline becomes: lint the YAML, compile to each backend, run regression tests against historical telemetry per backend, deploy.
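The lint, compile, regression-test, deploy sequence can be sketched as a small driver. This is an illustration under assumptions rather than a real pipeline: `lint` checks only the presence of the top-level keys a Sigma rule needs, and the `compilers` and `regression_tests` arguments are hypothetical callables standing in for a real converter and a replay against historical telemetry.

```python
REQUIRED_KEYS = ("title", "logsource", "detection")


def lint(rule):
    """Return a list of problems; an empty list means the rule passes."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in rule]
    if "detection" in rule and "condition" not in rule["detection"]:
        problems.append("missing 'detection.condition'")
    return problems


def run_pipeline(rule, compilers, regression_tests):
    """Lint once, then compile and regression-test per backend.

    compilers:        {backend_name: callable(rule) -> query string}
    regression_tests: {backend_name: callable(query) -> bool}
    Returns {backend_name: query} for backends that passed. A deploy step
    (not shown) would push these artifacts to each engine.
    """
    problems = lint(rule)
    if problems:
        raise ValueError("lint failed: " + "; ".join(problems))
    artifacts = {}
    for backend, compile_fn in compilers.items():
        query = compile_fn(rule)
        if not regression_tests[backend](query):
            raise RuntimeError(f"regression test failed for {backend}")
        artifacts[backend] = query
    return artifacts
```

The design choice worth noticing is that linting happens once against the YAML-shaped source, while compilation and regression testing happen once per backend. That split is what moves the synchronization cost from per engine to per rule.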

The honest hedge is that conversion is lossy in specific places. Vendor-native SPL or KQL is often more expressive than Sigma's intermediate representation, because there are SPL macros, KQL operators, and chained transformations that don't have a Sigma equivalent, so a detection author who needs the full expressive power of SPL is going to write SPL. Sigma covers the common cases well and doesn't reach the long tail, which I'd read as a property of the abstraction rather than a defect, since every intermediate representation I've worked with has had the same property.

What Sigma is good at

Atomic detections, well-known patterns, ATT&CK coverage.

The class of detection Sigma handles cleanly is the atomic detection, a single-event match against a known indicator pattern. Examples include suspicious command-line invocations, known-bad process trees, specific registry modifications, anomalous Windows event IDs, exfiltration to specific TLDs, and lateral-movement fingerprints that appear in one event record. If the detection answer is "this single event matches the following field conditions," Sigma is well-shaped for it.
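A minimal sketch of what "this single event matches the following field conditions" means in practice. It assumes the common Sigma conventions that fields in one selection are ANDed, a list of values is ORed, and string comparison is case-insensitive. It handles only one selection with the `contains`, `startswith`, and `endswith` modifiers, not full condition expressions, and the helper names are mine rather than anything from the Sigma toolchain.

```python
def match_value(actual, modifier, expected):
    """Compare one event value with one expected value, case-insensitively."""
    if actual is None:
        return False
    a, e = str(actual).lower(), str(expected).lower()
    if modifier == "contains":
        return e in a
    if modifier == "startswith":
        return a.startswith(e)
    if modifier == "endswith":
        return a.endswith(e)
    return a == e  # no modifier: plain equality


def match_selection(event, selection):
    """True when every field condition holds (AND); value lists are any-of (OR)."""
    for key, expected in selection.items():
        field, _, modifier = key.partition("|")
        candidates = expected if isinstance(expected, list) else [expected]
        if not any(match_value(event.get(field), modifier, c) for c in candidates):
            return False
    return True


SELECTION = {
    "Image|endswith": "\\powershell.exe",
    "CommandLine|contains": ["-enc", "-EncodedCommand"],
}
```

An event whose `Image` ends in `\powershell.exe` and whose `CommandLine` contains either flag matches; anything else does not. Everything the matcher needs is inside one event record, which is why this class of rule converts to so many backends with so little friction.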

The SigmaHQ public rule repository is the largest practical demonstration of this. It contains thousands of community-maintained rules organized by log source (Windows Sysmon, Linux auditd, AWS CloudTrail, Azure activity logs, Office 365, network IDS output) and tagged against MITRE ATT&CK techniques. For an organization that wants a baseline of community detections quickly, downloading the SigmaHQ repo and compiling it against the local engine is a sensible starting point, though it's a starting point rather than a finished detection program, giving you thousands of pre-written examples of the atomic-detection idiom to curate down from.

The MITRE ATT&CK coverage angle deserves its own note, because Sigma rules carry ATT&CK technique tags as metadata, which means a SOC can compile coverage maps mechanically and see which techniques have detections, which don't, and which have multiple detections from independent sources. That mechanical-mapping property is an underappreciated feature, since it supports detection-maturity work that previously required manual spreadsheet maintenance.
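The mechanical mapping can be sketched in a few lines. This assumes rules already parsed into dicts and technique tags of the form `attack.t1059.001`; the function names and the sample technique list are hypothetical, and tactic tags such as `attack.execution` are deliberately ignored.

```python
import re
from collections import defaultdict

# Matches technique and sub-technique tags such as attack.t1003 or attack.t1059.001.
TECHNIQUE_TAG = re.compile(r"^attack\.(t\d{4}(?:\.\d{3})?)$", re.IGNORECASE)


def coverage_map(rules):
    """Return {technique_id: [rule titles]} built from each rule's tags."""
    coverage = defaultdict(list)
    for rule in rules:
        for tag in rule.get("tags", []):
            found = TECHNIQUE_TAG.match(tag)
            if found:
                coverage[found.group(1).upper()].append(rule["title"])
    return dict(coverage)


def gaps(coverage, techniques_of_interest):
    """Return the techniques you care about that have no detection at all."""
    return sorted(t for t in techniques_of_interest if t not in coverage)
```

Techniques that map to more than one title are the "multiple detections from independent sources" case, and `gaps` gives the list that used to live in a spreadsheet.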

One point on how cross-SIEM coverage actually works in practice, because the mental model many people reach for is wrong: the thing that delivers broad coverage today is aggregation rather than auto-translation. Michael Haag's Security Detections MCP aggregates 8,200-plus community detections and indexes them by MITRE ATT&CK technique, and those detections are written in each platform's native language and curated by the communities that authored them, rather than one canonical rule machine-translated across engines. That distinction matters for what you should expect from a Sigma pilot, because Sigma's compiler gives you per-backend conversion of the rules you author and not a magically translated universal corpus. Where coverage is broad, it's broad because a community wrote and curated the per-engine variants, and that curation is the work the tooling doesn't do for you.

For the SOC capability ladder I describe elsewhere on this site (see detection maturity for the long version), Sigma is the enabler that moves a team from ad-hoc rule authoring (level 1) to repeatable, portable, version-controlled detection content (level 2 and into level 3), though it doesn't by itself get you to the higher levels, and the correlation-maturity gap I describe next is why.

The correlation-maturity caveat

Where Sigma is not yet at parity with vendor-native engines.

I want to be direct about this because it's the central honest critique of the Sigma story, and I see it underplayed in vendor talks and overplayed in detractor pieces. Sigma's atomic-detection coverage is excellent, while its multi-event correlation coverage is improving but still lags vendor-native correlation engines materially as of early 2026.

The kinds of detection that benefit from a correlation engine include time-windowed sequence detection ("four failed logins followed by one success from the same source within sixty seconds"), statistical baselining ("this account's login rate is three standard deviations above its trailing thirty-day baseline"), entity-resolution joins across multiple log sources ("this DNS query, this firewall flow, and this EDR process-start are the same incident"), and graph-shaped correlations across event chains. Splunk Enterprise Security correlation searches, Microsoft Sentinel analytics rules, and Chronicle's multi-event rule constructs all support these patterns as first-class features with mature production tooling around them.
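To show why the first of those examples is a different kind of problem from an atomic match, here is a minimal sketch of "four failed logins followed by one success from the same source within sixty seconds." It is plain Python and not any engine's correlation syntax. It assumes events arrive sorted by timestamp as `(timestamp_seconds, source, outcome)` tuples, and it reads "within sixty seconds" as: every counted failure is at most sixty seconds older than the success.

```python
from collections import defaultdict, deque


def failed_then_success(events, failures_required=4, window_seconds=60):
    """Return [(source, success_timestamp)] for each qualifying sequence.

    events: iterable of (timestamp_seconds, source, outcome), sorted by
    timestamp, where outcome is "failure" or "success".
    """
    recent_failures = defaultdict(deque)  # per-source state kept across events
    alerts = []
    for ts, source, outcome in events:
        failures = recent_failures[source]
        # Expire failures that have fallen out of the time window.
        while failures and ts - failures[0] > window_seconds:
            failures.popleft()
        if outcome == "failure":
            failures.append(ts)
        elif outcome == "success":
            if len(failures) >= failures_required:
                alerts.append((source, ts))
            failures.clear()  # a success resets the sequence for this source
    return alerts
```

The detail to notice is `recent_failures`: the detector has to hold state per entity across events and expire it by time. An atomic rule needs none of that, which is a large part of why correlation is harder to express portably.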

Sigma's response to this is a correlation extension, sometimes referred to as part of the Sigma 2.0 evolution, that adds time-window aggregation, count-based thresholds, and inter-event joins to the specification. The work is ongoing and improving. I will not claim specific feature availability or specific milestone dates here, because I have not independently verified the current state of the pySigma release I would be referring to. Treat any Sigma 2.0 correlation-feature claim (including claims you read in vendor materials) as Tier B at best, and verify against the current pySigma release notes before committing architecture decisions to a specific capability. The shape of the gap I'm comfortable describing as Tier B evidence is that correlation in Sigma is the active development frontier rather than a solved problem.

The write-weak side of Sigma extends past correlation, and the most useful confirmation comes from a party with no reason to flatter the format. Tenzir, building a security data pipeline product, concedes publicly that Sigma's value-typing is thin (IP addresses, CIDR ranges, and timestamps are treated string-first rather than as typed values) and that its taxonomy and field-mapping layer is weak. I read that as a credible Tier B confirmation precisely because Tenzir is a vendor critiquing a standard it otherwise supports. Their own answer is telling. Rather than replacing Sigma, they pair it with OCSF and a set of pipeline mapping operators that supply the typing and field normalization Sigma doesn't, so the atomic rule grammar stays while the value-typing and taxonomy get handled in the layer beneath it. That is the same write-side anchoring the correlation gap shows, surfacing here in a second place.
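A small illustration of why string-first versus typed handling matters for IP values. It uses only Python's standard `ipaddress` module and does not model how Sigma or any particular backend treats these values; the two helper functions are mine.

```python
import ipaddress


def string_prefix_match(ip_text, prefix_text):
    """String-first comparison: treats the address as plain text."""
    return ip_text.startswith(prefix_text)


def cidr_match(ip_text, cidr_text):
    """Typed comparison: parses both sides and tests network membership."""
    return ipaddress.ip_address(ip_text) in ipaddress.ip_network(cidr_text)


# 10.10.1.5 is outside 10.1.0.0/16, but its text still starts with "10.1".
looks_inside_as_text = string_prefix_match("10.10.1.5", "10.1")
is_inside_as_network = cidr_match("10.10.1.5", "10.1.0.0/16")
```

The two checks disagree on `10.10.1.5`: the text comparison says it belongs and the typed comparison says it does not. Supplying the typed check somewhere is the job the layer beneath the rule grammar ends up doing.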

What that means for an architect is that Sigma is the right home for the atomic-detection portion of a detection portfolio while it isn't yet a complete replacement for vendor-native correlation features in the analytics tier above atomic detections. A reasonable 2026 architecture authors atomic detections in Sigma and leaves the highest-value multi-event correlations in vendor-native form until the Sigma correlation extension matures further. I would rather an architect plan around that gap honestly than adopt Sigma on a "write once, correlate everywhere" promise that doesn't hold yet.

The four-pillar argument

Where Sigma fits alongside Arrow, Iceberg, and OCSF.

The four-pillar framing (Arrow, Iceberg, OCSF, Sigma) is meant as a layered architectural argument about which open standards remove which kinds of vendor lock-in rather than as a slogan, because each pillar addresses a different layer and the value compounds when all four are present.

Apache Arrow and Apache Iceberg live at the data layer. Arrow specifies how columnar data should be laid out in memory so that engines can share buffers without re-encoding; Iceberg specifies how columnar data should be organized on disk and tracked as tables. Together they remove the storage and interchange lock-in vectors. The depth piece on Iceberg vs Delta sits at iceberg-vs-delta; the Arrow plus ADBC piece sits at arrow-adbc.

OCSF (the Open Cybersecurity Schema Framework) lives at the schema layer. It defines portable semantics for security events: what a network connection record means, what a process-start record means, what fields each event class carries. OCSF removes the schema lock-in vector. Without OCSF, every SIEM defines its own field shapes and you write extraction logic per source per destination. The LLM-assisted OCSF mapping work I've written about sits at llm-ocsf-mapping.

One clarification, because the two standards get conflated in conversation. OCSF does not compete with Sigma as a detection language, and Sigma's authoring role is not under threat from it. OCSF's Detection Finding class (UID 2004) is an output schema. It describes the shape of a finding a detection emits, not the grammar you author the detection in. The pairing is complementary: Sigma is where you write the rule, OCSF is the normalized field set the rule reads from and the normalized finding it writes to. The Sigma project leans into exactly that pairing. SigmaHQ maintains pySigma-pipeline-ocsf, an official MIT-licensed processing pipeline that maps Sigma's field references onto OCSF-normalized fields so a single rule compiles against an OCSF-shaped lakehouse, which is the stack working as intended rather than the two standards contending.
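The read-from and write-to pairing can be sketched as two small functions. Everything here is illustrative: the `FIELD_MAP` entries are my own guesses at plausible normalized names and are not copied from pySigma-pipeline-ocsf, and the finding dict carries only `class_uid` 2004 from the text above plus simplified placeholder keys, not the full OCSF attribute set.

```python
# Hypothetical mapping for illustration; NOT taken from pySigma-pipeline-ocsf.
FIELD_MAP = {
    "Image": "process.file.path",
    "CommandLine": "process.cmd_line",
    "User": "actor.user.name",
}


def map_selection(selection, field_map=FIELD_MAP):
    """Rewrite 'Field|modifier' keys to normalized field names, keeping modifiers."""
    mapped = {}
    for key, value in selection.items():
        field, sep, modifier = key.partition("|")
        if field not in field_map:
            raise KeyError(f"no mapping for field '{field}'")
        mapped[field_map[field] + sep + modifier] = value
    return mapped


def to_finding(rule_title, matched_event):
    """Wrap a match in a minimal finding-shaped dict.

    class_uid 2004 is the Detection Finding class; 'title' and 'evidence'
    are simplified placeholders rather than real OCSF attribute names.
    """
    return {"class_uid": 2004, "title": rule_title, "evidence": matched_event}
```

The rule grammar is untouched by the mapping step; only the field names change on the way in, and only the result shape changes on the way out.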

Sigma lives at the analysis layer. It removes the detection-logic lock-in vector. The first three pillars get your data into a portable, queryable shape; Sigma is the standard that lets the analytic content sitting on top of that data move too. Without Sigma, the detection content stays trapped in whichever engine you wrote it for, and a SIEM swap means rewriting hundreds or thousands of rules.

There's a structural symmetry worth naming, because it runs through every pillar in this stack. Sigma's portability is read-strong and write-weak. The atomic rule grammar ports cleanly to thirty backends; the things on the write side (value-typing, taxonomy and field mapping, the correlation semantics described above) don't port and stay anchored to whichever engine actually executes them. That's the same asymmetry I described in the lakehouse catalog work, where an open schema reads everywhere but the security-grade write controls (ABAC, provenance, audit-retention) don't travel with the data and stay bound to a specific catalog and engine, so read portability tends to be the easy half of an open standard in security while write portability is where the lock-in retreats to and survives. Sigma sits on the same fault line as Iceberg and OCSF, which is a reason to expect the correlation gap to close slowly rather than a reason to doubt the pillar.

Together the four pillars remove three of the four lock-in vectors I track: storage, schema, and detection logic. The fourth (correlation maturity) is the open problem. Vendor-native correlation engines are still ahead of the open standards on multi-event sequence detection and statistical baselining. That gap is the one I expect to be debated through 2026 and 2027, and it's the reason I recommend treating Sigma as portable IR for atomic detections rather than as your authoritative detection content for the highest-value correlation rules.

Practical adoption patterns

How teams are actually using Sigma in production.

Three patterns dominate in the deployments I've seen or talked to practitioners about. Each addresses a different point on the build-versus-buy spectrum.

Vendor platforms built on Sigma-style portability

Anvilogic and Panther are the two vendors I name without hedging because their public product pages document Sigma-style detection portability as a first-class feature. Anvilogic's detection-as-code platform exposes a Sigma-compatible authoring surface and compiles to multiple SIEM backends. Panther's detection-as-code product similarly treats portability as a core property, with Python-based rules that run against a lakehouse. Neither is a pure pySigma deployment (both extend the core idea), but both validate that there is a commercial market for detection portability and that customers will pay for the abstraction.

I list these two as Tier B (vendor product documentation, verifiable on their public sites at the time of writing). I do not claim either vendor uses pySigma under the hood, since that's an implementation detail I haven't verified, and the claim I'm actually making is that the portability category exists and is being commercialized rather than that any specific vendor is a pure Sigma implementation.

Hand-rolled detection-as-code with pySigma plus CI/CD

The second pattern is teams that build their own detection-as-code repo with pySigma at the core, GitHub Actions or GitLab CI as the build system, and per-backend deployment jobs that push compiled artifacts into Splunk, Sentinel, Chronicle, or a lakehouse engine. The detection content lives in YAML in version control; the CI pipeline lints, compiles, regression-tests against historical telemetry, and deploys. This is the pattern most aligned with the DetectFlow methodology I describe at /reference-architectures/methodologies/detectflow.

The honest cost is that this pattern works only for teams that already operate with engineering discipline. If your SOC does not have a Git workflow, code review, and a CI mindset, then adopting Sigma plus pySigma plus CI/CD is a cultural transformation more than a tooling decision, and the tooling is the easy part. Teams that adopt the tooling without the discipline end up with a YAML repo that's drifted out of sync with what's deployed in their SIEM, which is the worst of both worlds.
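Drift between the repo and the engine is cheap to detect if you look for it. A minimal sketch, assuming you can obtain two `{rule_id: query}` dicts, one from the repo build and one exported from the engine; how you export deployed queries is engine-specific and not shown, and the function names are hypothetical.

```python
import hashlib


def fingerprint(query):
    """Hash a query after collapsing whitespace, so reformatting is not drift."""
    normalized = " ".join(query.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_drift(compiled, deployed):
    """Compare repo-built queries with what the engine is actually running."""
    missing, changed = [], []
    for rule_id, query in compiled.items():
        if rule_id not in deployed:
            missing.append(rule_id)
        elif fingerprint(query) != fingerprint(deployed[rule_id]):
            changed.append(rule_id)
    return {
        "missing_in_engine": sorted(missing),
        "changed": sorted(changed),
        "unmanaged_in_engine": sorted(set(deployed) - set(compiled)),
    }
```

Run on a schedule, a non-empty report is the early warning that someone edited a rule in the SIEM console instead of in version control.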

Sigma as a starting baseline plus engine-native authoring on top

The third pattern is a hybrid: SigmaHQ rules compiled and loaded as a baseline, vendor-native rules layered on top for the high-value correlation work that Sigma doesn't cover well yet. This is the pragmatic 2026 approach for most teams I talk to. It accepts Sigma's current strengths (atomic detections, ATT&CK coverage) and its current limits (correlation), and it doesn't pretend the team has to choose between Sigma and vendor-native. Both coexist; the architecture is honest about which is the right tool for which detection.

The honest critique

Where the Sigma story gets oversold.

I write about Sigma as a foundational pillar, which means I have a positive thesis on it. That doesn't excuse me from naming what I don't like, and there are four critiques I'd want any architect to internalize before adopting.

Community rule quality varies wildly. The SigmaHQ repo has thousands of rules of widely different quality. Some are excellent. Some are stale. Some encode assumptions about a specific Sysmon configuration that doesn't match your environment. Loading the entire repo without curation is a false-positive disaster waiting to happen. The starting-baseline pattern requires curation as a discipline, not as a one-time setup task.
Backend conversions need per-environment validation. A Sigma rule that compiles cleanly to SPL may produce a query that performs poorly on your specific Splunk indexer configuration, or matches differently because your field extractions differ from the rule author's. The conversion is not a contract; it's a starting point that needs validation per backend, per environment. Treat regression testing against historical telemetry as mandatory, not optional.
Vendor incentives push against true portability. Splunk and Microsoft do not benefit commercially from detection content being trivially portable away from their platforms. Lock-in is a revenue model. Even when vendors offer Sigma import support, the path of least resistance (pricing, tooling, native authoring surfaces, partner integrations) favors staying on the vendor-native authoring path. I would not expect any SIEM vendor to make the Sigma path easier than their proprietary path. That's a structural reality, not a complaint about specific vendors.
Sigma is community-led, which is both a feature and a risk. The format and the rule repo depend on volunteer maintainership. The project has been active and well-maintained for years, but it is not backed by a single foundation with a large engineering team in the way Arrow and Iceberg are. That's not a reason to avoid Sigma. It is a reason to track project health as part of your foundational-standards risk register.

None of these are reasons to dismiss Sigma, but they are reasons to size expectations and to plan the adoption work around them rather than discovering them in production.
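The first critique, curation, is the one that lends itself most directly to code. A minimal sketch of a first-pass filter over rules already parsed into dicts. The `status` and `level` vocabularies follow the Sigma rule metadata fields; the thresholds, the `available_logsources` set, and the function name are assumptions you would tune for your own environment. It is a coarse first cut, not a substitute for reviewing what remains.

```python
ALLOWED_STATUS = {"stable", "test"}  # assumption: skip experimental and deprecated
LEVEL_ORDER = ["informational", "low", "medium", "high", "critical"]


def curate(rules, available_logsources, min_level="medium"):
    """Keep rules whose status, level, and log source fit the local environment.

    available_logsources: set of (product, category_or_service) tuples that
    your environment actually collects.
    """
    threshold = LEVEL_ORDER.index(min_level)
    kept = []
    for rule in rules:
        if rule.get("status") not in ALLOWED_STATUS:
            continue
        level = rule.get("level")
        if level not in LEVEL_ORDER or LEVEL_ORDER.index(level) < threshold:
            continue
        source = rule.get("logsource", {})
        key = (source.get("product"), source.get("category") or source.get("service"))
        if key not in available_logsources:
            continue
        kept.append(rule)
    return kept
```

The log-source check does most of the work in practice: a rule written for telemetry you do not collect can never fire correctly, however good it is.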

2026 outlook

What I'm watching this year.

Two threads will shape whether Sigma's portability promise tightens or stalls through 2026. I'm deliberately hedging both because I do not have authoritative information on the specific milestone dates and I would rather under-claim than retract.

Sigma 2.0 correlation extensions

The correlation extension is the most consequential ongoing work in the Sigma project. If it lands in a usable shape (meaning time-window aggregation, count thresholds, and inter-event joins that compile cleanly to multiple backends), Sigma's coverage of the analytic detection layer expands materially. If it stalls or ships with backend-specific gaps that fragment the portability story, the atomic-detection-only positioning I describe in this essay stays accurate for longer.

I do not have current authoritative data on Sigma 2.0 milestone status. I treat any specific feature availability claim as Tier B or Tier D until verified against the current pySigma release. The right move for an architect tracking this: subscribe to the SigmaHQ GitHub repo notifications, read the pySigma release notes when they come out, and validate any vendor claim of "Sigma 2.0 correlation support" against the actual specification before committing.

Vendor adoption signals

Vendor adoption of native Sigma import or authoring support is the second thread. The honest version here is that I have not independently verified the current state of native Sigma support in Splunk Enterprise Security, Microsoft Sentinel, or CrowdStrike Falcon's detection authoring surface. I have seen marketing claims, partner integrations, and community-maintained converters, but I would not stake an architecture recommendation on a vendor offering "native Sigma support" without confirming what "native" means in practice. Is it import-only? Bidirectional? Does it cover the correlation extension? Does it preserve metadata? Verify per vendor, per release, before relying on it.

Where I'd invest detection budget today, given that uncertainty: the pySigma toolchain plus a Git-backed detection-as-code repo, with vendor-native authoring reserved for correlation rules that don't compile cleanly. That's a portfolio that survives any plausible outcome of the vendor adoption signals. If native support improves, it's a bonus; if it stalls, the pySigma compilation path keeps working regardless.

What to do this quarter

Pragmatic guidance for an architect piloting Sigma.

Four moves I'd recommend to an architect evaluating Sigma in the next ninety days, ordered by how cheap they are to execute and how much they tell you about whether Sigma fits your environment.

Pilot pySigma on a slice of your detection backlog. Pick ten to twenty atomic detections that you already maintain in two engines (SPL and KQL, for example). Author them in Sigma. Compile to both targets. Compare the compiled output against your hand-maintained versions for false-positive and false-negative parity. This tells you concretely whether the conversion quality is acceptable for your environment without committing to a wholesale migration. The lab's own conversion-parity harness, which runs this comparison across backends, is at sigma-portability.
Measure conversion quality on YOUR rules, not the demo rules. The SigmaHQ demo rules and the pySigma documentation examples are curated to compile cleanly. Your detection backlog isn't. The signal that matters is how well your existing rules round-trip through Sigma, not how well the demo rules do. Track per-rule conversion success rate and per-rule false-positive delta against the hand-maintained baseline.
Treat Sigma as portable IR, not as your authoritative detection content (yet). For the next year at least, I'd keep the vendor-native rule files as the deployed artifact and treat the Sigma YAML as the canonical source only for the atomic detections that compile cleanly, leaving the highest-value correlation rules authoritative in their vendor-native form. That hybrid lets you adopt Sigma's portability discipline without betting your production detection program on a toolchain that's still maturing on the correlation side. As the correlation extension matures and vendor native support firms up, the ratio shifts.
Pair Sigma adoption with the DetectFlow discipline. Sigma is the format. DetectFlow is the operational pattern that makes the format pay off: version control, CI/CD, regression testing, per-deployment performance and false-positive measurement. Adopting Sigma without DetectFlow is a YAML repo that drifts. Adopting both together is the path that scales.
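The first two moves come down to a measurement you can script. A minimal sketch, assuming you have already run the hand-maintained query and the Sigma-compiled query over the same historical telemetry and collected the IDs of the events each one flagged. The function names and result keys are hypothetical, and this is not the lab harness mentioned above.

```python
def parity(baseline_hits, candidate_hits):
    """Compare event IDs flagged by the hand-maintained query and the compiled one."""
    baseline, candidate = set(baseline_hits), set(candidate_hits)
    return {
        "agreed": len(baseline & candidate),
        # Flagged only by the compiled query: candidate false positives.
        "extra_in_candidate": sorted(candidate - baseline),
        # Flagged only by the hand-maintained query: candidate false negatives.
        "missed_by_candidate": sorted(baseline - candidate),
    }


def conversion_summary(results):
    """Summarize {rule_id: parity dict, or None if the rule failed to compile}."""
    compiled = {rid: p for rid, p in results.items() if p is not None}
    exact = [
        rid for rid, p in compiled.items()
        if not p["extra_in_candidate"] and not p["missed_by_candidate"]
    ]
    return {
        "conversion_success_rate": len(compiled) / len(results) if results else 0.0,
        "exact_parity_rules": sorted(exact),
    }
```

The differences are relative to your hand-maintained baseline, which may itself be wrong, so treat each disagreement as something to investigate rather than as a verdict on the compiled rule.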

The framing I'd offer any architect reading this piece is that Sigma isn't a finished story, but it is the fourth foundational pillar in a security data stack that's finally becoming portable end-to-end. It's the right home for the atomic-detection layer of a SOC's detection portfolio in 2026, even though it isn't a complete replacement for vendor-native correlation engines yet. Plan around both facts honestly and the adoption pays off.

Closing

The fourth pillar, with the gap named.

I named Sigma as the fourth foundational standard in the same essay where I named Arrow, Iceberg, and OCSF, and then this site went a year without a depth piece on it. That was a structural omission rather than a considered choice, and this essay closes it.

The honest summary: Sigma's atomic-detection portability is real, useful, and the foundation of a detection program that can survive a SIEM swap without rewriting hundreds of rules. The correlation extension is the active development frontier and is not yet at parity with vendor-native correlation engines. The community is healthy, the rule repo is a starting point rather than a finished detection program, and vendor adoption signals should be tracked but not trusted at face value. The four-pillar argument (Arrow, Iceberg, OCSF, Sigma) removes three of the four lock-in vectors from the security data stack. The fourth, correlation maturity, is the open problem that 2026 and 2027 will resolve one way or the other.

For an architect making real decisions this quarter, the recommendation is straightforward: pilot pySigma on a slice of your backlog, measure conversion quality on your own rules, treat Sigma as portable IR for the atomic-detection layer, and pair the adoption with DetectFlow discipline. If the Sigma 2.0 correlation work matures faster than I'm sizing it, you'll be ahead of the curve. If it doesn't, you'll still have removed a real lock-in vector from your detection program. Both outcomes are good ones.