Independent evaluation

The security market has no one defining what you can own.

Vendors could ship security tools you can run on your own iron, read the source of, and operate with the network unplugged. A few do. Nothing compels the rest, and, more to the point, there's no independent force in this market that defines what "ownable" even means or scores vendors on it the way benchmarks score performance and audits score security. That gap was always there. The shift to agentic security is widening it, because an AI abstraction is the most opaque thing a vendor has ever asked you to trust, and the rush to ship it is running well ahead of anyone's ability to tell you which version you can actually own. This is the case for a referee, and for the one axis a capability matrix needs to grow.

Reading time: about 17 minutes. Evidence tier: B–C overall. The detection-content proof case (Sigma) is the strongest leg, graded A/B; the agentic and MCP landscape is fast-moving and vendor-sourced, flagged Tier C in line; and the practitioner-ownership claims about open self-hosted stacks are labelled where they're inference rather than demonstrated.

The referee that doesn't exist

We score security tools on everything except whether you can run them without the vendor.

A security buyer can find an independent read on almost any property that matters. Performance has benchmarks, public and re-runnable. Security has SOC 2, FedRAMP, and a pile of audits. Detection efficacy has the ATT&CK Evaluations. Even total cost has analyst models you can argue with. The one property nobody scores is the one that decides whether you're a customer or a hostage: can you run this tool, understand what it's doing, and keep operating it if the vendor relationship ends or the network to their cloud goes away. There's no rating for that, no matrix column, no independent body defining the terms, and so it never enters the purchase decision as a number. It shows up later, as a renewal you can't walk away from.

It helps to make the property concrete, because "ownership" drifts into a slogan otherwise. I score it on four questions, each of which has an honest answer for any given tool:

Air-gappable. Can it run with no outbound network, no call home to a vendor cloud?
Self-hostable. Can you run and inspect the binary or the server, rather than reaching it only as a hosted service?
Legible. Does it enumerate what it does in a form you can read, or does it abstract its own workings away from you?
Open formats. Are the durable artifacts — the detections, the data, the analysis — portable open text, or trapped in a proprietary container?

None of those is a yes-or-no for the whole product. A tool can store data in open Iceberg and still wrap its detection logic in a format that only runs in its own engine, a split that describes most of the lakehouse market right now. The point of scoring all four, per component, is that "open" stops being a marketing word and becomes a measurement with a number attached, and the moment it's a number, a buyer can compare two vendors on it and a vendor has a reason to move. Today no one publishes that number, so the market has no pressure pushing it toward tools you can own, and quite a lot of pressure pushing the other way.
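A minimal sketch of what scoring per component could look like, assuming one point per question and a score out of four for each component. The product, its two components, and the answers below are hypothetical, chosen to mirror the lakehouse split in the paragraph above.

```python
from dataclasses import dataclass

QUESTIONS = ("air_gappable", "self_hostable", "legible", "open_formats")


@dataclass(frozen=True)
class ComponentScore:
    """Answers to the four ownership questions for one component of one tool."""
    component: str
    air_gappable: bool
    self_hostable: bool
    legible: bool
    open_formats: bool

    def score(self) -> int:
        # One point per question answered "yes"; 4 means fully ownable.
        return sum(getattr(self, q) for q in QUESTIONS)

    def failures(self) -> list[str]:
        # The questions this component fails, for the written record.
        return [q for q in QUESTIONS if not getattr(self, q)]


def scorecard(components: list[ComponentScore]) -> dict[str, int]:
    """Per-component scores, so 'open' is reported layer by layer, not once."""
    return {c.component: c.score() for c in components}


# Hypothetical lakehouse product: open table format, engine-bound detection logic.
lakehouse = [
    ComponentScore("storage", air_gappable=True, self_hostable=True,
                   legible=True, open_formats=True),
    ComponentScore("detections", air_gappable=True, self_hostable=True,
                   legible=False, open_formats=False),
]

print(scorecard(lakehouse))      # {'storage': 4, 'detections': 2}
print(lakehouse[1].failures())   # ['legible', 'open_formats']
```

Keeping the answers per component is the design choice that matters: a single product-level number would average the open storage layer with the closed detection layer and hide exactly the gap the score is meant to expose.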

The same choice, all the way down

The own-or-rent fork repeats at every layer of the stack.

The reason this deserves its own axis rather than a footnote is that the choice doesn't happen once. It happens at every layer, and a team can win it at one layer and quietly lose it at the next. At the storage layer, a local object store with Parquet and Iceberg is yours, while a SIEM priced per gigabyte ingested rents you back your own data. At the schema layer, OCSF is an inspectable contract you can read and argue with, while a vendor's internal normalization is a black box you feed. At the detection layer, a Sigma rule is portable text you keep, while a vendor rule corpus leaves when the contract does. At the authoring layer, a notebook stored as plain Python is something you can diff and re-run, while a proprietary notebook bundle binds your analysis to one runtime, a case I worked through in the piece on where detection notebooks should live. At the interface and agent layers, which I'll get to, the same fork is sharper still.

What ties those together is a pattern I've written about before as lock-in moving up the stack. Open table formats genuinely won the storage war, but the control point didn't vanish; it migrated to the pipeline, then to the authoring layer, and now to the agent. Each time the industry declares a layer open, the rent moves up one floor, and the pipeline lock-in story is the middle chapter of exactly this. The cost economics that make the owned option pencil out, the on-prem and repatriation case I made in the repatriation piece, are the same argument seen from the budget side rather than the control side.

The open option exists at every one of these layers. That's the part worth sitting with. This isn't a case where practitioners are asking for something that hasn't been built; it's a case where the thing has been built, repeatedly, and there's no force in the market helping a buyer find it, weigh it, or hold a vendor to it. That raises the obvious question of whether the owned option, when it does get championed, actually holds up over time, and there I have a clean answer.

The proof case

Sigma is what happens when the open pattern gets a champion.

Detection content is the layer where I can show the ownership argument winning on the evidence rather than asserting it. Go back a few years and the shareable security use case was a Sysmon config you forked: SwiftOnSecurity's heavily annotated template, then Olaf Hartong's modular, ATT&CK-tagged successor. Both were genuinely good, widely used, and text in git, and both have stalled, the first with no commits since 2021 and the second since the summer of 2024. They were single-maintainer projects, central to the people who relied on them right up until the maintainer's attention moved, and then they froze. (Tier B, from the repositories' own commit history.)

What didn't stall is Sigma. The SigmaHQ ruleset carries on the order of three thousand rules and something like ten thousand GitHub stars, with a fresh release in April 2026, a core maintainer team rather than one person, and several hundred contributors. (Tier A/B, from the project's releases and repository metadata.) The design choice that made it durable is the part worth copying: the rule format and the per-engine conversion are split, so vendors maintain their own backends at the edges and no single vendor gates the format. Open governance plus a portable conversion layer plus an ATT&CK coordinate system is the combination that survived contributor turnover, and the single-maintainer configs that had the text-in-git property but not the governance did not. The honest asterisk, which I keep on this claim everywhere I make it, is that a shareable rule is not the same as identical detection across every engine; field-mapping fidelity is a real and unsolved soft spot, and I won't pretend the portability is total.

The lesson isn't "open always wins." It's that open content wins durably when something defines the standard, governs it, and pushes the ecosystem to support it, and stalls when it depends on one person's stamina. That conditional is the whole argument for an intermediary. The open option is real, but it needs a force behind it, and at most layers of the security stack that force is absent — which is exactly where the agentic shift is taking the problem from chronic to acute.

The agentic sharpening

An agent is the most opaque thing you've been asked to trust.

Security is shifting to agents that triage, hunt, and write detections, and the vocabulary around it hides a distinction worth keeping. There are three different things a vendor can mean by transparency, and vendors blur them on purpose. The first is whether the agent shows a readable verdict: the queries it ran and the evidence it gathered. Most products do this now and call it transparency. The second is whether you can see the agent's prompts, its model, and its decision logic, which almost none of them offer. The third is whether you can run the whole thing on infrastructure you control. When a vendor says "fully transparent," they mean the first, and the worry a practitioner actually has lives in the second and the third. (Tier B, from reading the products' own documentation against each other.)

Measured that way, the platform copilots are closed on the axes that matter. Microsoft's Security Copilot is documented as a SaaS application with no on-premises option; Google's SecOps agents, CrowdStrike's Charlotte AI, and SentinelOne's Purple AI are all hosted, proprietary-model services whose reasoning you can't inspect and whose brain you can't relocate. (Tier C, vendor documentation and marketing; I'm citing the direction of the market, not claiming to have benchmarked any of it.) I want to be careful here, because this is the lane where it's easy to drift into commentary about AI in general, which isn't my subject. My subject is narrower: the agentic layer is being delivered, by default, as the least ownable thing in the security stack, and the default is not the only option.

The ownable path is real but it's assembly-required. An agent built on the code-action pattern, where the model emits Python you can read, version, and re-run rather than hidden tool calls, produces its own audit trail, and that pattern has peer-reviewed backing showing it outperforms the hidden-tool-call approach (Wang et al., CodeAct, ICML 2024, Tier A). Put that on an open framework with an open-weight model running on your own hardware and you have an agent you can inspect and air-gap. The honest cost is that this configuration trails the frontier SaaS models by something like six to twelve months on the hardest reasoning and shifts the operational burden onto you, and there's no off-the-shelf product for it yet (Tier D for "good enough for the hard cases," which nobody has demonstrated air-gapped at scale). So the answer to "is it all black boxes" is no, but the owned alternative needs someone to define it, score it, and push for it, or it stays what it is today, a thing a few specialists assemble for themselves while everyone else rents the hosted version.
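A minimal sketch of why the code-action pattern leaves its own audit trail, assuming the model's reply is already in hand as a string of Python. No model is called here; `model_action` is a stand-in for whatever a real agent framework returns, and the convention that an action assigns its answer to `result` is an assumption of this sketch, not part of CodeAct. Each action is hashed and recorded before it runs, so the trail holds the exact code behind each result and can be replayed. `exec` is not a sandbox; a real deployment would run actions in an isolated process or container.

```python
import hashlib
import json
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []


def run_code_action(code: str, namespace: dict) -> object:
    """Record an agent-emitted Python action, then execute it."""
    entry = {
        "sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "code": code,
        "started": datetime.now(timezone.utc).isoformat(),
    }
    AUDIT_LOG.append(entry)   # logged before execution, so failed actions are kept too
    exec(code, namespace)     # NOT a sandbox: isolate this in practice
    entry["result"] = repr(namespace.get("result"))
    return namespace.get("result")


def replay(entry: dict, namespace: dict) -> object:
    """Re-run a logged action after checking it is byte-for-byte what was recorded."""
    code = entry["code"]
    if hashlib.sha256(code.encode("utf-8")).hexdigest() != entry["sha256"]:
        raise ValueError("audit entry has been altered")
    exec(code, namespace)
    return namespace.get("result")


# Stand-in for a model reply: readable, versionable, re-runnable Python.
model_action = """
failed = [e for e in events if e["outcome"] == "failure"]
result = sorted({e["user"] for e in failed})
"""

events = [
    {"user": "alice", "outcome": "failure"},
    {"user": "bob", "outcome": "success"},
    {"user": "alice", "outcome": "failure"},
    {"user": "carol", "outcome": "failure"},
]

answer = run_code_action(model_action, {"events": events})
print(answer)  # ['alice', 'carol']
print(json.dumps(AUDIT_LOG[-1], indent=2))
```

The log entry is plain text and a hash, so it can live in git next to the detections it informed, which is the ownership property a hidden tool call does not give you.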

Since I first wrote that, the off-the-shelf gap has begun to close from the open-source side, and it closes on the ownership axis rather than the efficacy one, which is the distinction this whole piece turns on. A class of AI-SOC platforms now ships under licenses you can hold: Vigil from DeepTempo under Apache 2.0, Tracecat under AGPL-3.0, FunnyWolf's agentic-SOC platform under MIT, and Wazuh's agentic integration in early preview on its GPLv2 core, several of them running a local open-weight model and all of them speaking MCP rather than a private protocol (Tier C, vendor and repository sourced). What none of them has yet is a measured efficacy number I'd put weight on; the noise-reduction and autonomy figures attached to them are vendor self-claims, and one advertises a thirteen-agent roster its own README lists as twelve. That distinction is exactly what the ownership axis is built to keep visible. These tools clear the air-gappable, self-hostable, legible, open-format bar that the platform copilots fail, and whether they hunt well is a separate question the same referee would still have to measure. The owned agentic path is no longer purely assembly-required; it is becoming a shelf you can inspect, with the efficacy column still blank.

The tell

Vendor MCP servers show you exactly where the legibility went.

The clearest current evidence is the rush to publish MCP servers. The Model Context Protocol, now governed under the Linux Foundation's Agentic AI Foundation, is the standard by which an agent reaches a tool, and a wave of security vendors shipped servers in the last year. The ones worth caring about are for the data sources, the products that produce the telemetry a SIEM ingests: endpoint, network, cloud, identity, vuln, threat intel. The destination platforms that bolt an MCP onto the SIEM are a different question, and the pipelines a different one again. For a data source, the test of a good MCP server is sharp. It should run air-gapped, and it should teach you the two things the vendor usually keeps in its own people's heads and docs: how to administer and manage the system, and how to access and analyze the data it produces. That is the opposite of Microsoft Sentinel's pitch that you "don't need to understand the schema." A good MCP doesn't hide the schema and the operations; it hands them to you in a form an agent, or a person, can run offline.
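A minimal sketch of what such a server looks like on the wire, hand-rolled on the standard library rather than an MCP SDK so every message is visible. MCP speaks JSON-RPC 2.0, and over the stdio transport each message is one line of JSON. The `initialize`, `tools/list`, and `tools/call` methods follow the specification, but the handshake is simplified to echo the client's protocol version, and the server, its two tools (`describe_config` for the administration axis, `describe_schema` for the analysis axis), and their canned answers are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional

# Hypothetical knowledge a data-source vendor usually keeps in docs and heads.
ADMIN_NOTES = "Logs rotate hourly under /var/log/sensor; retention is set in sensor.conf."
SCHEMA_NOTES = "conn records: ts (epoch seconds), src_ip, dst_ip, dst_port, bytes_out."

TOOLS = [
    {"name": "describe_config",
     "description": "Administration: where logs live and how retention is set.",
     "inputSchema": {"type": "object", "properties": {}}},
    {"name": "describe_schema",
     "description": "Analysis: the fields in each record and their units.",
     "inputSchema": {"type": "object", "properties": {}}},
]
ANSWERS = {"describe_config": ADMIN_NOTES, "describe_schema": SCHEMA_NOTES}


def _error(msg_id: object, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def handle(msg: dict) -> Optional[dict]:
    """Answer one JSON-RPC request; notifications (no "id") get no reply."""
    if "id" not in msg:
        return None
    method, params = msg.get("method"), msg.get("params", {})
    if method == "initialize":
        result = {
            "protocolVersion": params.get("protocolVersion"),  # simplified: echo client
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "sensor-docs-sketch", "version": "0.1"},
        }
    elif method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        name = params.get("name")
        if name not in ANSWERS:
            return _error(msg["id"], -32602, f"unknown tool: {name}")
        result = {"content": [{"type": "text", "text": ANSWERS[name]}], "isError": False}
    else:
        return _error(msg["id"], -32601, f"method not found: {method}")
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}


def serve(lines: Iterable[str]) -> Iterator[str]:
    """Newline-delimited JSON in, newline-delimited JSON out; no sockets anywhere."""
    for line in lines:
        if line.strip():
            reply = handle(json.loads(line))
            if reply is not None:
                yield json.dumps(reply)


# A real client would be wired up with: for out in serve(sys.stdin): print(out, flush=True)
session = [
    '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05"}}',
    '{"jsonrpc": "2.0", "method": "notifications/initialized"}',
    '{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "describe_schema", "arguments": {}}}',
]
for out in serve(session):
    print(out)
```

Nothing in the loop opens a network connection, which is why a stdio server can be air-gapped by construction; whether it is useful offline then depends entirely on how much of the administration and analysis knowledge the vendor chose to put in it.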

The split is the finding, and it falls along ownership. Across thirty-one data-source producers I inventoried, the seventeen that are open and self-hostable are all air-gap-capable, and they span six categories, not one: network (Zeek, Suricata, Arkime), endpoint (Wazuh, osquery, Velociraptor), cloud-runtime (Falco), vuln (OpenVAS, Trivy), identity (Keycloak, FreeIPA), and threat intel (MISP, OpenCTI). Several already encode one or both of the axes that test calls for: Wazuh and Velociraptor carry both administration and analysis, Keycloak the admin side, Zeek and Suricata the analyst side. Of the eleven commercial SaaS producers, not one is air-gap-capable. CrowdStrike ships the best-built server in the whole set, MIT-licensed and genuinely legible, and it still phones home because Falcon is a cloud product. And of the network-detection leaders, only Corelight shipped an MCP at all; Darktrace, ExtraHop, and Vectra have none. (Tier B for the inventory, per-vendor marketing flagged Tier C.) So the air-gap-capable layer is the open, on-prem sensor and agent ecosystem, and the brand-name cyber producers are absent from it by construction.

Here is that split as a starting scorecard for the data sources, showing whether you can run each one with the network unplugged, and which of the two best-practice axes its MCP encodes. The pattern is plain: the air-gap-capable producers are the open, on-prem tools, and the SaaS brands, however good their servers, sit on the wrong side of the line because their products were never air-gappable.

Data source (category)	Self-host	Air-gap	Encodes
Zeek / Suricata (NDR)	Yes	Yes	analyst
Arkime (NDR / full PCAP)	Yes	Yes	analyst
Wazuh (host / XDR)	Yes	Yes	admin + analyst
Velociraptor (DFIR)	Yes	Yes	admin + analyst
Falco (cloud-runtime)	Yes	Yes	analyst
OpenVAS / Trivy (vuln)	Yes	Yes	admin + analyst
Keycloak / FreeIPA (identity)	Yes	Yes	admin
MISP / OpenCTI (threat intel)	Yes	Yes	analyst
CrowdStrike Falcon (EDR; MIT server, SaaS product)	Yes	No	admin + analyst
SentinelOne · Wiz · Tenable · Okta · Cloudflare (SaaS)	part	No	varies
Darktrace · ExtraHop · Vectra (NDR leaders)	—	No	no MCP

Thirty-one producers inventoried, grouped here; "air-gap" requires the product on-prem, not just the server binary, which is the bar the SaaS brands fail. "Encodes" is which best-practice axis the MCP carries, administration or analysis. A research snapshot rather than a census, and most of the open entries have several competing community servers.

The heart of it is that vendors could ship air-gappable MCP servers that teach how to run their system and how to read its data, for the practitioners who operate inside a closed network. The protocol supports it, the stdio transport is local by definition, and the open tools already do it. There is just no market force identifying that capability, defining what it requires, and pushing for it, so each vendor optimizes for the hosted service that fits its business model and the air-gapped practitioner is left to assemble it alone. The fallback, when the vendor won't, is the lakehouse move: ingest the source into your own store and run a self-hosted MCP, a DuckDB server over the logs, against your copy, air-gapped by construction. The honest limit is that a generic server over your lake gives you the records and the triage but not the tool-native actions: you can read what Velociraptor collected without its MCP, but you can't launch a hunt. So the dedicated air-gappable vendor server, scored on those two axes, is still the thing worth asking for.
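A minimal sketch of the fallback's tool body, with Python's built-in `sqlite3` standing in for DuckDB so it needs only the standard library; the `collected` table, its columns, and its rows are hypothetical, and in practice the data would be your own Parquet or Iceberg copy of the source. The connection is locked with SQLite's `query_only` pragma, so the agent can read and triage records but not alter them. The limit from the paragraph above shows up as an absence: there is no tool here that could launch a hunt, because that action lives in the source product, not in your copy of its data.

```python
import sqlite3


def open_lake_copy() -> sqlite3.Connection:
    """Load a local copy of collected records, then lock the connection read-only."""
    db = sqlite3.connect(":memory:")  # stand-in for a DuckDB database over Parquet
    db.execute("CREATE TABLE collected (host TEXT, artifact TEXT, value TEXT)")
    db.executemany("INSERT INTO collected VALUES (?, ?, ?)", [
        ("ws-01", "autorun", r"C:\Users\Public\updater.exe"),
        ("ws-02", "autorun", r"C:\Program Files\Vendor\agent.exe"),
        ("ws-01", "process", "powershell.exe -NoProfile"),
    ])
    db.commit()
    db.execute("PRAGMA query_only = ON")  # from here on, any write is refused
    return db


def query_logs(db: sqlite3.Connection, sql: str, params: tuple = ()) -> list[tuple]:
    """The one analyst tool the lake can offer: read records, nothing else."""
    return db.execute(sql, params).fetchall()


db = open_lake_copy()
rows = query_logs(
    db,
    "SELECT host, value FROM collected WHERE artifact = ? ORDER BY host",
    ("autorun",),
)
print(rows)

try:
    query_logs(db, "DELETE FROM collected")
except sqlite3.OperationalError as exc:
    print("refused:", exc)  # the copy is read-only to the agent
```

Exposing `query_logs` as an MCP tool is then the only vendor-specific work left, and none of it requires the vendor.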

The job

The missing piece is a referee, not another vendor.

What the market is missing is an independent party whose job is to define what ownable means, test tools against it head to head, publish the scores so a buyer can compare and a vendor has something to answer to, and keep pushing the standard forward as the layers shift. That role has a long precedent outside security — the lab that tested intelligence-analytics tools and the capability matrix that mapped them, so a customer could compose a stack for their own purpose instead of trusting a single vendor's pitch. The structure is what makes the neutrality verifiable rather than asserted: a public lab with re-runnable benchmarks, a versioned matrix with the criteria in the open, and conflicts disclosed up front. Apply that to security data and you get an evaluator who can say, with a number behind it, which tools at each layer actually let you own them.

The concrete instrument is a single column added to that matrix: a practitioner-ownability score, the four questions from the start of this piece — air-gappable, self-hostable, legible, open formats — applied to every candidate at every component, and applied hardest at the new agentic and interface layer where the abstraction runs deepest. The MCP inventory above is the first data behind it; the Sigma case is the evidence that the open pattern, once scored and championed, outlasts the locked one. This is the move that turns "we prefer open architecture" from a values statement into an audit, and an audit is the thing a vendor can't wave away, because the benchmark is reproducible and the divergence between a vendor's claim and an independent re-run is visible to anyone who looks.

This has to be a referee's job because a vendor selling an "ownable agentic platform" has the same incentive problem as everyone else, so the scoring has to come from someone who sells the evaluation rather than the tool and discloses any relationship that could bias it. The value of that independence rises as the products get more opaque, because the harder a box is to see into, the more a buyer needs an evaluator with no stake in the box staying shut. The agentic era is the strongest argument the security-data market has yet produced for a fair-broker role, and the gap is worth naming now, while the layer is still forming and the defaults aren't yet set.

What you do about it now

Make ownership a number in your own evaluations.

You don't have to wait for the market to grow a referee to start scoring like one. In your next platform or tooling decision, put the four ownership questions on the page next to the capability and cost columns, and make each vendor answer them on the record: can it run air-gapped, can you self-host and inspect it, does it enumerate its own workings, and do the detections and data come out as portable open text. Keep the durable artifacts in formats you hold — Sigma rules and SQL and plain notebooks rather than vendor containers — so that whatever you decide at the tool layer, the work you build on top stays yours. When a vendor calls a product "open," ask open at which layer, because the answer is usually the storage layer and far less often the authoring or agent layer where the abstraction runs deepest, and that gap between the two is what an ownership score would force into the open.

And price the ownership honestly rather than romanticizing it. An air-gapped open stack on local models trails the frontier and puts the update discipline on your team, and that cost is real, so the right call isn't "own everything" any more than it's "rent everything." It's to know the number, for this tool, at this layer, what owning it costs and what renting it costs you in control, and to make the trade deliberately. A referee doesn't hand you a verdict that open is always right; it hands you that number, early enough that you decide the trade rather than discovering it at renewal.