H-ARCH-12 · Tier C · 3/5

Security-data infrastructure for agent-scale workloads.

Nearly every security vendor is suddenly "AI-powered" and nearly every data platform is "AI-ready," and the attention goes to the models and to the governance frameworks that wrap them. I think the binding constraint sits a layer below both, in the security-data infrastructure that the agents actually run on. The pipe, the lakehouse, the schema, and the query layer are what decide whether agent-scale access works at all, so this page tracks the read I keep coming back to from the data side: which vendors rebuild that layer for AI-first access rather than bolting AI onto a BI-era stack, what the agent-scale workload costs in infrastructure terms, and why the order of operations runs pipe-before-policy.

The lane this argument runs in

The constraint is the data layer, not the model layer.

The broad-AI shift toward agents is well covered by people whose job is broad AI, and I am not one of them, so I am not going to claim to have noticed that agents are coming or that they change how software gets built. What I track is narrower and, I think, less crowded: what an agentic workflow does to the security-data infrastructure underneath it. A copilot or an autonomous agent does not read a dashboard once an hour the way an analyst does; it issues queries in bursts, follows references, expands its own context, and comes back for more, so the load it puts on the pipeline and the query engine has a different shape from anything a human-facing SIEM was sized for. The model can be excellent and the governance framework can be airtight, and the workflow still stalls if the data layer underneath cannot serve the access pattern, which is why I keep the analysis anchored there.

That framing is a specialist read of a general trend rather than a claim about the trend itself. The general trend, that analytics is moving from human-driven BI toward agent-driven AI, was stated plainly by people with structural credibility in the data world before I wrote any of this down. Jay Kreps, who co-created Apache Kafka and runs Confluent, put it as "the analytics world is moving from BI to AI" when Confluent and Databricks announced a streaming-to-lakehouse partnership aimed at AI workloads in late 2025. His elaboration, that this is about powering actions rather than only enabling insights and that analytical data therefore has to "work at the speed of operational applications," is a category-level description of the same load shift I care about. I cite it because it is the well-covered framing I am situating this page within; the originality I am adding is downstream of it, on the security-data infrastructure that has to absorb the shift.

So the rest of this page is three connected claims about the data layer. The vendors who matter rebuild security-data infrastructure for AI-first, agent-scale access rather than bolting an AI feature onto a BI surface. Agent-scale security-data carries a specific, describable infrastructure tax, which is a cost question before it is a model question. And the order of operations is pipe-before-policy, because the governance and policy layer cannot deliver on its claims until the data layer underneath it is right.

Bolt-on versus rebuild

Adding a chatbot versus rebuilding what the surface can carry.

The distinction that organizes the vendor landscape is between adding AI features to existing infrastructure and rebuilding the infrastructure for AI-first access. The bolt-on pattern looks like a dashboard with an LLM chat box stapled to it: ask questions in natural language, get AI-suggested detection rules, run automated anomaly detection. The data underneath still moves through batch ETL on a schedule, the dashboards still refresh on a fifteen-minute cadence, and the analyst still stares at the same Kibana or Splunk surface with a chatbot layered on top. Most SIEM product announcements over the last eighteen months have been variations on this, and it is not worthless, because a natural-language query against BI data is a useful affordance, but it does not change what the infrastructure can carry, which is the thing an agent-scale workload actually tests.

The rebuild pattern starts from the load shape instead of the interface. If the load is agents issuing thousands of queries a minute rather than a few hundred human analysts each issuing a few queries a day, the query engine has to be built for machine access patterns: sub-second responses that hold under high concurrency rather than thirty-second dashboard loads under light load, and a cost structure that survives a hundredfold increase in query volume. If a model can generate parsers and schema mappings from a log sample, the integration architecture has to look different too, with pipeline-as-code generation rather than click-through UI config, MCP-style integration rather than human-oriented REST, and the customer owning the integration surface rather than waiting on a vendor's roadmap. And if the workflow needs to act in real time rather than surface an insight for a human to read later, the bridge from streaming into the analytical lakehouse has to be built for operational speed rather than batch discovery latency. These are architectural prerequisites for an agentic workload, not features, and the vendors making the rebuild commitments are betting that the prerequisite gap starts mattering inside the 2026–2027 procurement cycle.

A few vendors have made the rebuild commitment explicit enough to read on architecture grounds. Cribl has framed the problem as legacy telemetry infrastructure built for humans reading logs collapsing under agents issuing thousands of queries a minute, and its answer is a unified data layer pulling human, machine, and AI-generated context together, with a headline of "10× the queries at half the cost." Tenzir has built an MCP server that AI-generates parsers, OCSF mappings, and test suites from a single log sample, with a "100% hands-off keyboard" positioning and a claimed 100+ Gbps ingest over zero-copy Apache Arrow, which is the other half of the same rebuild: where Cribl attacks the AI-consuming-data problem with query optimization for agent workloads, Tenzir attacks the AI-generating-integrations problem with pipeline-code automation. I want to be careful with those numbers, because none of them is independently reproduced; the 10× and the 100+ Gbps and the hands-off accuracy are vendor figures with no published production benchmark behind them, so I read them as statements of architectural intent rather than as measured results. What promotes this from security-vendor positioning to an industry pattern is that Databricks made a parallel move, announcing an MCP Catalog in November 2025 that exposes Unity Catalog-governed data to MCP-protocol agents, so the lakehouse leader is rebuilding its catalog layer for agent access on the same diagnosis the security-data vendors are working from.

One caution is in order before treating "AI-native" as a synonym for "more correct," because the rebuild label covers two ways of building the machine-query layer that fail differently. One lets a model compose the query directly, text-to-SQL or GraphRAG style, which carries a silent-error tax on the adversary tail, where a valid-but-wrong query executes cleanly and passes shallow validation; the other is a formal OBDA-style rewrite that is provably correct on what it covers but bounded by what the underlying ontology can express. I work through that trade, and the BIRD execution-accuracy numbers that make the gap concrete, in the LLM-OCSF-mapping piece, so I will not re-derive it here, except to say that the discriminating procurement question is which of the two a vendor actually built, because "AI-native" describes both even though only one of them carries the silent-error tax.

A first-party measurement in our lab (BENCH-C, June 2026, Tier B, claude-opus-4-8 as a frontier proxy) puts a number on where the value actually sits, and it pushed back on what I expected. I ran a pre-registered control that fed the model the same retrieved facts twice, once as a flat list and once with the concept-graph structure left in, so the only thing that changed was whether the relationships were present, and across nine adversary-tail queries the structure earned exactly one of them. It paid on the identity-closure query, where an analyst has to collapse an alias chain to count distinct assets and the flat list could not, while on the other eight the bottleneck was retrieval rather than structure. Read alongside the companion mapping result, where formal grounding did not beat wrong grounding even at the frontier and the schema constraint was the thing doing the work, the honest version of the AI-native story is that what makes the rebuilt layer safer is the deterministic harness around the model, schema validity and structured traversal on the queries that need it, rather than anything the model knows on its own. That is a single directional pilot on one synthetic corpus and I would not over-read the magnitude, but it sharpens the skepticism rather than softening it, because it locates the correctness exactly where a buyer can specify and test it.

To make that discipline something a buyer can walk rather than take on faith, I put the concept graph behind a small read-only MCP server, scg, that exposes the public crosswalk spine (OCSF, D3FEND, ATT&CK, NIST 800-53, CCI) and tags every one of its 7,618 edges with a proxy_quality running from measured down to the intent-blind artifact_cooccurrence inferences that make up its largest class, so a multi-hop answer carries the trust of its weakest edge and a path that leans on one of those cheap offense-to-defense joins gets flagged rather than buried inside the answer. I want to be clear about what that does and does not buy, because the same lab run found conceptual grounding roughly inert and the structure earned only the one query, so the server is not making the model more correct; what it adds is honest provenance on how each link is supported, which is the part a buyer can audit.
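
The weakest-edge rule is simple enough to sketch. What follows is a minimal Python illustration and not the scg server's implementation: the Edge type, the QUALITY_RANK ordering, the intermediate "curated" class, and the node labels are all assumptions made up for the example. Only the proxy_quality field name and the two endpoint classes, measured and artifact_cooccurrence, come from the description above.

```python
from dataclasses import dataclass

# Hypothetical ranking, higher is stronger. Only "measured" and
# "artifact_cooccurrence" are named in the text; "curated" is a placeholder
# for whatever intermediate classes a real crosswalk would carry.
QUALITY_RANK = {"measured": 2, "curated": 1, "artifact_cooccurrence": 0}


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    proxy_quality: str


def path_trust(path: list[Edge]) -> str:
    """Return the proxy_quality of the weakest edge on a multi-hop path."""
    if not path:
        raise ValueError("path must contain at least one edge")
    weakest = min(path, key=lambda e: QUALITY_RANK[e.proxy_quality])
    return weakest.proxy_quality


def flagged(path: list[Edge]) -> bool:
    """Flag any path that leans on an intent-blind co-occurrence edge."""
    return path_trust(path) == "artifact_cooccurrence"


# Illustrative three-hop path; the node labels are made up for the example.
path = [
    Edge("ocsf:event_class", "attack:technique", "measured"),
    Edge("attack:technique", "d3fend:countermeasure", "artifact_cooccurrence"),
    Edge("d3fend:countermeasure", "nist:control", "curated"),
]
print(path_trust(path), flagged(path))
```

The design choice worth noticing is that the answer's trust is a minimum over the path rather than an average, so one cheap join cannot be diluted by two strong ones.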

The infrastructure tax

Agent-scale access is a cost problem before it is a model problem.

When a security team puts an AI copilot or an agent into the defender's workflow, the consequence most programs underweight is the load it puts on the data layer. A copilot or an agent issues one to two orders of magnitude more queries than a human analyst, and the SIEM and pipeline underneath were sized for the human shape, so the binding constraint on an AI-defense strategy often turns out to be query economics rather than model quality. I treat this as Tier B, because it rests on practitioner and vendor framing of the load shift rather than on an audited benchmark, but the direction is consistent across the rebuild vendors and it matches what I have seen when an automated workflow starts hammering a pipeline that was provisioned for dashboard refreshes. If your plan is a copilot tier driving an existing SIEM, the question to answer first is what the query bill looks like when the agent layer hits the data harder than any refresh loop ever did.

The tax is describable, not just a warning. It shows up in concurrency, because an engine that returns a dashboard in thirty seconds under light load behaves differently when thousands of agent requests a minute arrive in bursts, and the cost of holding sub-second latency under that concurrency is the first line item. It shows up in scan economics, because compressed JSON sitting in object storage is cheap to keep and expensive to query, and an agent that follows references and expands its own context issues far more of those expensive scans than a human ever would, so the storage layout that was fine for occasional hunting becomes the dominant cost under agent access. And it shows up in the query layer itself, because the difference between an engine built for human dashboards and one built for machine access patterns is the difference between paying for a hundredfold query increase and falling over under it. None of this is exotic; it is the same columnar-versus-row and streaming-versus-batch economics the data-engineering world already understands, carried into a security-data context where the SIEM pricing model was never built to absorb it.
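
A back-of-envelope version of the scan-economics line item, as a minimal sketch. Every number here is a hypothetical placeholder (the per-TB price, the daily query counts, the gigabytes scanned per query), and the pay-per-bytes-scanned pricing model is itself an assumption that will not match every SIEM contract. The point is only the shape: volume multiplies the bill linearly, and storage layout decides what it multiplies.

```python
def monthly_scan_cost(queries_per_day: float, gb_scanned_per_query: float,
                      usd_per_tb_scanned: float, days: int = 30) -> float:
    """Monthly cost under a pay-per-bytes-scanned pricing model (an assumption)."""
    tb_scanned = queries_per_day * days * gb_scanned_per_query / 1000.0
    return tb_scanned * usd_per_tb_scanned


PRICE = 5.0  # hypothetical USD per TB scanned

# Human-shaped load against compressed JSON in object storage.
human = monthly_scan_cost(1_000, 50.0, PRICE)
# Hundredfold query volume against the same storage layout.
agent_json = monthly_scan_cost(100_000, 50.0, PRICE)
# Same agent volume, but columnar storage lets each query scan far less.
agent_columnar = monthly_scan_cost(100_000, 2.0, PRICE)

print(f"human, JSON:     ${human:,.0f}/month")
print(f"agent, JSON:     ${agent_json:,.0f}/month")
print(f"agent, columnar: ${agent_columnar:,.0f}/month")
```

Swapping in a real contract's unit price and a measured bytes-per-query figure turns this from an illustration into the first draft of the query bill.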

There is a second split inside the agent-access question that gets flattened constantly, which is that "AI that maps and queries security data" is really two capabilities with different failure modes and different cost profiles. The deterministic-but-bounded path, a formal rewrite that is correct on what it covers, fails by refusing to answer outside its coverage; the flexible-but-fallible path, a model composing the query, fails by answering wrongly while looking right, and the BIRD execution-accuracy numbers put a measurable gap between them. I treat that gap at depth in the LLM-OCSF-mapping piece rather than re-explaining it, but it belongs in the tax discussion because the two paths price differently: the bounded path spends its cost up front on the ontology and the rewrite engine and then runs cheaply and safely, while the fallible path runs cheaply until a silent error costs an investigation, which is a cost that does not show up on the query bill. A vendor pitching "AI-native mapping" should be made to say which one it built, because the infrastructure tax is different for each.
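
The pricing difference between the two paths can be written down as a toy model. This is a sketch under loudly stated assumptions: the upfront cost, per-query cost, silent-error rate, and cost per investigation are invented for illustration, and nothing here is derived from the BIRD numbers. What it shows is where each path's cost lands, on the bill or off it.

```python
def bounded_path_cost(upfront: float, per_query: float, queries: int) -> float:
    """Formal-rewrite path: pay for the ontology and engine, then run cheaply."""
    return upfront + per_query * queries


def fallible_path_cost(per_query: float, queries: int,
                       silent_error_rate: float,
                       cost_per_investigation: float) -> tuple[float, float]:
    """Model-composed path: returns (visible query bill, hidden error cost)."""
    query_bill = per_query * queries
    hidden = queries * silent_error_rate * cost_per_investigation
    return query_bill, hidden


QUERIES = 1_000_000  # hypothetical annual query volume

bounded = bounded_path_cost(upfront=200_000.0, per_query=0.001, queries=QUERIES)
bill, hidden = fallible_path_cost(per_query=0.001, queries=QUERIES,
                                  silent_error_rate=0.0005,
                                  cost_per_investigation=2_000.0)

print(f"bounded path total:       ${bounded:,.0f}")
print(f"fallible path query bill: ${bill:,.0f}")
print(f"fallible path hidden:     ${hidden:,.0f}")
```

With these made-up inputs the fallible path looks two hundred times cheaper on the query bill and several times more expensive once the hidden term is counted, which is the comparison a vendor's pricing sheet will not make for you.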

The architectural question this forces is where in the workflow the AI assistance produces a measured outcome change, and what the data and query infrastructure looks like when the agent layer is hitting it at agent scale. In my experience the first question is rarely answered and the second is rarely even asked, so a procurement conversation about an AI-defense product that never reaches the query-economics question is evaluating the model and ignoring the layer that decides whether the model gets to run.

Order of operations

Pipe before policy, because the policy layer inherits whatever the data layer can feed it.

The same logic that governs the agent-access tax governs the policy and governance layer, and it is the clearest place to see why the data layer is the binding constraint. The DSPM, DLP, and AI-security categories converged hard through 2024–2025, with vendors promising sub-100ms inline policy enforcement, real-time AI classification, and end-to-end lineage, and those promises assume a data platform many organizations do not have: streaming telemetry at wire speed, schema normalized at query time, enrichment happening in-stream, and a policy engine deciding in under a hundred milliseconds. Run that policy layer on a pipeline that is actually "Logstash to S3 to a nightly Spark job to Parquet," and the latency floor lands closer to a day than to a hundred milliseconds, so the vendor is not lying so much as describing performance on infrastructure the customer does not run. I treat the full argument for this in the DSPM teardown, which stays the dedicated home for it, and the short version is that the policy claims are real and the pipe assumptions under them are usually implicit, so the gap surfaces in the post-purchase implementation review where it can no longer be papered over.
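
The latency-floor argument is just addition, and it can be sketched that way. The stage names and per-stage delays below are hypothetical worst-case figures chosen for illustration, not measurements of any real pipeline; only the 100 ms policy budget and the Logstash, S3, nightly Spark, Parquet shape come from the paragraph above.

```python
from datetime import timedelta

POLICY_BUDGET = timedelta(milliseconds=100)  # the inline-enforcement claim

# Hypothetical worst-case delay added by each stage; none of these are measured.
batch_pipeline = {
    "logstash_to_s3": timedelta(minutes=5),
    "wait_for_nightly_spark_job": timedelta(hours=24),
    "spark_job_to_parquet": timedelta(hours=1),
}
streaming_pipeline = {
    "stream_ingest": timedelta(milliseconds=20),
    "in_stream_normalize_and_enrich": timedelta(milliseconds=30),
    "policy_decision": timedelta(milliseconds=40),
}


def latency_floor(stages: dict[str, timedelta]) -> timedelta:
    """End-to-end floor is the sum of the per-stage delays."""
    return sum(stages.values(), timedelta())


def meets_budget(stages: dict[str, timedelta],
                 budget: timedelta = POLICY_BUDGET) -> bool:
    return latency_floor(stages) <= budget


for name, stages in (("batch", batch_pipeline), ("streaming", streaming_pipeline)):
    print(name, latency_floor(stages), meets_budget(stages))
```

The policy engine's own 40 ms is identical in spirit across both pipelines; what changes the verdict is everything upstream of it.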

The build order that survives contact with the policy-layer marketing runs the other way from how programs usually sequence the spend. Streaming infrastructure first, so telemetry flows continuously rather than on a schedule. Schema normalization at ingest next, mapping vendor formats to OCSF or ECS before they hit storage rather than figuring out the schema downstream. A metadata and lineage layer after that, with the lineage captured automatically rather than written into a Confluence page that is stale by the time anyone queries it. Columnar storage for the analytical workloads the policy platform will issue against it. And only then the DSPM, DLP, or AI-security platform on top, running on a data layer that can actually support the latency and discovery patterns it advertises. The reason the sequence matters is that the same data platform that lets a policy product hit its sub-100ms claims also carries threat hunting, detection engineering, and the agent-access workloads from the previous section, so the unglamorous infrastructure spend is what makes the exciting policy spend deliver.
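
The build order above can be expressed as a prerequisite check a program could run against its own roadmap. This is a minimal sketch: the layer identifiers are my shorthand for the five steps in the paragraph, and treating the sequence as strictly linear is a simplifying assumption.

```python
# Shorthand names for the five layers, in the order the text gives them.
BUILD_ORDER = [
    "streaming",
    "schema_normalization",
    "metadata_lineage",
    "columnar_storage",
    "policy_platform",
]


def missing_prerequisites(deployed: set[str], target: str) -> list[str]:
    """List the layers that should exist before `target` but do not."""
    idx = BUILD_ORDER.index(target)
    return [layer for layer in BUILD_ORDER[:idx] if layer not in deployed]


# A program that bought columnar storage and now wants the policy platform.
gaps = missing_prerequisites({"columnar_storage"}, "policy_platform")
print(gaps)
```

An empty list means the policy platform will be running on the data layer its claims assume; anything else is the list of items the implementation review will find later.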

The cost allocation is where I am most willing to put a number on it, with the appropriate caution about where the number comes from. In the estates I have worked, a rough ballpark is that the pipe layer absorbs somewhere around 30–50% of total program cost and the policy layer the remaining 50–70%, and that is an estimate from experience rather than a measured figure across a controlled sample, so I would not defend the exact split. What I would defend is the pattern around it: the programs that allocated only 10–15% to the pipe layer tended to underperform on the policy platform's headline claims, because the policy platform inherited a data layer that could not feed it, and no amount of policy-engine quality compensates for a pipeline that delivers the data a day late and unnormalized. The market is currently resolving this through M&A rather than through honest customer-facing positioning, with CrowdStrike acquiring Onum, SentinelOne acquiring Observo AI, and Palo Alto Networks acquiring Chronosphere, so the pipe-first capability is being folded into the policy-first vendors by acquisition, which tells you the gap is real even where the marketing does not name it.

The two questions worth putting to a DSPM or DLP vendor before signing anything are what specific streaming and schema-normalization assumptions the performance claims depend on, and how those assumptions map against the infrastructure the buyer actually runs today. A vendor that can answer concretely is one whose deployment tends to survive the first six months in production; a vendor that deflects is deferring the conversation to the implementation review, which is the most expensive place to discover that the pipes were never there.

Why now

The infrastructure vendors started saying their own platforms need a redesign.

The reason this is worth tracking now rather than treating as a perennial topic is the source of the signal. A few years ago, the AI story at the data layer was vendors adding LLM chatbots to dashboards, which was incremental and left the infrastructure underneath unchanged. Through 2024, agent frameworks like LangChain and CrewAI demonstrated agents issuing hundreds of tool calls per task, and production engineers started noticing what an agent that does not sleep does to a SIEM sized for human analysts. By late 2024 into 2025, the recognition that traditional infrastructure could not carry agent query patterns surfaced as internal R&D direction rather than marketing.

Then, across an eight-week window in late 2025, the data-infrastructure vendors themselves started saying their platforms needed a fundamental redesign for agent-scale access: Confluent and Databricks on streaming into the lakehouse, Databricks again on MCP Catalog, Cribl on agentic telemetry, Tenzir on AI-generated integrations. These are not GenAI application companies whose business is selling models; they are the people who built the data infrastructure, telling their own customers that the patterns the infrastructure was built on have run out of road. When the layer that has the most to lose from declaring its own architecture inadequate says so anyway, and several competitors say it within two months of each other, the convergence is the part worth taking seriously, more than any single vendor's headline number.

What this means in procurement

Evaluate the data layer, not the AI feature.

The questions nearly every vendor RFP response is already optimized to answer are the wrong ones for this. Does the platform have an AI-powered dashboard, can you ask questions in natural language, is there automated anomaly detection: those test whether the AI feature is present, not whether the data layer underneath can carry an agent-scale workload. The sharper questions go at the infrastructure directly. Can the platform hold its query latency under thousands of agent requests a minute rather than under a hundred human users? What is the query cost when an agent issues a hundredfold more queries than the analyst it assists? Can a model generate integrations end to end, or do humans still configure them by hand? Does it speak MCP, or only REST APIs designed for human consumption? What is the path from real-time streams into the analytical platform, given that an agent acting in real time needs it?

The benchmarks have to move with the questions. Time-to-insight for a human analyst, dashboard load time, and concurrency for a hundred human users are BI benchmarks, and a vendor making AI-native claims who can only show those is making a BI claim with a relabeled headline. The agent-scale equivalents are query latency under tens of thousands of agent requests a minute, ingest throughput for real-time action workloads, integration generation time for a new source, and end-to-end streaming-to-action latency. And the deterministic-versus-generative split from the tax section is a question in its own right, because a vendor whose machine-query layer lets a model compose queries freely is carrying the silent-error tax whether or not the demo shows it.
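
The first of those agent-scale benchmarks, latency under concurrent requests, can be prototyped with nothing beyond the Python standard library. This is a minimal sketch of a closed-loop harness, not a production load tool: fake_query is a stand-in that only sleeps, and the request count and concurrency are arbitrary small values so the example runs quickly. Pointing it at a real platform means replacing fake_query with that platform's own client call.

```python
import asyncio
import statistics
import time
from collections.abc import Awaitable, Callable


async def fake_query(i: int) -> None:
    """Stand-in for a real query client; it only sleeps for 10 ms."""
    await asyncio.sleep(0.01)


async def measure(query: Callable[[int], Awaitable[None]],
                  total_requests: int, concurrency: int) -> dict[str, float]:
    """Run queries with at most `concurrency` in flight and report p95 latency."""
    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []

    async def one(i: int) -> None:
        async with sem:
            # Timed once a slot is held, so queue wait is excluded.
            start = time.perf_counter()
            await query(i)
            latencies.append(time.perf_counter() - start)

    wall_start = time.perf_counter()
    await asyncio.gather(*(one(i) for i in range(total_requests)))
    wall = time.perf_counter() - wall_start

    # quantiles(n=20) returns 19 cut points; the last is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20)[-1]
    return {
        "requests": float(len(latencies)),
        "p95_s": p95,
        "requests_per_min": len(latencies) / wall * 60,
    }


result = asyncio.run(measure(fake_query, total_requests=200, concurrency=50))
print(result)
```

The number to ask a vendor for is how p95 moves as concurrency climbs from human scale to agent scale, because a flat curve is the rebuild claim and a hockey stick is the bolt-on.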

On timing, I would separate the buy from the bet. A shop deploying in 2025 is usually right to stay on proven AI-augmented patterns, because the rebuilt infrastructure is not hardened yet and the rebuild vendors are still shipping intent ahead of audited results. A shop designing for 2026–2027 should plan around the agent-scale prerequisites, real-time streams, machine-access query optimization, MCP integration, and the pipe-before-policy build order, so that the data layer is in place before the agent workloads land on it. Greenfield environments in 2025 are the right place to pilot the new patterns in non-production while the category matures, which keeps the learning going without betting production on infrastructure that has not been measured.

Hypothesis status

H-ARCH-12 · what's known, what isn't, what would change the answer.

The full claim is that security-data infrastructure is shifting from a human-centric BI shape toward an agent-scale shape, and that the shift requires changes at the data layer (query engine for machine access, integration automation, streaming into the lakehouse, unified context) that the policy and governance layer on top depends on but cannot supply. Current confidence is 3 out of 5, which is enough to track and act on at the margin but not enough to treat as settled.

What supports it. Several independent data-infrastructure vendors making convergent rebuild moves inside an eight-week window, at the CEO level rather than in product-marketing copy, with a shared diagnosis (legacy infrastructure inadequate for agent load) and divergent treatments (query optimization, integration automation, streaming-to-lakehouse). The "BI to AI" inflection framing from someone with structural credibility in streaming infrastructure. And the pipe-before-policy pattern showing up independently in the DSPM/DLP procurement surface, where the same data-layer gap surfaces from a different direction.

What's missing. Production deployments with published metrics, and independent reproduction of the headline performance numbers, because Cribl's 10×, Tenzir's 100+ Gbps, and the hands-off mapping-accuracy claims are all vendor figures with no audited benchmark behind them yet. Practitioner adoption data beyond early adopters. A measured cost split between pipe and policy across a controlled sample rather than the experience-based 30–50% ballpark I am working from. The evidence base is Tier C-D today, vendor positioning with convergence as a corroborating signal, not measured outcomes.

What would change the answer. Production deployments of agent-scale telemetry with published query-volume and latency numbers. Independent benchmarks of AI-generated OCSF-mapping accuracy on real EDR or cloud logs. Adoption data showing whether buyers actually prefer rebuilt data infrastructure at the procurement layer or keep choosing AI-augmented BI tools. And a unifying open standard at the policy-platform layer analogous to OCSF at the schema layer, which is currently absent and is structurally what would let pipe-first infrastructure decouple cleanly from any single policy vendor. Active testing of the Tenzir MCP server is on the lab roadmap, and results land here when the work is done.

Why this matters for the program. This research surface is one of the evidence triggers that gates the MLOps-hunting service line. Until the hypothesis moves from Tier C to Tier B, with at least one production reference carrying measured outcomes, the practice ships thought-leadership rather than a commercial AI-hunting service, so the two pages move together: when the evidence here matures, the service-line scoping conversation becomes operationally grounded rather than speculative.

The hypothesis updates as the evidence does.

The DSPM teardown carries the full pipe-before-policy argument, and the LLM-OCSF-mapping piece carries the deterministic-versus-generative query trade with its BIRD numbers. The research page holds the other anchor hypotheses and the contradictions log; the thesis page connects them to the program POV.
