Technology deep-dive

Pipeline lock-in: where switching costs moved next.

For most of the last decade the binding constraint in security data architecture was storage lock-in: Splunk's proprietary index, Elastic's Lucene files, $150/GB ingest pricing. Apache Iceberg and Delta Lake largely resolved that, but the switching costs didn't disappear so much as migrate up the stack to the pipeline layer, where vendors now monetize routing rules, transformation logic, and enrichment graphs written in proprietary DSLs. The market signal is real: in the 18 months from early 2024 through September 2025, every major platform vendor either acquired, built, or bid on a pipeline capability. The lock-in rhetoric is overheated, though the architectural concern underneath it is not.

Reading time: about 16 minutes. Evidence tier: B overall (M&A coverage from named press sources, vendor pricing pages, practitioner interviews from migration engagements). Specific case-study numbers are anonymized and rounded from engagements I've worked or directly observed.

The reframe

Lock-in is a switching cost, not a trap.

Before going further, a calibration. The popular framing ("vendor trap," "no escape," "data hostage") overstates the situation in a way that makes it harder to reason about, so I prefer the boring version: every pipeline tool creates a switching cost proportional to how much of your logic lives in its proprietary configuration language, and that cost is sometimes worth paying and sometimes not. The useful question isn't whether you're locked in but whether the switching cost is priced into how you negotiate renewals and design new flows.

With that framing, the story of the last five years becomes legible. Storage lock-in had a concrete number: reindexing two petabytes of Splunk data meant months of work and a seven-figure invoice. The open table formats made that number much smaller. Pipeline lock-in has a less obvious number, because the switching cost is buried in five thousand lines of Cribl Stream routes, or six hundred Vector Remap Language transforms, or a Tenzir pipeline graph that someone's senior engineer wrote in 2023 and isn't around to explain.

Three things have changed at once: vendors moved monetization upstream because storage commoditized, M&A consolidated the field, and the pipeline tier is now where most architects are spending incremental budget. That combination is why the topic suddenly carries weight, even though it isn't new lock-in so much as the same lock-in pattern showing up in a layer that used to be too small to fight over.

The pattern

How lock-in migrated up the stack.

Phase one: storage was the moat (2015–2020)

The dominant SIEM economics were storage-anchored. Splunk priced on data ingested, indexed in a proprietary format that nothing else could read efficiently. Elastic's Lucene indexes had similar properties (readable by Elasticsearch, painfully slow elsewhere). The vendor message was unambiguous: your data is our moat. Switching meant reindexing the dataset, which for any organization past a few hundred terabytes was months of engineering work and a multimillion-dollar parallel-running window.

The customer pain was the per-gigabyte markup. S3 Standard at $0.023/GB-month versus Splunk ingest at $150–200/GB is a spread of roughly 6,500 to 8,700 times on the raw storage line, even though the units (storage per month versus data ingested) aren't strictly comparable. The realistic full-stack markup, once you account for compute, retention, and feature gating, lands in the 100–230× range across the deployments I've audited. Either way, the gap was visible enough that an alternative would eventually emerge.
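The raw spread is simple division of the two list prices quoted above. A quick check, with the caveat that the units differ (S3 bills per GB-month stored, Splunk billed per GB ingested), which is one reason the full-stack range is the more honest comparison:

```python
# Raw per-GB spread between the two list prices quoted in the text.
S3_STANDARD_PER_GB_MONTH = 0.023        # USD per GB-month stored
SPLUNK_INGEST_PER_GB = (150.0, 200.0)   # USD per GB ingested, low and high end

def raw_spread(vendor_price: float, storage_price: float = S3_STANDARD_PER_GB_MONTH) -> float:
    """How many times more the vendor line costs than the raw storage line."""
    return vendor_price / storage_price

for price in SPLUNK_INGEST_PER_GB:
    print(f"${price:.0f}/GB ingest vs ${S3_STANDARD_PER_GB_MONTH}/GB-month: {raw_spread(price):,.0f}x")
# prints:
# $150/GB ingest vs $0.023/GB-month: 6,522x
# $200/GB ingest vs $0.023/GB-month: 8,696x
```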

Phase two: open table formats commoditized storage (2020–2023)

Netflix open-sourced Iceberg and donated it to the Apache Software Foundation in 2018, and it became a top-level Apache project in 2020. Databricks open-sourced Delta Lake in 2019. By 2023 the practical question "can I query my security data with Spark or Trino or DuckDB or Snowflake without rewriting it" had a yes answer. Storage stopped being the moat. The data became portable across query engines, and the per-gigabyte spread between proprietary indexes and Parquet-on-S3 made the migration math obvious for any team willing to do the work.

The vendor response was not to fight that but to concede storage and monetize the layer above, the part that turns raw events into normalized, enriched, routed, and tagged streams. Cribl, founded in 2018, was the cleanest expression of that strategy, because Splunk's pricing pressure had created a market for anything that could reduce ingest volume, and Cribl's positioning ("optimize what flows into your SIEM") captured that market.

Phase three: the pipeline tier became the new spending bucket (2023–2026)

By 2025, pipeline tooling had become a meaningful share of security data spend. Cribl reportedly crossed $200M ARR at end-2024 and $300M (announced February 2026) per Sacra, which is faster than Splunk reached the same numbers from a standing start, and that pace is a signal that pipeline control monetizes faster than storage did. Datadog Observability Pipelines, Vector (acquired by Datadog in 2021), Tenzir, Fluentd, Fluent Bit, and Edge Delta all compete for the same architecture decision.

The lock-in surface lives in five places, and naming them concretely matters more than the general argument:

Routing rules. "Send AWS logs to S3, Azure logs to ADLS, on-prem syslog to Kafka, low-severity to cold tier." Five thousand of these in a mature deployment.
Transformation logic. Parsing nested JSON, flattening arrays, normalizing timestamps, extracting fields from raw text. Vendor-specific DSL.
Enrichment pipelines. GeoIP, threat-intel joins, user-context lookups, asset-inventory lookups. Often vendor-specific lookup table formats.
Schema mappings. Source format to OCSF, source format to ECS, source format to a custom internal schema.
Filtering and sampling. Drop debug logs, sample INFO at 10%, keep all ERRORs, drop sources tagged as test environments.

All of that lives in proprietary configuration. Cribl Stream uses its own pipeline-function DSL plus JavaScript snippets. Vector uses TOML (or YAML or JSON) configuration plus Vector Remap Language (VRL). Fluentd uses Ruby-flavored config plus custom plugins. Tenzir uses its own pipeline syntax. Datadog Observability Pipelines uses a hosted UI that exports to a vendor-specific format. None of these are portable to each other in any meaningful sense.
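To make the contrast concrete, here is the filtering-and-sampling rule from the list above expressed as plain Python instead of any vendor DSL. It's a sketch: the field names (`env`, `level`, `id`) are assumptions, and hashing the event ID is one design choice among several. It makes the 10% INFO sample reproducible, so re-running the rule, or porting it to another tool that uses the same hash, keeps the same events.

```python
import hashlib

def sample_bucket(event_id: str) -> float:
    """Map an event ID to a stable value in [0, 1) so sampling is reproducible."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def keep(event: dict, info_rate: float = 0.10) -> bool:
    """Filter-and-sample policy; field names are hypothetical."""
    if event.get("env") == "test":        # drop sources tagged as test environments
        return False
    level = event.get("level", "").upper()
    if level == "DEBUG":                  # drop debug logs
        return False
    if level == "ERROR":                  # keep all ERRORs
        return True
    if level == "INFO":                   # sample INFO at info_rate (10% by default)
        return sample_bucket(event["id"]) < info_rate
    return True                           # everything else passes through
```

Because the decision depends only on the event, the same function can double as the reference implementation you test each vendor's config against.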

Market signal

Follow the deals, not the headline number.

In the 18 months from early 2024 through September 2025, every major platform vendor either acquired, built, or bid on a pipeline capability. Cisco-Splunk ($28B, closed March 2024) anchors the upper bound; if you exclude Splunk and count only narrow pipeline plays, named deals come to the low single-digit billions. Either way, the strategic signal is the same: the layer matters enough that platform vendors are spending real money on it. SaaS multiples have since compressed materially (Redpoint's 2026 data puts the sector at 4.1× forward revenue, a decade low), so any aggregate quoted at 2024 peaks would price meaningfully lower today.

The pattern of deals is the part worth paying attention to:

CrowdStrike acquired Flow Security in March 2024. The headline figure was reported at $200M, but CrowdStrike's SEC 10-Q disclosed roughly $96.4M in cash consideration. The strategic reading is straightforward: CrowdStrike already owns the EDR agent and the LogScale SIEM, and Flow Security closes the pipeline tier in the middle.
Cribl raised a $319M Series E in August 2024 at a $3.5B valuation, against reported ARR of roughly $200M at year-end 2024 and $300M (announced February 2026) per Sacra. That growth rate is the cleanest signal in the sector that pipeline control monetizes faster than storage did at the same stage.
SentinelOne acquired Observo AI for $225M, announced September 8, 2025. Same playbook as CrowdStrike-Flow, executed about 18 months later and at a higher disclosed price; the strategic value of owning the pipeline tier had not compressed even as broader SaaS multiples did.
Cisco-Cribl acquisition rumors circulated through 2024 at roughly $2.5B, an aggressive multiple on Cribl's then ~$200M ARR. The deal didn't close on those terms; Cribl raised the Series E and stayed independent. The fact that a 12.5× revenue multiple was the rumored price tells you what the strategic value of the layer was at peak.
Datadog continued to expand Observability Pipelines (the productized Vector roadmap) as a paid SKU.
Palo Alto Networks and Microsoft built proprietary pipeline capabilities into Cortex and Sentinel respectively, rather than acquiring; the same lock-in logic, executed via build rather than buy.

The pattern across all of these is that the platform vendors (EDR, SIEM, observability) are not willing to let an independent pipeline tier sit between their products. Either they buy the layer, build it, or partner aggressively to neutralize it. For a security architect, the practical implication is that "best of breed pipeline tool plus best of breed SIEM" is a moving target. Your pipeline vendor may be acquired by your SIEM's competitor, or by your SIEM directly, and the integration story may change overnight.

What it actually costs

Three patterns I've seen in real engagements.

The numbers below are anonymized and rounded from engagements I've either worked or directly observed in peer review. Treat them as Tier B: directionally calibrated, not audited financial reporting.

Pattern one: the mid-market Cribl curve

A mid-market financial services firm adopted Cribl in 2022 to reduce Splunk ingest. Year one was a clean ROI story: roughly $200K in Cribl license, against an $840K Splunk ingest reduction. In year two the cost shape changed. Licensing rose to about $350K as volume grew, and two engineers ended up dedicated to Cribl operations, adding $300K in loaded labor. Total cost of ownership reached roughly $650K.

In year three the renewal landed at $500K-plus, an increase of roughly 43% over the $350K license. The team did the switching-cost math: rewriting around 500 routes in Vector would have taken two engineers six months, plus three months of parallel validation, plus a data-loss risk during cutover that nobody wanted to underwrite. Total switching cost came to the low-to-mid six figures, plus operational risk. They renewed.

Reading: the lock-in did its job, because switching was technically possible but more expensive than accepting the price increase, and the vendor knew that. This isn't a horror story so much as how the switching-cost math is supposed to work when there's no portability layer.
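Pattern one's renewal math can be written down as a small calculation. This sketch uses the figures above and labels its assumptions: $150K loaded cost per engineer per year (inferred from two engineers costing $300K), parallel validation occupying the same two engineers, and no dollar value on cutover risk, which the calculation deliberately leaves unpriced.

```python
from dataclasses import dataclass

@dataclass
class SwitchEstimate:
    engineers: int
    rewrite_months: float
    parallel_months: float
    loaded_cost_per_engineer_year: float  # assumption: inferred from "$300K for two engineers"

    def one_time_cost(self) -> float:
        """Labor only: rewrite plus parallel validation. Cutover risk is not priced."""
        monthly = self.loaded_cost_per_engineer_year / 12
        return self.engineers * (self.rewrite_months + self.parallel_months) * monthly

def breakeven_years(switch_cost: float, current_price: float, renewal_price: float) -> float:
    """Years of paying the renewal premium before switching would have been cheaper."""
    premium = renewal_price - current_price
    if premium <= 0:
        return float("inf")  # no premium, so there is nothing to recover by switching
    return switch_cost / premium

estimate = SwitchEstimate(engineers=2, rewrite_months=6, parallel_months=3,
                          loaded_cost_per_engineer_year=150_000)
cost = estimate.one_time_cost()
print(f"switching labor ~${cost:,.0f}; break-even after "
      f"{breakeven_years(cost, 350_000, 500_000):.1f} years of renewal premium")
# prints: switching labor ~$225,000; break-even after 1.5 years of renewal premium
```

On these assumptions the labor alone pays back in about a year and a half, which is close enough that the unpriced cutover risk, plus the operations labor the replacement tool would still need, can plausibly decide the renewal.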

Pattern two: the multi-pipeline sprawl

An enterprise tried the "best of breed per source" approach: Cribl for cloud sources, Fluentd for Kubernetes, Telegraf for metrics, Vector for on-premises sources, Kafka for streaming integrations. License costs were modest at around $800K because most of the stack was open source, but the engineering costs were not: five FTEs spread across the configurations ran roughly $1M loaded, plus ongoing training and onboarding overhead of about $50K a year.

Total operational cost cleared $1.85M annually. The hidden cost was cognitive: five different configuration languages, five different debugging stories, five different failure modes. The team estimated about 60% of their time went to pipeline maintenance, leaving 40% for security work. That's a problematic ratio when the team's nominal mandate is detection engineering.

Reading: vendor diversity is sometimes a feature and sometimes a cost, so the "no single vendor can hold us hostage" argument has to be weighed against the per-tool operational tax, and for most mid-sized teams the tax is larger than the lock-in risk it mitigates.

Pattern three: the consolidation migration

Same enterprise as pattern two, two years later. The decision was to consolidate on one pipeline tool (Cribl, on grounds of maturity). The migration ran 18 months, consumed about $2.5M in consulting, engineering, and parallel-validation cost, and surfaced three problems that didn't show up in the pre-migration analysis. Cribl Cloud had a four-hour outage that produced a data-loss window because local buffering wasn't configured. A point release broke roughly 30% of pipelines because of a syntax change that wasn't documented as breaking. And the pricing model changed mid-contract to bill on peak usage rather than committed usage, which retroactively raised the run-rate.

Reading: consolidating on one pipeline vendor maximizes operational efficiency and maximizes lock-in exposure at the same time, so the right answer is usually neither "spread across five tools" nor "consolidate on one" but "consolidate on one while designing for portability," which is what the escape routes below work through.
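Of pattern three's surprises, the data-loss window is the one a configuration choice would have prevented. The store-and-forward pattern behind local buffering is simple to sketch: deliver when the destination is up, append to a local spool when it isn't, and replay in order once it recovers. This is an illustration only; the `send` callable and spool path are hypothetical, and production buffers in the pipeline tools themselves use segmented files and acknowledgements rather than rewriting a single file.

```python
import json
import os
from typing import Callable

class SpoolingSender:
    """Deliver when the destination is up; append to a local JSON Lines spool
    when it is not; replay the spool in order later."""

    def __init__(self, send: Callable[[dict], None], spool_path: str):
        self.send = send              # assumed to raise ConnectionError on outage
        self.spool_path = spool_path

    def emit(self, event: dict) -> None:
        if self._backlog():
            self._spool(event)        # preserve ordering: never jump ahead of the spool
            return
        try:
            self.send(event)
        except ConnectionError:
            self._spool(event)

    def drain(self) -> int:
        """Replay spooled events; stop at the first failure and keep the rest."""
        if not self._backlog():
            return 0
        with open(self.spool_path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f]
        sent = 0
        for event in pending:
            try:
                self.send(event)
            except ConnectionError:
                break
            sent += 1
        # Rewriting in place is not crash-safe; fine for a sketch, not for production.
        with open(self.spool_path, "w", encoding="utf-8") as f:
            f.writelines(json.dumps(e) + "\n" for e in pending[sent:])
        return sent

    def _backlog(self) -> bool:
        return os.path.exists(self.spool_path) and os.path.getsize(self.spool_path) > 0

    def _spool(self, event: dict) -> None:
        with open(self.spool_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
```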

Why it's harder than storage

Logic is harder to migrate than data.

The structural reason pipeline switching costs are higher than storage switching costs, even at equal data volume, is that data migration is mostly mechanical and logic migration is mostly semantic. Migrating from Splunk to Iceberg is fundamentally a loop: export day by day, convert format, write Parquet. The schema gets carried across, the values get carried across, and the correctness check is mechanical too: row counts and value-level checksums per day, compared between source and target.
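That mechanical check can be sketched in a few lines: a row count plus an order-independent digest per day, computed on the exported side and again after conversion. It assumes both sides are read back into the same Python types (a timestamp that comes back as a `datetime` on one side and a string on the other will show up as a mismatch), and it uses the standard library only, so the Parquet read and write are left out.

```python
import hashlib
import json
from typing import Iterable, Mapping

def day_fingerprint(rows: Iterable[Mapping]) -> tuple[int, str]:
    """Row count plus an order-independent digest of one day's rows."""
    count, acc = 0, 0
    for row in rows:
        canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        # Summing mod 2**256 ignores row order but, unlike XOR, duplicate rows don't cancel.
        acc = (acc + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{acc:064x}"

def verify_day(source_rows, converted_rows) -> bool:
    """True when both sides hold the same multiset of rows for the day."""
    return day_fingerprint(source_rows) == day_fingerprint(converted_rows)
```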

Migrating from Cribl to Vector is not that shape. A Cribl pipeline that parses a vendor's nested JSON, applies regex extraction to a free-text field, joins against a GeoIP table, and routes based on the result of all three has to be rewritten in VRL with semantic equivalence tested against real data, not just structurally translated. The translation is roughly 80% mechanical and 20% judgment, and the 20% is where data-quality bugs live.
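Semantic-equivalence testing is, at its core, a differential test: replay the same captured sample through the old and new logic and diff the outputs field by field. In this sketch both transforms are Python callables standing in for the two tools; in practice you would capture each tool's actual output for the same input sample and compare those.

```python
import copy
from typing import Callable, Iterable, Optional

# A transform takes an event and returns the output event, or None if it drops it.
Transform = Callable[[dict], Optional[dict]]
MISSING = "<missing>"

def diff_transforms(old: Transform, new: Transform, samples: Iterable[dict]) -> list[dict]:
    """Run both transforms over the same samples and report field-level differences."""
    mismatches = []
    for index, event in enumerate(samples):
        # Deep copies, so neither side can see the other's in-place edits.
        a, b = old(copy.deepcopy(event)), new(copy.deepcopy(event))
        if a == b:
            continue
        if a is None or b is None:
            mismatches.append({"index": index, "field": "<dropped>", "old": a, "new": b})
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field, MISSING) != b.get(field, MISSING):
                mismatches.append({"index": index, "field": field,
                                   "old": a.get(field, MISSING), "new": b.get(field, MISSING)})
    return mismatches
```

The mismatches are where the 20% of judgment shows up: each one is either a bug in the rewrite or an undocumented behavior of the old tool that someone has to decide whether to keep.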

Three secondary reasons compound the primary one:

Testing pipeline changes risks data loss. Storage migration can run with read-from-both validation. Pipeline migration usually means running parallel pipelines, which doubles the operational surface, or cutting over with rollback, which risks gaps in the detection record.
Vendor-specific features create soft lock-in. Cribl's adaptive load balancing, Datadog's ML-based log pattern detection, Tenzir's pipeline graph optimizer: these may have no open-source equivalent. Migrating off doesn't just rewrite the logic; it gives up the optimization.
Skills lock-in compounds tooling lock-in. A team that's spent two years on Cribl has internalized Cribl's quirks. Retraining onto Vector is three to six months of productivity loss. This is real cost, and it doesn't show up on a license line.

My rough rule of thumb from migration engagements is that equivalent-volume pipeline migration runs roughly ten times the engineering effort of equivalent-volume storage migration, and that ratio is what separates "we can switch" from "we can switch cheaply enough that the vendor has to compete on renewal."

Escape routes

Four design moves that keep the switching cost bounded.

None of these eliminate lock-in, but the honest framing is that they reduce the switching cost from "rewrite everything" to "rewrite the adapter," and that reduction is enough to recover negotiating room at renewal.

1. Standardize on OCSF at the pipeline output

This is the single highest-impact move. Let the pipeline tool handle vendor-specific parsing (that's its differentiated value), but enforce OCSF (Open Cybersecurity Schema Framework) as the output schema; once the output is portable, the downstream lakehouse, SIEM, or detection engine is portable too. The migration story for the pipeline tool becomes "produce the same OCSF output from the new tool," which is a much smaller scope than "produce equivalent output in an unspecified shape."

This isn't a hypothetical. Most modern pipeline tools (Cribl, Vector, Tenzir, Datadog Observability Pipelines) ship OCSF mapping libraries or templates. The work is enforcing the discipline: every pipeline emits OCSF-validated output before it hits the destination, no exceptions for "we'll normalize later."

I ran the claim down on my own reference stack to check that "swap the router, keep the contract" is a real property and not just an aspiration. On 2026-06-07, on a single host, I routed the same raw Okta sign-in and sign-out events to OCSF Authentication through three independent routers in turn (Vector, Tenzir, and Fluent Bit). All three emitted the identical OCSF Authentication contract: class_uid 3002, matching activity_id values for the sign-in and sign-out events, and matching user, src_ip, and the rest of the populated fields. That is the OCSF-at-output claim made concrete: the routing tier changed underneath while everything downstream of it saw the same events in the same shape. To be clear about scope, this is a contract-equality check on one host, a portability proof rather than a throughput number; it tells you the schema discipline holds across routers, not how fast any of them runs at volume.

I checked the storage half the same way. Running the same OCSF queries through three catalog implementations (iceberg-rest, Nessie, and Lakekeeper) returned identical answers, and the same query returned identical answers across Iceberg and DuckLake on one object store, so the portability holds at the table layer as well as the routing layer.
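The "OCSF-validated before it hits the destination" discipline can be sketched as a gate in front of every sink. This is a hand-rolled subset of the Authentication contract for illustration, not the official OCSF validator: it checks the class and category IDs, the `type_uid = class_uid * 100 + activity_id` relationship, and a few base-event fields, and it routes failures to a dead-letter list instead of letting them through. Check the field list against the OCSF schema version you actually target.

```python
# Hand-rolled subset of the OCSF Authentication contract; not the official validator.
AUTH_CLASS_UID = 3002   # OCSF Authentication
IAM_CATEGORY_UID = 3    # Identity & Access Management
REQUIRED = ("class_uid", "category_uid", "activity_id", "type_uid",
            "severity_id", "time", "metadata")

def ocsf_auth_errors(event: dict) -> list[str]:
    """Return contract violations; an empty list means the event may proceed."""
    missing = [f"missing {key}" for key in REQUIRED if key not in event]
    if missing:
        return missing
    errors = []
    if event["class_uid"] != AUTH_CLASS_UID:
        errors.append(f"class_uid {event['class_uid']} is not {AUTH_CLASS_UID}")
    if event["category_uid"] != IAM_CATEGORY_UID:
        errors.append(f"category_uid {event['category_uid']} is not {IAM_CATEGORY_UID}")
    # OCSF derives type_uid from the class and the activity.
    if event["type_uid"] != event["class_uid"] * 100 + event["activity_id"]:
        errors.append("type_uid does not equal class_uid * 100 + activity_id")
    metadata = event["metadata"]
    if not isinstance(metadata, dict) or not all(k in metadata for k in ("product", "version")):
        errors.append("metadata must include product and version")
    return errors

def gate(events):
    """Split a batch into passing events and (event, errors) pairs for a dead-letter sink."""
    passed, rejected = [], []
    for event in events:
        errors = ocsf_auth_errors(event)
        if errors:
            rejected.append((event, errors))
        else:
            passed.append(event)
    return passed, rejected
```

Because the gate sits at the output rather than inside any one router, it stays the same when the router underneath it changes.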

2. Keep routing logic in code, not in vendor config

Pipeline tools want you to express routing rules in their UI or their proprietary YAML. That's convenient, and it's exactly where the lock-in accumulates. The alternative is to express routing rules in your code repository (Python, TypeScript, whatever your team writes) and generate the vendor-specific config from that source of truth.

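```python
# router.py: your code, the source of truth, version-controlled
import operator
from dataclasses import dataclass
from enum import Enum

class Destination(Enum):
    S3_SECURITY_LAKE = "s3_security_lake"
    ADLS_SECURITY_LAKE = "adls_security_lake"
    HOT_TIER = "hot_tier"
    COLD_TIER = "cold_tier"

@dataclass(frozen=True)
class Rule:  # plain data, so emitters can translate it into vendor config
    field: str
    op: str  # name of a comparison in the operator module, e.g. "eq", "ge"
    value: object
    destination: Destination

RULES = [  # ordered: first match wins
    Rule("source", "eq", "aws_cloudtrail", Destination.S3_SECURITY_LAKE),
    Rule("source", "eq", "azure_activity", Destination.ADLS_SECURITY_LAKE),
    Rule("severity", "ge", 8, Destination.HOT_TIER),
]

def route(event: dict) -> Destination:
    for rule in RULES:
        if rule.field in event and getattr(operator, rule.op)(event[rule.field], rule.value):
            return rule.destination
    return Destination.COLD_TIER  # nothing matched

def to_cribl_routes(rules): ...   # cribl_emit.py: disposable adapter (stub)
def to_vector_routes(rules): ...  # vector_emit.py: alternative adapter (stub)
```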

The intent is captured in code that survives a vendor switch, and the vendor-specific config becomes a generated artifact, regenerated on every CI run. Replacing Cribl with Vector then becomes "write a new emitter," which is a small project, rather than "rewrite five thousand routes," which is not.
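What an emitter might look like, as a hedged sketch: this one targets Vector's `route` transform, which (on our reading of the Vector docs) maps route names to VRL conditions and can send an event that matches several conditions to each of them, with unmatched events leaving on an `_unmatched` output. The rule table, transform name, and input name are hypothetical, the generated VRL strings are unvalidated, and the output is a dict you could write out as JSON config and check with `vector validate` before trusting it.

```python
import json

# Hypothetical rule table mirroring the router: (route name, event field, VRL operator, value).
RULES = [
    ("s3_security_lake", "source", "==", "aws_cloudtrail"),
    ("adls_security_lake", "source", "==", "azure_activity"),
    ("hot_tier", "severity", ">=", 8),
]

def vrl_condition(field: str, op: str, value) -> str:
    # json.dumps renders strings with double quotes and integers bare, like VRL literals.
    return f".{field} {op} {json.dumps(value)}"

def to_vector_route(rules, inputs: list[str], name: str = "security_router") -> dict:
    """Emit a Vector route transform as a dict that can be serialized to JSON config."""
    routes, earlier = {}, []
    for route_name, field, op, value in rules:
        cond = vrl_condition(field, op, value)
        # Negate every earlier condition so an event lands in exactly one route,
        # preserving the router's first-match-wins semantics.
        routes[route_name] = " && ".join([*(f"!({c})" for c in earlier), f"({cond})"])
        earlier.append(cond)
    return {"transforms": {name: {"type": "route", "inputs": inputs, "route": routes}}}

print(json.dumps(to_vector_route(RULES, ["okta_in"]), indent=2))
```

The negation chain is the kind of semantic detail that lives in the emitter rather than in the source of truth: the intent ("first match wins") stays in your code, and each adapter is responsible for reproducing it in its vendor's model.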

3. Require config export and version control from day one

There are two non-negotiables for any pipeline tool you adopt. First, every configuration must be exportable programmatically: Cribl supports this via API and CLI, Vector is natively file-based, Tenzir is file-based, and Datadog Observability Pipelines exports through API. If the vendor's answer is "you can export from the UI as JSON," that's the floor rather than the ceiling. Second, every configuration lives in Git, with intent documented separately from syntax.

The reason is that you cannot migrate what you cannot read, and you cannot read five thousand routes that nobody documented, which is why configuration sprawl is the single biggest accelerator of lock-in I've seen across engagements. By the time anyone notices, half the routes are owned by people who left.
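Documenting intent separately from syntax only helps if something checks that the two stay in sync. A sketch of that check, assuming (hypothetically) that the vendor export yields a list of routes with an `id`, and that intent lives in a Git-tracked registry keyed by the same ID with an `owner` field:

```python
def audit_routes(exported_routes: list[dict], intent_registry: dict[str, dict],
                 active_owners: set[str]) -> dict[str, list[str]]:
    """Compare a programmatic config export against the intent documented in Git."""
    exported_ids = [route["id"] for route in exported_routes]
    # Routes running in production with no documented intent at all.
    undocumented = [rid for rid in exported_ids if rid not in intent_registry]
    # Routes whose documented owner is no longer on the team.
    orphaned = [rid for rid in exported_ids
                if rid in intent_registry
                and intent_registry[rid].get("owner") not in active_owners]
    # Intent entries for routes that no longer exist in the export.
    stale = sorted(set(intent_registry) - set(exported_ids))
    return {"undocumented": undocumented, "orphaned": orphaned, "stale_intent": stale}
```

Run in CI against a fresh export, a non-empty `undocumented` or `orphaned` list can fail the build, which catches the drift before "half the routes are owned by people who left."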

4. Prefer open protocols at the boundaries

At the source side and the destination side, prefer protocols you can switch tools behind: syslog (RFC 5424), Kafka, S3 with Parquet/Iceberg output, OpenTelemetry Protocol (OTLP). Avoid configurations where the pipeline tool is the only thing that can read its own buffered output, or where the input protocol is vendor-specific in a way that ties your sources to one pipeline tool.
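Part of what makes syslog a safe boundary is that its header is fully specified, so any tool, or a few lines of your own code, can parse it. A minimal sketch of an RFC 5424 header parser: it handles the nil value `-` and escaped brackets in structured data but doesn't validate each field's grammar, and it decodes PRI, which RFC 5424 defines as facility * 8 + severity.

```python
import re

# RFC 5424: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
RFC5424 = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,3}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<structured_data>-|(?:\[(?:[^\]\\]|\\.)*\])+)"
    r"(?: (?P<msg>.*))?$",
    re.DOTALL,
)

def parse_rfc5424(line: str) -> dict:
    match = RFC5424.match(line)
    if match is None:
        raise ValueError("not an RFC 5424 message")
    fields = match.groupdict()
    pri = int(fields.pop("pri"))
    fields["facility"], fields["severity"] = pri >> 3, pri & 7  # PRI = facility * 8 + severity
    return fields
```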

The OpenTelemetry Collector is worth tracking specifically here. It is a vendor-neutral, CNCF-hosted pipeline architecture (receivers, processors, exporters) with broad industry support. For observability use cases it's already a credible alternative to the commercial pipeline tools. For security-specific OCSF flows it's not yet at parity, but the trajectory is real and the standards body is the right one. If a true "OpenPipeline" standard emerges in the next 24 months, it'll come from this corner of the ecosystem.

Decision framework

When the switching cost is worth paying.

The question is not whether to use a commercial pipeline tool but how to underwrite the switching cost honestly. Three conditions where I think paid pipeline tooling is the right call:

The cost differential is large. Ingest-reduction savings exceed $1M/year, the pipeline license is a fraction of that, and the three-year ROI clears even under a worst-case lock-in scenario.
The vendor passes basic durability checks. Profitable or well-funded with at least five years of operating history, a hundred-plus enterprise customers, and a roadmap that includes standards work rather than just feature differentiation.
The escape route exists. Configuration is programmatically exportable, standard protocols are supported on both sides, and a credible migration path to at least one alternative is documented.

Three conditions where I'd walk away:

Proprietary everything. Custom query language, proprietary data format, cloud-only deployment, no export. Each of these is a yellow flag; the combination is a red one.
Unsustainable pricing. Annual increases above 30%, retroactive billing changes, peak-usage penalties without consumption controls. These signal that the vendor expects to extract on renewal, which is the lock-in scenario the previous section is designed to prevent.
Technical requirements violated. No local buffering (data-loss exposure on vendor outages), no on-prem option where compliance demands it, no multi-region support, no SLA. These are non-negotiable for security data, where gaps in the record are gaps in the detection.
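The six conditions above can be encoded as a renewal checklist. This is a simplified sketch under stated assumptions: "the combination" of proprietary flags is read as all four present, the three-year ROI test is read as three years of net savings exceeding a worst-case switching cost, the technical requirements are reduced to local buffering alone, and the field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VendorAssessment:
    annual_savings: float           # ingest-reduction savings, USD per year
    annual_license: float           # pipeline license, USD per year
    worst_case_switch_cost: float   # one-time cost to migrate off, USD
    years_operating: int
    enterprise_customers: int
    config_exportable: bool
    open_protocols_both_sides: bool
    documented_migration_path: bool
    proprietary_flags: int          # 0-4: custom query language, proprietary format, cloud-only, no export
    annual_price_increase: float    # 0.40 means +40% at renewal
    retroactive_billing_changes: bool
    local_buffering: bool

def decide(a: VendorAssessment) -> tuple[str, list[str]]:
    # Walk-away conditions come first; any one of them ends the evaluation.
    walk = []
    if a.proprietary_flags >= 4:
        walk.append("proprietary everything")
    if a.annual_price_increase > 0.30 or a.retroactive_billing_changes:
        walk.append("unsustainable pricing")
    if not a.local_buffering:
        walk.append("no local buffering")
    if walk:
        return "walk away", walk
    gaps = []
    three_year_net = 3 * (a.annual_savings - a.annual_license)
    if a.annual_savings <= 1_000_000 or three_year_net <= a.worst_case_switch_cost:
        gaps.append("cost differential too small")
    if a.years_operating < 5 or a.enterprise_customers < 100:
        gaps.append("vendor durability unproven")
    if not (a.config_exportable and a.open_protocols_both_sides and a.documented_migration_path):
        gaps.append("no escape route")
    if a.proprietary_flags:
        gaps.append(f"{a.proprietary_flags} proprietary yellow flag(s)")
    return ("proceed" if not gaps else "revisit"), gaps
```

The value is less in the verdict than in forcing each input, especially `worst_case_switch_cost`, to be written down before the renewal conversation rather than during it.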

The shape of a healthy decision is "we know what the switching cost looks like, we have the design patterns that bound it, and the cost differential makes the trade worth it." The shape of an unhealthy decision is "we've adopted the tool because it solved the immediate ingest pain and we'll worry about lock-in later." Later usually arrives during renewal, in a weak negotiating position.

2026 market context

Consolidation pressure cuts both ways.

The macro backdrop for 2026 pipeline decisions is worth naming. SaaS is down roughly 20% year to date, the worst-performing S&P 500 sector. Forward revenue multiples have compressed to about 4.1×, a decade low. Enterprise vendor consolidation is a stated goal at roughly half the organizations in recent practitioner surveys, even though practitioners themselves rank cost optimization near the bottom of their priority lists. The gap between organizational priority and practitioner priority is a separate problem, but it's the gap that's driving the consolidation pressure.

That backdrop affects pipeline decisions in two directions at once. Consolidation may make lock-in feel worse, because when one vendor owns your EDR, your pipeline, and your SIEM the integration cost is architectural rather than just contractual, and the exit stops being "switch a tool" and becomes "switch an ecosystem." On the other hand, market pressure may force interoperability as a competitive differentiator. OpenTelemetry adoption accelerated in observability precisely because vendors needed to reduce integration cost to stay competitive at lower multiples. The same dynamic is plausible for security pipelines, on a longer timeline.

The practical implication for an architect making a pipeline decision in 2026: the urgency to design for portability has gone up, not down. Distressed vendors get acquired faster and integrated more aggressively at compressed multiples. If you don't already have a portability strategy when the next consolidation cycle hits your specific vendor, you don't get to pick the escape route; the acquirer's integration roadmap picks it for you.

Where this fits

If you're already feeling the pinch.

The architects I talk to who are running into this concretely usually share a few characteristics. Pipeline tooling is a meaningful and growing share of their security data budget. Renewal conversations have started to feel less negotiable than they used to. There's an internal sense that the team owns five thousand routes nobody can fully explain. And the next architecture decision (adopt a lakehouse, change SIEMs, add a new detection platform) keeps colliding with the pipeline tier in ways that weren't on the original plan.

That set of symptoms is what the migration-assessment engagement is designed to address. The work is bounded: an inventory of what the pipeline is actually doing today, a portability assessment against the four design moves above, a switching-cost estimate calibrated to your specific configuration, and a renewal-position analysis you can take into the next vendor conversation. It isn't a recommendation to switch tools but a framework for deciding whether to switch, what switching would cost, and what to do in the meantime if you're not switching.

The companion piece on this site, the hidden cost of SIEM migration, goes into the downstream-of-pipeline part of the same question: what it actually costs to move the SIEM itself once you've worked out the pipeline tier. If you're considering both moves, those two pieces are the pair I'd read together. If you want to talk about whether an assessment makes sense for your specific situation, the migration assessment engagement page has the scope and the booking link.

Closing

The pipeline tier is where lock-in lives now.

Storage lock-in mostly got solved by Apache Iceberg, Delta Lake, and the broader open-table-format movement, but the switching costs that used to live at the storage layer didn't disappear so much as migrate up the stack, into the routing and transformation logic that turns raw events into the shape your SIEM and your lakehouse can use. That logic now lives in proprietary DSLs (Cribl Stream, Vector Remap Language, Fluentd config, Tenzir pipelines, Datadog Observability Pipelines), and the switching cost between them is higher than the storage migration was, because logic is harder to translate than data.

The good news is that the design patterns to bound that switching cost are known: standardize on OCSF at the pipeline output, keep routing logic in code rather than vendor config, require config export and version control from day one, and prefer open protocols at the boundaries. None of these eliminates the switching cost, but together they reduce it from "rewrite everything" to "rewrite the adapter," which is what lets you negotiate from a position of strength instead of accepting whatever the renewal looks like.

The bad news is that pipeline consolidation is accelerating rather than slowing: the 2024 M&A spree happened at peak multiples, while the 2026 spree will happen at compressed multiples, which means faster and more aggressive integrations. If you're spending materially on pipeline tooling without a portability strategy in place, the window for designing one rather than reacting to one is narrower than it was a year ago. That narrowing window is the actual urgency here; the "vendor trap" framing around it is overheated, but the underlying architectural concern is not.