Technology deep-dive

Vector: the data router Datadog open-sourced.

Vector is the Rust-written, single-binary data router that powers the most credible Tier B open-source-only security data pipeline reference I can cite: Huntress's 3M-endpoint, 16B-events-per-day ClickHouse stack. It is also the open core under Datadog's commercial Observability Pipelines product. What Vector is, what it isn't, and where it stops short of Cribl are the central questions for anyone trying to build a route-by-value pipeline without a commercial-platform license.

Reading time: about 20 minutes. Evidence tier: mixed. The Huntress production reference is Tier B (ClickHouse-published case study with named numbers). Vector's architectural and language details are Tier B (Vector documentation plus practitioner deployments). The Vector-versus-Cribl trade-offs draw on Tier C practitioner experience. Vector's exact governance status (CNCF-aligned, Datadog-stewarded, not formally a CNCF project as of May 2026) is flagged where it matters.

What it is

A Rust binary, a topology graph, and a transform language.

Vector started life inside Timber.io. Datadog acquired Timber in 2021 and adopted Vector as the engine for what became Datadog Observability Pipelines, the company's commercial managed-pipeline product. Vector itself stayed Apache 2.0 open source in the vectordotdev/vector GitHub repository, with Datadog's Community Open Source Engineering team taking maintainer responsibility. The project's governance documentation targets CNCF alignment (silver-level Linux Foundation Core Infrastructure Initiative practices), but Vector is not itself a formal CNCF project as of May 2026, despite the cloud-native framing it sometimes attracts. In terms of stewardship, Datadog runs it in the open and is the entity with the engineering headcount that meaningfully ships features. That difference matters when you compare it to Fluentd, which sits inside genuine foundation governance as a CNCF project. Logstash, by contrast, is stewarded by Elastic in much the same vendor-led way.

The technical shape is straightforward and is the reason teams adopt it. Vector is a single, statically compiled Rust binary, so there is no JVM, no plugin sprawl, and no agent-versus-aggregator product split: the one binary handles edge collection on a server or a Kubernetes node, and the same binary scales out as a centralized aggregator. Configuration is TOML, YAML, or JSON; the deployed topology is a directed graph of three node types: sources, transforms, and sinks. Sources pull data in (file tails, syslog, the Datadog Agent, journald, AWS S3, Kafka, the OpenTelemetry receiver, dozens more). Transforms reshape that data. Sinks push to destinations (S3, ClickHouse, Loki, Kafka, Splunk HEC, Elasticsearch, the Datadog API, and a long tail of others).

The transform layer is where Vector earns its name. Transforms compose into pipelines, and any non-trivial transform is written in Vector Remap Language (VRL), an expression-oriented language designed for safe, performant event reshaping. VRL programs are compiled and type-checked at config-load time, then executed by the Rust runtime. The language has observability-specific built-ins (parsing key-value pairs, extracting JSON, working with timestamps and IPs), a type system that fails configuration loads at validation time rather than runtime, and an execution model that runs in the same process as the routing engine. The performance benefit is real and quantifiable: VRL operates inside Vector's Rust hot path, so a typical enrichment pipeline costs tens of microseconds per event rather than the milliseconds a Logstash filter chain or a Fluentd Ruby plugin would burn.

Architecture

Sources, transforms, sinks, and why the topology graph matters.

The mental model I find most useful: Vector is a streaming-event router whose configuration is a graph spec. You declare nodes, name them, and reference them by ID in the inputs field of downstream nodes. The runtime builds an internal DAG, opens the sources, fans events through transforms in the order the graph's edges dictate (not the order the nodes appear in the file), and writes to sinks. Fanout, conditional routing, and multi-destination patterns are all expressed by referencing the same upstream node from multiple downstream nodes. There is no separate "router" abstraction in the way Cribl has Routes, because in Vector the graph itself is what does the routing.

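# Vector: a minimal route-by-value security pipeline
[sources.edr]
type = "kafka"
bootstrap_servers = "kafka-edr:9092"
group_id = "vector-edr"      # consumer group for the kafka source; name is illustrative
topics = ["edr.events"]

[transforms.parse_and_classify]
type = "remap"
inputs = ["edr"]             # reference the source by its node ID, not the topic name
source = '''
  . = parse_json!(.message)
  .severity = if includes(["execution","persistence","credential_access"], .technique) {
    "high"
  } else {
    "low"
  }
'''

[transforms.high_value]
type = "filter"
inputs = ["parse_and_classify"]
condition = '.severity == "high"'

[transforms.low_value]
type = "filter"
inputs = ["parse_and_classify"]
condition = '.severity == "low"'

[transforms.bulk_sample]
type = "sample"
inputs = ["low_value"]       # sample only the low-severity branch
rate = 10                    # keep 1 of every 10 low-value events

[sinks.siem]
type = "splunk_hec_logs"     # current name of the Splunk HEC log sink
inputs = ["high_value"]
endpoint = "https://hec.splunk.internal"
default_token = "${SPLUNK_HEC_TOKEN}"   # read from the environment
encoding.codec = "json"

[sinks.lake]
type = "aws_s3"
inputs = ["bulk_sample"]
bucket = "sec-lake-raw"
region = "us-east-1"         # illustrative; use your bucket's region
encoding.codec = "parquet"   # confirm the codec is available in your Vector version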

That short config is the entire shape of a two-tier routing pipeline: the high-severity branch goes to the SIEM, while the low-severity branch is sampled at 10% and written to S3 as Parquet for later lakehouse querying. There is no control plane, no proprietary pipeline language, and no managed-service contract anywhere in it. This is what people mean when they say Vector is the open-source path to route-by-value: the pattern is expressible, the runtime is fast, and the operational footprint is one binary per node plus a config file.
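The same fork can also be written with Vector's route transform, which exposes each branch as a named output of one node instead of one filter per branch. The fragment below is a sketch of that variant, not a replacement the source prescribes; the by_severity ID and the route names are hypothetical, and it assumes the parse_and_classify transform from the config above.

```toml
# Sketch: one route transform in place of per-branch filters.
[transforms.by_severity]
type = "route"
inputs = ["parse_and_classify"]
route.high = '.severity == "high"'
route.low  = '.severity == "low"'

# Downstream nodes reference a branch as "<transform_id>.<route_name>".
[transforms.bulk_sample]
type = "sample"
inputs = ["by_severity.low"]
rate = 10                      # keep 1 of every 10 low-severity events

# The SIEM sink would then declare: inputs = ["by_severity.high"]
# Events that match no route are emitted on "by_severity._unmatched".
```

Whether filters or a route node reads better is mostly taste; the route form makes the unmatched case explicit, which is useful when a classification bug would otherwise drop events silently.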

The deployment topology splits into two operational roles, both running the same binary. Agents run on the source host (a Linux server, a Kubernetes pod, a Windows endpoint), do light parsing, and forward to aggregators. Aggregators run centrally, do the expensive transforms (enrichment, sampling, deduplication, aggregation), and write to sinks. The split exists because some transforms (especially stateful ones like aggregations, deduplications, and threshold-based throttling) need a wider view of the event stream than any single agent has. Vector explicitly documents this agent-versus-aggregator pattern in its going-to-prod architecture guide, and most production deployments I've seen follow it.
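To make the two roles concrete, here is a minimal sketch of an agent forwarding to an aggregator over Vector's native vector source and sink pair. The hostname, port, log path, and node IDs are all assumptions for illustration, and the two halves would live in separate config files on separate machines.

```toml
# --- agent config (runs on each source host) ---
[sources.app_logs]
type = "file"
include = ["/var/log/app/*.log"]              # illustrative path

[sinks.to_aggregator]
type = "vector"
inputs = ["app_logs"]
address = "vector-aggregator.internal:6000"   # hypothetical hostname and port

# --- aggregator config (runs centrally) ---
[sources.from_agents]
type = "vector"
address = "0.0.0.0:6000"

# Expensive or stateful transforms would sit here, between the
# source and the real sinks; a console sink stands in for them.
[sinks.debug]
type = "console"
inputs = ["from_agents"]
encoding.codec = "json"
```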

Two operational properties of the runtime matter for security workloads. End-to-end acknowledgements let downstream sinks ack receipt back through the pipeline, so a Kafka source only commits an offset once the S3 or Splunk sink has confirmed delivery, which means fewer dropped events under sink backpressure than with a fire-and-forget forwarder. And the disk buffer option (per-sink, configurable) gives you durability through downstream outages, which is the property that distinguishes Vector from naive in-memory forwarders when a SIEM or lake goes down for an hour.
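Both properties are switched on per sink. The sketch below shows what that might look like for the SIEM sink from the earlier config; the buffer size is an arbitrary illustration and the token variable is hypothetical, so size the buffer against your own outage tolerance and disk budget.

```toml
[sinks.siem]
type = "splunk_hec_logs"
inputs = ["high_value"]
endpoint = "https://hec.splunk.internal"
default_token = "${SPLUNK_HEC_TOKEN}"   # hypothetical environment variable
encoding.codec = "json"
acknowledgements.enabled = true          # upstream sources wait for this sink's ack

[sinks.siem.buffer]
type = "disk"
max_size = 1073741824    # 1 GiB on local disk; illustrative value
when_full = "block"      # apply backpressure rather than drop events
```

With when_full set to "block", a full buffer pushes backpressure upstream instead of discarding events, which is usually the right trade for security telemetry.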

Production evidence

Huntress: 200K records per second into ClickHouse.

The single most defensible Tier B reference for Vector in security operations is Huntress. The numbers, drawn from the ClickHouse-published case study and Huntress engineering talks, are concrete enough to plan against. Huntress runs Vector as the ingestion layer for a managed EDR and SIEM platform serving roughly 3 million endpoints and 1 million identities. The pipeline ingests on the order of 16 billion events per day, with peak inserts pushing through Vector at 200,000 records per second into ClickHouse Cloud. The cost story is the one that gets quoted: a migration from a prior database stack to ClickHouse plus Vector cut monthly database spend from roughly $70K to $5K, a 90%+ reduction at scale.

The architectural pattern is worth pulling apart, because it generalizes. Vector sits between the EDR agents and the analytical store. Its job is to receive batches over HTTP, apply VRL transforms that shape events to fit ClickHouse table schemas, and insert via ClickHouse's HTTP interface. Vector's templating language is used to dynamically dispatch events into the right destination table, a structural advantage when you have dozens of source-specific tables and don't want one sink config per source. The pattern collapses what a less capable forwarder would force into either a fan-out of per-source sinks or a downstream routing layer.
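A rough sketch of that shape follows. It is not Huntress's actual configuration: the event_kind field, the events_ table-naming convention, the endpoint, and the node IDs are all invented for illustration, and it assumes a Vector version in which the clickhouse sink's table option accepts templates.

```toml
[sources.edr_http]
type = "http_server"
address = "0.0.0.0:8080"
decoding.codec = "json"

[transforms.shape_for_clickhouse]
type = "remap"
inputs = ["edr_http"]
source = '''
  # Hypothetical convention: one ClickHouse table per event kind.
  kind = to_string(.event_kind) ?? "unknown"
  .table = "events_" + downcase(kind)
'''

[sinks.clickhouse]
type = "clickhouse"
inputs = ["shape_for_clickhouse"]
endpoint = "https://clickhouse.internal:8443"   # hypothetical endpoint
database = "security"
table = "{{ table }}"          # dispatch per event via the field set above
skip_unknown_fields = true     # ignore fields (like .table) with no matching column
compression = "gzip"
```

One sink definition covers every destination table, which is the structural advantage the paragraph above describes.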

What the Huntress reference does not prove is that Vector can do everything Cribl can, because the Huntress architecture is a two-tier pattern where Vector feeds an analytical lake (ClickHouse) and SIEM-style alerting happens inside that same analytical engine rather than through a separate hot-tier SIEM. That is a legitimate architecture, and it's the architecture I'd argue most security operations should be moving toward, but it sidesteps the classic Cribl problem of routing a single event stream to a $3-10/GB SIEM and a $0.023/GB lake simultaneously, with different schemas and different retention policies. Vector can do that pattern too (the config I sketched above does exactly that), but Huntress isn't the reference that proves it at Fortune-500 multi-tool scale.

Security-data fit

Where Vector lands in a lakehouse pipeline.

The architectural fit for security operations is the "raw to lake, alerts to SIEM" two-tier pattern. Raw events (EDR telemetry, Zeek logs, cloud audit trails, identity logs, application logs) land in object storage (S3, MinIO, Azure Blob) as Parquet, partitioned by source and event time, queryable through whatever lakehouse engine the security team prefers (ClickHouse, StarRocks, Trino, DuckDB). High-value signals (detections, alerts, enriched events with severity above a threshold) fork off to the SIEM hot tier. Vector's branching topology expresses this directly, and the per-sink disk buffer means the SIEM side can absorb a hot-tier outage without losing events.

Three properties make this pattern work well with Vector. First, the Parquet sink is first-party. Vector's aws_s3 sink supports a parquet codec that batches events into columnar files, which is the file format every modern lakehouse engine wants, so there is no external converter, no Spark job, and no intermediate format to maintain. Second, the Kafka source and sink are mature. If your architecture puts Kafka between collection and routing (a pattern I see in roughly half of large security deployments), Vector slots in on both sides of the bus. Third, the Datadog Agent source means Vector can ingest from any environment that already runs Datadog, which is a meaningful chunk of enterprise infrastructure. You can route Datadog telemetry to a security lake without buying additional Datadog product, which is an unusual property for an open-source tool to have.
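The "partitioned by source and event time" layout comes from the sink's key_prefix, which accepts both field templates and strftime specifiers. A minimal sketch, assuming each event carries a source_name field (a hypothetical name) and assuming the parquet codec described above is present in your Vector version:

```toml
[sinks.lake]
type = "aws_s3"
inputs = ["bulk_sample"]
bucket = "sec-lake-raw"
region = "us-east-1"                                  # illustrative region
key_prefix = "source={{ source_name }}/dt=%Y-%m-%d/"  # Hive-style partitions
compression = "none"           # assumption: let the columnar format handle compression
encoding.codec = "parquet"     # confirm availability in your Vector version
batch.timeout_secs = 300       # flush at least every 5 minutes
batch.max_bytes = 134217728    # or at 128 MiB, whichever comes first
```

Events that lack the templated field cannot be rendered into a key, so it is worth setting source_name in an upstream remap rather than trusting every source to provide it. The batch settings control the file-size versus latency trade: larger batches give lakehouse engines fewer, bigger files to scan.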

The schema question is harder. Vector has no native OCSF normalization. You write VRL to map source events into OCSF fields, which is straightforward for simple cases (CloudTrail to OCSF Authentication, say) and tedious for complex ones. Cribl has Packs and Tenzir has OCSF as a first-class schema; Vector has VRL and a documentation expectation that you'll write the mapping yourself, which is friction for a team that wants OCSF as the canonical schema, though for a team that's already committed to a custom schema (Huntress, for instance, has its own ClickHouse table design) the absence barely registers.

What the VRL effort buys, though, is parity rather than a worse result, and I have a first-party check to ground that. On the MOAR reference stack I routed the same raw Okta event to OCSF Authentication three ways (Vector's hand-written VRL, Tenzir's native OCSF mapping, and Fluent Bit's Lua path, swapped in with ./moar swap-router on a single host), and all three produced the identical OCSF Authentication contract: class_uid 3002, activity_id 1 or 2, the user, the source IP. So the hand-written mapping lands exactly where the native one does, which means the OCSF friction in Vector is the cost of authoring the mapping, not a downgrade in what you end up with. I want to be careful about what that proves: it's a contract-equality check on one source (Okta Authentication) on one host, a portability proof rather than a throughput figure. The Huntress numbers above are the practitioner-reported scale evidence, not this. Where I do have a first-party throughput floor is my own ingest bench (sdw-lab, single host, Tier B): Vector sustained about 26.1k events/sec, below the routers built for raw throughput (rsyslog at 93.7k and Tenzir at 89.6k). That fits Vector's design as a transform-and-route layer rather than a line-rate shipper, and it is the kind of measured ceiling worth sizing against rather than a vendor's peak number.
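For a sense of what the hand-written path involves, here is a minimal sketch of an Okta-to-OCSF Authentication mapping in VRL. It is not the MOAR mapping itself: the node IDs are hypothetical, the Okta field names (eventType, published, actor.alternateId, client.ipAddress) are my assumption about the input shape, and it covers only the contract fields named above plus a few obvious ones, so it is not schema-complete.

```toml
[transforms.okta_to_ocsf]
type = "remap"
inputs = ["okta_raw"]            # hypothetical upstream source ID
source = '''
  okta = parse_json!(.message)
  event_type = to_string(okta.eventType) ?? ""

  # Assumed mapping: session start is Logon (1), session end is Logoff (2).
  activity_id = if event_type == "user.session.start" {
    1
  } else if event_type == "user.session.end" {
    2
  } else {
    0                            # Unknown: anything this sketch does not map
  }

  . = {
    "category_uid": 3,
    "class_uid": 3002,
    "activity_id": activity_id,
    "type_uid": 300200 + activity_id,
    "time": to_unix_timestamp(parse_timestamp!(okta.published, "%+"), unit: "milliseconds"),
    "user": { "name": okta.actor.alternateId },
    "src_endpoint": { "ip": okta.client.ipAddress }
  }
'''
```

The mapping is short for one event class; the tedium the section describes comes from repeating this for every source and every class, and from keeping it in step with OCSF schema revisions.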

The trade-off

Vector vs Cribl: open-source TCO against operator UX.

The honest Vector-versus-Cribl framing has little to do with whether open-source or commercial is better, because what actually separates them is that Vector and Cribl solve the same routing problem while making opposite trades on who absorbs the operational complexity. Cribl moves it into a vendor that you pay $0.10-0.30/GB to operate, whereas Vector moves it into your engineering team that you already pay regardless, so which trade is cheaper depends on your scale, your engineering depth, and how much you value the operator UX that Cribl does better.

Where Vector wins on substance: total cost of ownership. The Huntress economics scale: a 10 TB/day deployment self-managed on Vector is roughly $150-250K/year all-in (infrastructure, monitoring, engineer fractional time), versus $600-900K/year for the same workload on Cribl Stream Cloud. The Rust runtime uses less CPU and less RAM than Logstash or Fluentd for equivalent throughput, which is why Huntress can run 200K rec/sec at production-feasible hardware cost. The single-binary deployment is simpler to operate than Cribl's leader-plus-worker control plane, especially inside Kubernetes where a Vector DaemonSet plus a Vector aggregator Deployment is the entire operational surface. And there is no pipeline-language lock-in. VRL is portable in the sense that the runtime is Apache 2.0 and forkable, and the transformation logic is plain text that you own.

Where Cribl wins on substance is operator experience, because Cribl has a UI in which pipeline authors can drag, drop, preview events in real time, see the transformation result live, and ship a config without writing VRL. The Packs library covers a hundred-plus common log sources with vendor-tested parsing, and PII detection, sensitive-data masking, and routing-by-rule arrive as first-class UI features rather than VRL functions you write yourself, while the Routes abstraction lets a non-engineer author the "if-this-event-then-that-sink" rules without touching a config file. None of this is impossible to build on Vector (every Cribl feature has a VRL equivalent), but the operator burden of building it is real, and for a security team without dedicated data-engineering headcount, that burden is the constraint that decides the outcome.

Where Cribl also wins, and this matters more than people credit, is vendor-managed scaling, because when Cribl Stream falls over it is Cribl's on-call team that gets paged whereas when your Vector aggregator falls over it is your own engineer who gets paged. At 1-2 TB/day that's a small operational tax, but at 50 TB/day with stateful aggregations and per-source disk buffers it becomes a meaningful one. Cribl also has 100+ integrations against Vector's 40-50 sources and 50-60 sinks, and although the Vector ecosystem has caught up over the past three years, for the long-tail enterprise sources (old syslog dialects, vendor-specific telemetry formats, weird SaaS APIs) Cribl Packs still cover more ground than Vector's source list.

The decision rule I use: Vector wins outright at small scale (under 2 TB/day) where engineering depth is available; Vector wins at large scale (10+ TB/day) where engineering depth is required and the TCO gap pays for the engineers; Cribl wins in the middle (2-10 TB/day), at large enterprises that carry regulatory and operational-maturity requirements but lack the data engineering bench to operate a self-managed pipeline at production stakes. Both are capable tools. The failure I see is teams picking between them on ideology when the choice should turn on operational reality.

Operational realities

The three things that bite at scale.

Three operational realities deserve naming, because they are the failure modes I have watched teams hit when they adopt Vector and underestimate the operational tax.

1. VRL has a learning curve, and the curve decides the outcome

VRL is well-designed. It is also, for someone whose mental model is shaped by Splunk SPL or Cribl's pipeline language, a different paradigm. VRL is expression-oriented and type-strict; a type mismatch or an unhandled fallible call that SPL would silently turn into a null fails the config load in VRL. That's the right behavior, but for an engineer ramping in, the failure mode is "Vector refuses to start" rather than "Vector runs and quietly produces wrong data." The ramp time I've observed is roughly two to four weeks for an engineer to get fluent enough to author non-trivial transforms without help. For organizations where the same engineer also operates the pipeline at 3 a.m. on a Saturday, the ramp time is the thing that decides whether Vector adoption is a net win or a net loss for the team's operational throughput.
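One thing that shortens the ramp is Vector's built-in unit testing, which lets an engineer pin a transform's behavior in the same config file and catch mistakes before a deploy. A minimal sketch against the parse_and_classify transform from the earlier config; the test name and sample payload are invented for illustration.

```toml
[[tests]]
name = "persistence technique is classified high"

[[tests.inputs]]
insert_at = "parse_and_classify"
type = "log"

[tests.inputs.log_fields]
message = '{"technique": "persistence"}'

[[tests.outputs]]
extract_from = "parse_and_classify"

[[tests.outputs.conditions]]
type = "vrl"
source = 'assert_eq!(.severity, "high")'
```

Running vector validate checks that the config and its VRL compile, and vector test executes the test blocks, so both fit naturally into a CI step ahead of rollout.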

2. Stateful aggregations at scale are where pipelines get hard

Vector's stateful transforms (aggregate, reduce, dedupe, throttle) hold state in memory on the aggregator that runs them. State means the aggregator is no longer stateless, which means a restart drops in-flight windows, a node failure drops them permanently, and scaling out is no longer just "add another aggregator behind a load balancer." Vector does not have native distributed-state coordination; deployments that need it either accept the data loss, restart aggregators rarely and carefully, or push the aggregation problem downstream into the analytical engine (ClickHouse materialized views, in the Huntress pattern) where the state model is the database's problem. The Huntress architecture is a tell: aggregations happen in ClickHouse, not in Vector, and that is probably the right architecture for most security workloads regardless of which router you choose.
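The sketch below shows two of those transforms and where the in-memory state lives. The field names, thresholds, and node IDs are assumptions for illustration; the point is that every tuning knob here is also a statement about how much state a restart would discard.

```toml
[transforms.dedupe_events]
type = "dedupe"
inputs = ["parse_and_classify"]
fields.match = ["host", "technique"]   # hypothetical identity fields
cache.num_events = 5000                # recent-event cache, held in memory

[transforms.throttle_noisy_hosts]
type = "throttle"
inputs = ["dedupe_events"]
threshold = 100                        # events allowed per key per window
window_secs = 60
key_field = "{{ host }}"               # per-host counters, also in memory
```

Both the dedupe cache and the throttle counters are local to one aggregator process, so two aggregators behind a load balancer each see only part of the stream and will dedupe and throttle independently.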

3. Observability gaps for SOC operators

Vector exposes detailed internal metrics (events in, events out, sink latency, buffer depth, error counts) through a Prometheus endpoint, and the metrics are good, but they are not turnkey for a SOC, because a SOC analyst who needs to answer "are EDR events from the Northeast region flowing into the SIEM right now?" wants a dashboard rather than a Prometheus exporter. Building that dashboard is a Grafana exercise, and the Vector documentation gives you the metric names but not the dashboards, whereas Cribl ships a monitoring UI that does this for you and Datadog Observability Pipelines does too, since that's the commercial wrapper Datadog sells on top of Vector. So self-managed Vector requires you to build the operator-facing observability layer yourself, and because that gap is wider than the gap in pipeline-authoring UX, it's the one to plan for early.
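Exposing the metrics is the easy part and takes two nodes. A minimal sketch; the node IDs are arbitrary and the listen address is an illustrative choice.

```toml
[sources.vector_metrics]
type = "internal_metrics"

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["vector_metrics"]
address = "0.0.0.0:9598"     # Prometheus scrapes this endpoint
```

Everything after the scrape (per-source flow panels, alert rules for stalled inputs, buffer-depth thresholds) is the Grafana work that self-managed Vector leaves to you.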

What's missing

Three gaps that matter for security lakehouses.

No first-party Iceberg sink

Vector writes Parquet files to S3, but it does not write Iceberg tables, and those are different things. An Iceberg table is a Parquet dataset plus a metadata layer that tracks schema, partitions, snapshots, and ACID commits, so the write path has to participate in the metadata transaction. As of May 2026, the Vector roadmap and recent commits I can find do not include a native Iceberg sink. The community-PR status is murky enough that I would not plan a deployment on it landing in 2026. The workaround that nearly everyone falls back to is to write Parquet to S3 with Vector and run a separate Iceberg writer (a Spark job, a Flink job, or an AWS Glue job) that picks up the Parquet files and commits them into an Iceberg table. It works, it adds latency (typically minutes), and it adds a second piece of infrastructure to operate. Compare against Cribl's roadmap (also not native, last I checked) and Tenzir's behavior (writes to Iceberg via embedded support) and you can see that the gap is structural across most of the routers rather than specific to Vector. But for a team that wants "raw events to Iceberg in one hop," Vector is not the tool that delivers that today.

No native OCSF normalization

Here the work falls to you in VRL, because OCSF mappings for the common log sources (CloudTrail, Okta, Windows Security events, Zeek) are not bad to author once you know VRL well, but they are not packaged for download the way Cribl's Packs cover this and Tenzir treats OCSF as its native schema. Vector instead treats any schema as a custom schema, which is the right choice for a general-purpose data router though the wrong one for a security team that wants OCSF as the canonical model with almost no parser-authoring work. If your architecture commits to OCSF, plan for an OCSF-mapping engineering effort or layer Vector behind a tool that does the normalization (Tenzir, for instance, in a hybrid pattern). If you're flexible on schema, Vector's custom-schema flexibility is fine.

No UI for non-engineering operators

Vector is configured in text files in Git. There is no web UI for authoring pipelines, browsing event samples, previewing transformations, or visualizing the running topology. That reads as a feature for engineering teams that want GitOps and configuration-as-code, but it becomes a deal-breaker for security teams where the pipeline authors are SOC analysts or compliance engineers who reasonably expect a UI, which is why Cribl Stream's UI is good enough to be the reason many security organizations choose Cribl even when the economics favor Vector. Datadog Observability Pipelines layers a commercial UI on top of the Vector runtime, which is one resolution to this gap if you are willing to pay Datadog for the wrapper. The third option (Vector plus a thin in-house UI for non-engineers) exists but I have rarely seen it built well, because building it well is essentially building a commercial product.

Governance

What "Datadog-stewarded" means.

Vector's license is Apache 2.0. The maintainer team is, in practice, Datadog's Community Open Source Engineering group, with significant external contribution but a clear gravitational center inside Datadog. Vector is not a CNCF project; the project's documentation describes targeting CNCF/CII alignment for governance practices, but it has not been formally accepted by the foundation at sandbox, incubating, or graduated levels as of May 2026. Verify against the CNCF projects page before quoting it in a procurement deck.

Datadog's commercial product, Observability Pipelines, is built on Vector and wraps it in a managed control plane plus UI plus enterprise support, but declining to buy it still gets you the same Vector binary with no feature-gating and no license-key behavior. That's a credible open-source posture, though one that depends on Datadog's continued investment. The mitigations are real: Apache 2.0 means a fork is legally permitted, the binary runs standalone without phoning home, and the external contributor community is large enough that a Datadog withdrawal would not kill the project. Even so, treating Vector as "vendor-neutral" in the way Apache Iceberg is vendor-neutral overstates the case. The honest framing is open-source software stewarded by a commercial vendor whose business model benefits from its widespread adoption, which is fine for teams comfortable with vendor-led open source but is not the same thing as a CNCF-graduated foundation project.

Deployment shapes

Three patterns I see consistently.

Edge collection plus central aggregator

Vector DaemonSet on every Kubernetes node and a Vector Deployment as the central aggregator. Agents handle log file tailing, container log scraping, and node-level metric collection. The aggregator does enrichment, sampling, and routing to sinks. This is the default Kubernetes-native deployment shape and the one Vector's documentation steers new adopters toward. Typical hardware: m5.large agents on each node (often co-located with the workload), 3-5 m5.2xlarge aggregators behind a Kubernetes Service for a 5-10 TB/day workload.

Kafka-bridged two-tier

Vector agents publish to Kafka topics. A separate Vector aggregator consumes from Kafka and writes to sinks. Kafka in the middle gives you durability, replay, and the ability to add new consumers (a security lake consumer, a SIEM consumer, a compliance archive consumer) without changing the agent fleet. This is the pattern I most often recommend for organizations that already run Kafka, because the incremental operational complexity is small and the architectural flexibility is large. Huntress runs a variant of this pattern.
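A minimal sketch of the two halves follows. The broker address, topic, consumer group, and node IDs are assumptions for illustration, and the agent and aggregator halves would live in separate config files.

```toml
# --- agent side: publish raw events to Kafka ---
[sinks.to_kafka]
type = "kafka"
inputs = ["app_logs"]                 # hypothetical agent source ID
bootstrap_servers = "kafka:9092"
topic = "sec.raw"
encoding.codec = "json"

# --- aggregator side: consume, then transform and route ---
[sources.from_kafka]
type = "kafka"
bootstrap_servers = "kafka:9092"
group_id = "vector-lake-writer"       # one consumer group per downstream purpose
topics = ["sec.raw"]
decoding.codec = "json"
```

Adding a SIEM or compliance-archive consumer later means deploying another aggregator with its own group_id against the same topic; the agent fleet does not change.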

Cribl-edge, Vector-aggregate hybrid

A real-world pattern I've seen in two enterprise deployments: keep Cribl Edge for endpoint collection (the UI and integration coverage matter at the edge) and use Vector centrally as the aggregator (the cost matters at the volume). Both tools support sending events between each other over HTTP or Kafka. The pattern is operationally heavier than picking one or the other but is a legitimate transitional architecture for teams migrating from full-Cribl to full-Vector, or for teams that want Cribl's edge-collection breadth without paying Cribl rates on the central transformation pass.

Decision framework

When to choose Vector.

Choose Vector if

You have engineering depth on the team and are comfortable owning a Rust binary, a config-as-code workflow, and VRL as a pipeline language.
Open-source is a meaningful procurement preference, not just a nice-to-have.
Your scale sits where the TCO gap pays for the engineering: roughly under 2 TB/day where commercial alternatives are overkill, or above 10 TB/day where the gap is large enough to fund a dedicated engineer.
Your destination is an analytical lakehouse (ClickHouse, StarRocks, S3+Iceberg via post-processing) rather than a traditional schema-on-read SIEM.
You are flexible on schema or already committed to a custom one, so OCSF is not a hard architectural commitment.
You can build your own monitoring dashboards on top of Vector's Prometheus metrics.

Choose Cribl if

Operator UX is a primary constraint, since pipeline authors are SOC analysts, not engineers.
You need 100+ integrations out of the box, particularly long-tail enterprise sources.
OCSF normalization via vendor-tested Packs is a buying criterion.
You need 24/7 commercial support against a written SLA.
Regulated industry posture requires a vendor-on-the-hook accountability model.
Your scale is mid-range (2-10 TB/day) where the commercial cost is bearable and the engineering depth to self-manage is not available.

Consider Datadog Observability Pipelines if

You want Vector's runtime properties but Cribl's operational wrapper, and you're either already a Datadog customer or willing to become one. Observability Pipelines is the same Vector binary under a managed control plane with a UI and Datadog's support contract. The pricing details I can quote publicly are thin enough that I'd treat this as a quote-driven evaluation rather than a published-list-price one.

Conclusion

A real open-source path with real gaps.

Vector is the open-source data router I'd default to for a security pipeline being built greenfield today, with two qualifiers. The first qualifier is engineering depth. Vector pays back the engineering investment that Cribl absorbs into a vendor contract, and a team without that investment available will struggle with VRL, with stateful-aggregation operations, and with building the SOC-facing observability layer that doesn't ship in the box. The second qualifier is the schema and Iceberg story. Vector writes Parquet, not Iceberg, and writes whatever schema you author rather than OCSF natively. Both gaps are bridgeable, but bridging them is engineering work that an architect should plan for explicitly.

The Huntress reference is the production proof point I keep coming back to, with three million endpoints, 16 billion events per day, 200K records-per-second peak ingest, and a 90% cost reduction versus the prior stack. That is a real-world Tier B reference for an open-source router doing route-by-value at security scale, and it is the strongest case I can make that the Vector path is a working alternative to Cribl rather than a hypothetical one. It works, but it also looks like Huntress: a team with engineering depth and a custom-schema architecture. Most security teams are not Huntress, so the pattern that works there takes deliberate work to apply elsewhere.

My take is that Vector earns the default-open-source-router slot in any 2026 conversation about security data pipeline platforms. Cribl earns the default-commercial-router slot in that same conversation, and Tenzir earns the default-pipeline-detection slot when in-stream detection is a first-class requirement. The picks aren't competing for the same job so much as for the same architectural slot under different constraints, which is why the slot matters more than the brand. Vector's slot is "open-source route and transform, with engineering-team ownership of the operational surface." If that's the slot you're filling, Vector is the tool I'd reach for.