Technology deep-dive
Push vs pull query engines for security analytics.
Two query engines receive the same SQL against a billion security events. One returns in five seconds and the other in fifty, even though both are "fast OLAP databases," both use columnar storage, and both claim vectorized execution. The difference is invisible in marketing materials but baked into the engine architecture: whether execution is push-based or pull-based. For scan-heavy, filter-aggregate security workloads, that choice may matter more than any other engine-level decision.
Reading time: about 18 minutes. Evidence tier: B overall (academic literature for the Volcano model and DuckDB's morsel-driven design, production implementations for ClickHouse and Velox, vendor framing elsewhere). The push-pull distinction is well-established in database research. The performance gap numbers are directional; actual results depend on hardware, data shape, and query patterns, and I flag the load-bearing claims throughout.
The hidden performance gap
Two engines, same query, ten-times difference.
Start with the query a SOC analyst runs ten times a day:
SELECT src_endpoint.ip, COUNT(*)
FROM security_events
WHERE severity >= 4 AND time > NOW() - INTERVAL '1 hour'
GROUP BY src_endpoint.ip;
A billion rows sit in the scan window. The query filters on severity, groups by source IP, and counts, which is the shape of most threat-hunting and detection-engineering queries: heavy projection on a few columns, a selective predicate on time and severity, a small aggregation key, and no complex join, recursive CTE, or window function. It's the unglamorous core of security analytics, and it dominates SOC compute because so much of the day-to-day work reduces to exactly this pattern.
On a Volcano-model pull engine (the architecture Trino, Presto, and pre-2.0 Spark share with decades of database history), that query may take fifty seconds against a cold cache on commodity hardware. On a push-based vectorized engine like DuckDB or ClickHouse, the same query against the same data may finish in five seconds. It's the same SQL against the same Parquet files and the same Iceberg metadata, so the ten-times difference is attributable in significant part to how the engine moves data between operators internally.
The architectural distinction goes by several names in the literature: iterator-based versus data-centric, demand-driven versus producer-driven, pull versus push. They all describe the same decision: whether the consumer at the top of the operator tree asks its child for one more row, or the source at the bottom hands its parent a batch of thousands of rows and lets the work flow upward. The choice shapes function-call overhead, CPU cache behavior, vectorization opportunities, and parallel scheduling, so for analytical workloads on columnar data push tends to win, while for federation across heterogeneous sources pull is still the better fit, and both of those reasons matter for security teams designing a data stack.
The Volcano model
How pull-based execution works.
The Volcano model, named for Goetz Graefe's 1994 paper that formalized it, is the architecture most
databases used for the better part of three decades. Every operator in the query plan implements the
same iterator interface, typically open(), next(), and close().
The root operator at the top of the plan calls next() on its child. That child calls
next() on its own child. The recursion unwinds at the table scan, which reads one row
from disk and returns it up the chain. Each operator gets one row at a time, decides what to do with
it, and passes it (or doesn't) to its parent.
The simplified shape of a Volcano filter operator looks like this:
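class FilterOperator:
    def __init__(self, child, predicate):
        self.child = child          # upstream operator exposing next()
        self.predicate = predicate  # callable: row -> bool

    def next(self):
        while True:
            row = self.child.next()      # pull one row up from below
            if row is None:              # child is exhausted
                return None
            if self.predicate(row):      # evaluate predicate on that single row
                return row               # hand it up to the parent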
The model has real strengths. It's demand-driven, so a LIMIT 10 query stops pulling rows
the moment ten have matched; no wasted work scanning past the limit. It's clean to implement; every
operator looks the same from the outside. It's easy to compose; you can add a new operator type and
slot it into existing plans without touching the framework. Postgres, SQLite, pre-2.0 Spark, MySQL,
and Trino all started life as Volcano-style executors.
The cost shows up at scale. For a query that scans a billion rows, the pull model makes on the order
of a billion function calls per operator just to move data through the pipeline: one next() per row
per operator. Each call has overhead the CPU pays whether the operator does meaningful work or not:
virtual function dispatch, stack frame setup, branch mispredictions when the call target is
polymorphic, and register spills. None of that overhead computes anything useful, so once you
multiply it by a billion, the per-row tax dominates the runtime for cheap operators like filters and
projections.
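To make the per-row tax concrete, here is a minimal sketch of a toy Volcano pipeline that counts next() calls at the scan. The Scan, Filter, and Limit classes and the CALLS counter are illustrative names of my own, not taken from any real engine, and the ten-thousand-row table stands in for the billion-row case.

```python
from collections import Counter

CALLS = Counter()  # hypothetical instrumentation: next() calls per operator


class Scan:
    """Leaf operator: returns one row per next() call, then None."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        CALLS["scan"] += 1
        return next(self._it, None)


class Filter:
    """Pulls from its child until a row satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row


class Limit:
    """Stops pulling once n rows have been returned (demand-driven)."""

    def __init__(self, child, n):
        self.child = child
        self.remaining = n

    def next(self):
        if self.remaining == 0:
            return None
        row = self.child.next()
        if row is not None:
            self.remaining -= 1
        return row


def drain(root):
    """Pull rows from the root operator until it is exhausted."""
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    return out


def high_severity(row):
    return row["severity"] >= 4


events = [{"severity": i % 6} for i in range(10_000)]

# Full scan: every input row costs one next() call at the scan.
CALLS.clear()
high = drain(Filter(Scan(events), high_severity))
full_scan_calls = CALLS["scan"]

# LIMIT 10: the scan is only asked for rows until ten have matched.
CALLS.clear()
first_ten = drain(Limit(Filter(Scan(events), high_severity), 10))
limited_scan_calls = CALLS["scan"]
```

Comparing full_scan_calls with limited_scan_calls shows both sides of the model: the LIMIT plan touches only a few dozen rows, while the unbounded plan pays one call per row at every operator it passes through.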
The other cost is harder to see in the code but louder in the profiler: the Volcano model is hostile to CPU cache lines and vectorized instructions. When the engine touches one row at a time, the column values it cares about may live in cache lines that get evicted before the next row arrives. SIMD (Single Instruction, Multiple Data) instructions, which can apply the same operation to many values in parallel, have nothing to chew on if each operator only sees one value at a time, so the hardware capability is sitting there while the execution model has no way to reach it.
Modern Volcano implementations partially mitigate this with vectorization: passing batches of rows through the iterator instead of single rows, sometimes called the "vectorized Volcano" or "block iterator" variant. Trino took this path. The batch-at-a-time iterator narrows the gap but doesn't close it, because the operator tree is still organized around the pull contract. The source still doesn't know what the consumer needs until it's asked.
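The batch-at-a-time variant can be sketched in a few lines. This is a toy under stated assumptions: PageScan, PageFilter, and next_page() are hypothetical names, and a "page" here is just a Python list rather than a real column buffer.

```python
class PageScan:
    """Leaf operator: returns one page (a list of values) per call, then None."""

    def __init__(self, values, page_size):
        self.values = values
        self.page_size = page_size
        self.offset = 0
        self.calls = 0  # illustrative instrumentation

    def next_page(self):
        self.calls += 1
        if self.offset >= len(self.values):
            return None
        page = self.values[self.offset:self.offset + self.page_size]
        self.offset += self.page_size
        return page


class PageFilter:
    """Still a pull operator, but each call moves a whole page instead of a row."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def next_page(self):
        while True:
            page = self.child.next_page()
            if page is None:
                return None
            survivors = [v for v in page if self.predicate(v)]
            if survivors:
                return survivors


severity = [i % 6 for i in range(100_000)]
scan = PageScan(severity, page_size=1_000)
root = PageFilter(scan, lambda s: s >= 4)

first_page = root.next_page()        # the consumer still decides when to pull
calls_after_first_pull = scan.calls

matched = len(first_page)
while (page := root.next_page()) is not None:
    matched += len(page)
```

The call count at the scan falls from one per row to one per page, but nothing moves until the root asks for it, which is the pull contract the paragraph above describes.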
The push model
Inverting the data flow.
Push-based execution inverts the control flow. Instead of the consumer asking the producer for the
next row, the producer hands the consumer a batch and tells it to deal with it. The table scan at
the bottom of the plan reads a chunk of data (often a column buffer of several thousand values)
and calls a consume() or push() method on its parent operator. That
parent processes the batch, possibly filtering or transforming it, and pushes the result up to its
own parent. The data flows from source to sink, batched all the way.
The shape of a push-based filter operator looks like this:
class FilterOperator:
    def __init__(self, parent, predicate):
        self.parent = parent        # downstream operator exposing consume()
        self.predicate = predicate  # exposes evaluate(batch) -> boolean mask

    def consume(self, batch):
        mask = self.predicate.evaluate(batch)  # vectorized over the whole batch
        filtered = batch.filter(mask)          # select surviving rows from the column buffers
        if len(filtered) > 0:
            self.parent.consume(filtered)      # push survivors up
Three things change at once. First, the function-call count drops by roughly three orders of magnitude: a batch size of around a thousand means a billion rows become a million pushes, so function dispatch overhead stops dominating the profile. Second, the predicate evaluation becomes vectorized. The filter operator hands a column buffer to the SIMD unit, which evaluates the predicate on many values per CPU instruction. Third, cache behavior improves dramatically. The column values stay in L1 or L2 cache as the operator chews through the batch, instead of getting evicted and reloaded for every row.
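The call-count arithmetic is easy to check with a toy push pipeline. This is a sketch, not an engine: CountingSink, BatchFilter, and push_scan are hypothetical names, plain Python lists stand in for column buffers, and the list comprehension stands in for what would be a SIMD kernel in a real executor.

```python
import math


class CountingSink:
    """Terminal operator: records how many batches and rows reached it."""

    def __init__(self):
        self.batches = 0
        self.rows = 0

    def consume(self, batch):
        self.batches += 1
        self.rows += len(batch)


class BatchFilter:
    """Push-style filter: one consume() call handles a whole batch."""

    def __init__(self, parent, predicate):
        self.parent = parent
        self.predicate = predicate
        self.calls = 0  # illustrative instrumentation

    def consume(self, batch):
        self.calls += 1
        survivors = [v for v in batch if self.predicate(v)]  # stand-in for a SIMD kernel
        if survivors:
            self.parent.consume(survivors)


def push_scan(values, parent, batch_size):
    """Source operator: slices the column into batches and pushes each one up."""
    for start in range(0, len(values), batch_size):
        parent.consume(values[start:start + batch_size])


severity = [i % 6 for i in range(100_000)]  # toy severity column
sink = CountingSink()
flt = BatchFilter(sink, lambda s: s >= 4)
push_scan(severity, flt, batch_size=1_000)

# The same ratio at the scale quoted in the text: 1e9 rows at ~1,000 rows per batch.
pushes_at_scale = math.ceil(1_000_000_000 / 1_000)
```

Here flt.calls is the number of operator invocations for the toy table, and pushes_at_scale applies the same division to the billion-row case.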
The push model also fits streaming naturally. A stream of new security events arriving from Kafka is producer-driven by definition: the events keep coming whether the consumer is ready or not. A push-based engine just keeps consuming batches as they arrive. A pull-based engine has to model the stream as a special kind of iterator and deal with the impedance mismatch. Apache Flink, which is push-based from the ground up, sits in the right shape for streaming security analytics. So does Falcon LogScale (formerly Humio), whose architecture was designed around push-based brute-force scanning of log data.
Push isn't free, though. The producer-driven model loses some of the lazy evaluation a pull
model gets for free: a LIMIT query in a pure push model has to push batches until the
consumer signals "enough," which is harder to short-circuit cleanly. Modern push engines handle this
through backpressure signals or hybrid approaches, but the implementation is fiddlier than the elegant
Volcano iterator. That simplicity is part of why the pull model lasted so long, even though the
push model's throughput is what now wins it the OLAP corner.
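One way to picture the "enough" signal is to let consume() return a flag that travels back down to the source. The sketch below is my own simplification, assuming a boolean return value as the backpressure channel; real engines use richer mechanisms, and every name here is hypothetical.

```python
class LimitSink:
    """Collects at most n rows; consume() returns False once it has enough."""

    def __init__(self, n):
        self.n = n
        self.rows = []

    def consume(self, batch):
        self.rows.extend(batch[: self.n - len(self.rows)])
        return len(self.rows) < self.n  # True means "keep pushing"


class BatchFilter:
    """Push-style filter that relays the parent's signal back to its producer."""

    def __init__(self, parent, predicate):
        self.parent = parent
        self.predicate = predicate

    def consume(self, batch):
        survivors = [v for v in batch if self.predicate(v)]
        if not survivors:
            return True  # nothing to forward; the producer should keep going
        return self.parent.consume(survivors)


def push_scan(values, parent, batch_size):
    """Pushes batches until the data runs out or a consumer says 'enough'."""
    batches_pushed = 0
    for start in range(0, len(values), batch_size):
        batches_pushed += 1
        if not parent.consume(values[start:start + batch_size]):
            break  # downstream is satisfied: stop scanning
    return batches_pushed


ports = [p % 5000 for p in range(1_000_000)]  # toy dst_port column
sink = LimitSink(10)
batches = push_scan(ports, BatchFilter(sink, lambda p: p == 3389), batch_size=1_000)
```

The scan stops well short of the thousand batches in the table, but only at batch granularity: the batch containing the tenth match is still processed in full, which is part of the fiddliness the paragraph above refers to.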
Morsel-driven parallelism
DuckDB, Velox, and the next generation.
The modern refinement of push-based execution is morsel-driven parallelism, formalized in a 2014 SIGMOD paper from the HyPer group at TUM (the Technical University of Munich) and adopted in DuckDB, Velox (the execution engine Meta built and open-sourced, now used by Presto-on-Velox and Spark Gluten), and several research engines. The idea: instead of statically partitioning work among threads at plan time, break the input into small chunks called morsels (around 100,000 rows in the original paper, a size chosen to keep scheduling overhead low while still balancing load evenly) and let worker threads pull morsels off a shared queue as they finish their previous one.
Each morsel flows through the operator pipeline as a self-contained unit. Thread A grabs morsel 1, runs it through scan, filter, and partial aggregate. Thread B grabs morsel 2 in parallel and does the same. Thread C grabs morsel 3. When all morsels are done, the partial aggregates merge into a final result. The morsels are small enough that no thread sits idle waiting for a giant batch; the queue is work-stealing, so faster cores naturally pick up more morsels than slower ones; and the data fits in cache for the duration of one morsel's processing.
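The scheduling idea can be sketched with the standard library. Treat this as a toy under loud assumptions: it uses a plain shared queue rather than a work-stealing dispatcher, CPython threads will not deliver real CPU parallelism for this loop, and MORSEL_ROWS, worker, and morsel_count are names I made up for illustration.

```python
import queue
import threading
from collections import Counter

MORSEL_ROWS = 10_000  # illustrative size for this toy table


def worker(morsels, events, partials, slot):
    """Pull morsels until none remain; scan, filter, and partially aggregate each."""
    local = Counter()
    while True:
        try:
            start, stop = morsels.get_nowait()
        except queue.Empty:
            break
        for ip, severity in events[start:stop]:
            if severity >= 4:
                local[ip] += 1
    partials[slot] = local  # each thread writes only its own slot


def morsel_count(events, n_threads=4):
    """Count high-severity events per source IP using a shared morsel queue."""
    morsels = queue.Queue()
    for start in range(0, len(events), MORSEL_ROWS):
        morsels.put((start, min(start + MORSEL_ROWS, len(events))))

    partials = [Counter() for _ in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(morsels, events, partials, slot))
        for slot in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = Counter()
    for partial in partials:  # merge the per-thread partial aggregates
        total.update(partial)
    return total


events = [(f"10.0.0.{i % 50}", i % 6) for i in range(100_000)]
per_ip = morsel_count(events)
```

Because threads claim morsels as they finish, a faster thread simply ends up processing more of them, and the final merge touches only the small per-thread counters rather than the raw rows.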
The result, for the kinds of filter-aggregate queries that dominate security analytics, is close to full CPU utilization across all available cores. DuckDB benchmarks routinely show 90%-plus CPU utilization on TPC-H queries; the Volcano model, even with vectorization, tends to plateau at much lower utilization because of synchronization between operator stages. The structural argument is the same one Apache Arrow makes at the memory-layout level: if you arrange the data and the work so the CPU can stay busy with vectorized operations on cache-resident buffers, you get throughput that single-row iterators leave on the table.
Velox deserves a separate note because it represents the strongest signal that morsel-driven push execution is winning at the high end. Meta built Velox as a unified execution engine to underpin Presto, Spark, and PyTorch at internal scale. The Presto-on-Velox effort effectively retrofits a push-based vectorized engine into the project that started as the canonical pull-based federation engine. The honest version of "Trino vs DuckDB on scan-heavy workloads" in 2026 is increasingly "Trino vs Presto-on-Velox," and the Velox path takes the push-execution architectural lessons seriously.
ClickHouse approaches the same destination from a slightly different angle. Its execution model isn't strictly morsel-driven in the HyPer sense, but it's deeply push-based; its pipelines stream column buffers through vectorized operators, with dedicated threads per pipeline stage. The practical effect is similar: vectorized batch processing, high CPU utilization, no per-row function call tax. For security teams running petabyte-scale log analytics, ClickHouse's push architecture is a significant part of why it sustains the throughput it does at the price points it does.
Why push fits security workloads
Scan-heavy, filter-aggregate, streaming.
Security analytics has a workload shape that lines up almost too neatly with push-based execution. Three properties dominate.
Scan-heavy time-window queries. The canonical SOC query asks for everything matching some predicate over the last hour, day, or week. That's a large scan with a selective filter, exactly where push execution shines because the filter can be vectorized over column buffers and the surviving rows can flow up the pipeline as a tight cache-resident stream. The pull model pays a per-row function-call tax across the whole scan window; the push model amortizes that cost across batches.
Simple aggregation patterns. Security queries usually group by a small number of dimensions (source IP, user, host, alert type) and count, sum, or rank. The aggregation hash table stays small enough to fit in cache, partial aggregates per morsel merge cheaply at the end, and the whole pipeline runs hot. Complex multi-way joins where pull-based optimization shines are rare in the detection and threat-hunting paths I see; they show up more in compliance reporting, which tolerates higher latency.
Streaming ingestion. Security telemetry arrives continuously. Kafka pushes events to consumers. The natural fit on the consumer side is a push-based engine that processes batches as they land: Flink for streaming CEP (complex event processing), Falcon LogScale for log analytics at scale, ClickHouse materialized views for incremental rollups. A pull-based engine has to model the stream as a special kind of source and pretend to ask for the next row, which is the wrong shape for the underlying physics.
A fourth property matters for detection engineering specifically. Real-time detection rules need to evaluate against every event, not against a snapshot every few minutes. That's the territory where DBSP (the database stream processor formalism from academic research, implemented in Feldera; Materialize pursues a similar incremental model built on differential dataflow) takes push execution further still: represent the detection rule as an incremental dataflow, and update the result with each event rather than rerunning the whole query at intervals. For a rule like "alert when a user has more than 100 failed logins in the last hour," DBSP-style incremental push can move detection from minute-latency down to millisecond-latency, which is why I treat that pattern as the leading edge: promising, with early production deployments, but not yet routine.
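The failed-login rule makes a good illustration of per-event updating. The sketch below is a hand-rolled sliding window, not DBSP and not any vendor's API; FailedLoginRule and on_event are hypothetical names, and it assumes events for a given user arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # "the last hour"
THRESHOLD = 100        # "more than 100 failed logins"


class FailedLoginRule:
    """Keeps a per-user sliding window and updates it one event at a time."""

    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.failures = defaultdict(deque)  # user -> timestamps of recent failures

    def on_event(self, user, ts, success):
        """Apply one login event; return True if the user is now over threshold."""
        if success:
            return False
        recent = self.failures[user]
        recent.append(ts)
        while recent and recent[0] <= ts - self.window:  # expire old failures
            recent.popleft()
        return len(recent) > self.threshold


rule = FailedLoginRule()
# One failed login per second for a single user; collect the timestamps that alert.
alerts = [ts for ts in range(150) if rule.on_event("alice", ts, success=False)]
```

Each event costs an append and a few expirations instead of a rescan of the hour, which is the property that lets the alert fire on the event that crosses the threshold rather than at the next scheduled query.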
The fit isn't universal, because forensic investigation queries that join across many tables and walk relationships look more like classical OLAP, where Trino's pull model handles them perfectly well, and compliance reports that scan a year of data to produce a small summary table tolerate higher latency. But the operational SOC workload (alert triage, threat hunting, real-time correlation) sits squarely in push territory.
Where pull wins
Federation is the case Trino was built for.
Push isn't the right answer everywhere, and security architecture diagrams that wave away Trino are oversimplifying. Federation (joining data from heterogeneous sources in a single SQL query) is the case where pull-based execution still wins, and it's a case that matters in security operations.
A federation query might pull recent alerts from a ClickHouse cluster, join them against an asset
inventory in PostgreSQL, enrich with threat intelligence from a Snowflake table, and cross-reference
with cloud audit logs sitting in S3 as Iceberg. Trino was designed for exactly this workload. Each
source has its own connector, its own dialect, its own data shape. Trino pulls from each, performs
the join in its own coordinator and workers, and returns a unified result. The pull-based iterator
model adapts well here because every source can be wrapped as an iterator that responds to
next(); the engine doesn't need the source to be push-friendly, only to be
SQL-addressable.
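The "wrap every source as an iterator" idea is easy to show with Python generators, which implement the same next() contract. This is an illustrative toy, not Trino's connector interface: alerts_source, assets_source, and the inline CSV are stand-ins I invented for a columnar alert store and a row-oriented inventory database.

```python
import csv
import io


def alerts_source():
    """Stand-in for a columnar alert store: yields dict rows on demand."""
    yield {"ip": "10.0.0.5", "rule": "brute_force"}
    yield {"ip": "10.0.0.9", "rule": "port_scan"}
    yield {"ip": "10.0.0.7", "rule": "beaconing"}


ASSET_CSV = "ip,owner\n10.0.0.5,payments\n10.0.0.9,hr\n"


def assets_source():
    """Stand-in for an inventory database, exposed here as CSV text."""
    yield from csv.DictReader(io.StringIO(ASSET_CSV))


def hash_join(left, right, key):
    """Pull every row from `right` to build a lookup, then stream `left` past it."""
    lookup = {row[key]: row for row in right}
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            yield {**row, **match}


joined = list(hash_join(alerts_source(), assets_source(), "ip"))
```

The join never needs to know how either source stores its data; it only needs each one to answer "give me the next row," which is why the pull contract composes so easily across heterogeneous systems.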
Push-based engines struggle with federation precisely because the source has to cooperate: it needs to hand the engine column buffers, ideally in Arrow format and ideally with predicate pushdown. That works fine for Iceberg-on-S3 or for talking to another columnar engine, but poorly for "go ask QRadar's REST API for some events" or "join this against a row in MongoDB." Trino's connector ecosystem (over 25 connectors across SQL databases, NoSQL stores, message brokers, and cloud APIs) is a real asset that push engines haven't replicated and probably shouldn't try to.
The architectural pattern I tend to recommend for security teams running heterogeneous data is engine specialization: a push engine for the scan-heavy operational workload (ClickHouse or DuckDB against the lakehouse), and Trino for the federation cases when an analyst needs to join across sources that don't all live in the same columnar store. That's a "pick the right tool" answer rather than a "one engine to rule them all" answer, and the reference architecture page on Trino federation discusses how to wire those tools together.
Two further notes. First, Presto-on-Velox is pushing Trino-style federation toward push-based execution at the leaf level: the federation control flow stays pull, but the data plane inside each worker becomes push. That hybrid may eat into the "push vs pull is the architectural choice" framing over the next couple of years; I'm watching it. Second, Trino's strength in federation depends heavily on predicate pushdown: the engine asking the source to filter as much as possible before returning data. When sources don't push down well, Trino pulls everything and filters centrally, and the workload starts to feel slow regardless of which model is underneath. Federation stays hard either way; pull execution doesn't make it easy so much as it makes it possible at all.
Hybrid execution
Most modern engines blend both.
The push-versus-pull framing is useful as an architectural distinction, but production engines are rarely pure on either side. Several patterns blend the two:
Planning pulls, execution pushes. Plan generation in most modern optimizers is demand-driven: the planner pulls cost estimates from each operator to decide on a join order or a partitioning strategy. Once the plan is compiled, the runtime hands it off to a push-based executor that streams batches through the pipeline. DuckDB and Velox both work this way, so the planner gets the lazy semantics it needs to reason about the query while the executor gets the throughput.
Apache Arrow Acero. Acero is the streaming execution engine that's part of the Apache Arrow project: explicitly push-based, vectorized, designed to consume and produce Arrow buffers. It powers parts of Arrow Datasets and sits behind query paths in the PyArrow and R arrow bindings. I mention it not because it's a dominant production engine for security data yet, but because it's the canonical reference implementation of "what a pure push engine looks like in the Arrow ecosystem," and it's worth knowing about when reading source code or evaluating new tools.
Pipeline breaks. Some operators can't pass data straight through without materializing first: hash joins build a hash table on one side before probing with the other, sort operators have to see everything before they can return the first row, aggregations may need to spill to disk on large groupings. These pipeline-breaker operators force a transition from streaming push to materialize-and-push-the-next-stage. Engines that handle pipeline breaks gracefully (with cache-aware blocking, spilling, and partial-aggregate combining) tend to outperform engines that don't. This is one place ClickHouse's mature spill-to-disk and partial-aggregate combining show up in production.
Vectorized iterators. Trino's current execution model is best described as batched pull or vectorized Volcano: operators still implement an iterator contract, but each call returns a batch (a "page" in Trino terminology) rather than a single row. It's not the same as push-based execution, but it captures most of the per-row function-call savings while keeping the pull semantics that make federation tractable. The Presto-on-Velox path goes further by replacing the data plane underneath the pull-based control plane.
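Of the patterns above, the pipeline break is the one that most changes how a push plan is scheduled, so it is worth a small sketch. The classes below are hypothetical and deliberately minimal: the build side must be fully consumed and marked finished before the probe side is allowed to stream.

```python
class HashJoinBuild:
    """Pipeline breaker: consumes every build-side batch before probing can start."""

    def __init__(self, key):
        self.key = key
        self.table = {}
        self.finished = False

    def consume(self, batch):
        for row in batch:
            self.table.setdefault(row[self.key], []).append(row)

    def finish(self):
        self.finished = True  # the build pipeline is complete


class HashJoinProbe:
    """Streams probe-side batches through the finished hash table."""

    def __init__(self, build, key, parent):
        self.build = build
        self.key = key
        self.parent = parent

    def consume(self, batch):
        if not self.build.finished:
            raise RuntimeError("probe started before the build side finished")
        out = [
            {**probe_row, **build_row}
            for probe_row in batch
            for build_row in self.build.table.get(probe_row[self.key], [])
        ]
        if out:
            self.parent.consume(out)


class Collect:
    """Terminal operator that keeps whatever reaches it."""

    def __init__(self):
        self.rows = []

    def consume(self, batch):
        self.rows.extend(batch)


assets = [{"ip": "10.0.0.5", "owner": "payments"}, {"ip": "10.0.0.9", "owner": "hr"}]
events = [
    {"ip": "10.0.0.5", "port": 3389},
    {"ip": "10.0.0.7", "port": 22},
    {"ip": "10.0.0.9", "port": 445},
]

build = HashJoinBuild("ip")
build.consume(assets)  # pipeline 1: materialize the build side
build.finish()

sink = Collect()
probe = HashJoinProbe(build, "ip", sink)
for i in range(0, len(events), 2):  # pipeline 2: stream probe batches
    probe.consume(events[i:i + 2])
```

The plan splits into two pipelines at the hash table: the first ends in a materialization, and only the second streams end to end, which is where spilling and blocking strategy start to matter.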
Engine landscape
What's running which model in 2026.
The honest version of the engine map, with the caveat that "push" and "pull" are simplifications and the line is blurry for hybrids:
- ClickHouse: push-based, vectorized. Pipeline-per-stage threading. Built for real-time OLAP and log analytics. The case study I keep coming back to for petabyte-scale security log analytics is in ClickHouse at 5 PB/day.
- DuckDB: morsel-driven push, embedded analytical engine. The right tool for single-node threat hunting against a local cache or against Iceberg via the duckdb-iceberg extension. More on the SOC use case in DuckDB for threat hunting.
- Velox / Presto-on-Velox: push-based execution engine underneath what was originally Trino-family code. Meta built it, open-sourced it, and the broader ecosystem (Spark Gluten, Presto-on-Velox) is in active adoption. For security teams running Presto today, this is the upgrade path.
- Trino: vectorized pull (batched Volcano). The federation engine of choice. Best fit for joining heterogeneous sources, not the right primary engine for petabyte log scans.
- Apache Spark (3.x and later): vectorized execution via Photon (in Databricks) or native add-ons such as Gluten with Velox. The pure-iterator model from pre-2.0 Spark is long gone in production deployments.
- Apache Flink: push-based from the ground up. The dominant choice for streaming CEP and real-time detection rule evaluation. Pairs naturally with Kafka as the event source.
- Falcon LogScale (formerly Humio): push-based, designed for brute-force log scanning without traditional indexing. Owned by CrowdStrike now. Vendor claims of 30–40x faster search than indexed approaches sit in Tier C territory (vendor marketing, not independently validated), but the underlying architecture is real push execution.
- Apache Arrow Acero: the reference push-based engine in the Arrow project. Worth knowing as the architectural example, not yet a production choice for security workloads.
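- DataFusion: Rust-based vectorized execution engine in the Arrow ecosystem. Its operators pull Arrow record batches through async streams, so it sits nearer the batched-pull end of this map than the pure-push end. Powers parts of InfluxDB and Cube.dev. Growing footprint; worth tracking as a candidate for custom-built security analytics engines.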
- PostgreSQL: Volcano pull, with some batched optimizations. Still the right tool for the transactional-row workloads it was built for; not the right tool for petabyte analytical scans.
The pattern that stands out across this list: every engine optimizing for analytical throughput at scale has either started life push-based (ClickHouse, DuckDB, Flink) or migrated toward push-execution data planes underneath original pull architectures (Spark with Photon, Presto with Velox). Pure pull execution is still appropriate where federation or row-oriented OLTP dominates, but the trend line for OLAP is clear.
Honest gaps
What the architecture story doesn't tell you.
A few honest caveats I'd put in front of any architect making a real engine selection on the basis of push-vs-pull arguments:
The 10x performance gap is workload-dependent. The "five seconds versus fifty seconds" framing at the top of this essay is real for the canonical filter-aggregate scan-heavy query against a billion rows. It's not a universal multiplier. For wide joins, complex window functions, and queries that hit cold storage with high latency, the gap narrows, and for joins I've now measured how far. On the lab's scored join bench (Tier B, single host, 10M–60M-row tables over shared Iceberg) every engine answered the SOC-shaped join suite in under 1.5 seconds, and the join overhead relative to each engine's own flat scan sat near 1.8× for StarRocks and both ClickHouse arms, with Trino the outlier at 4.35×. Those are single-host numbers, so they say nothing about the TB-scale regime behind the headline framing. Some queries run comparably on Trino and ClickHouse because the bottleneck is somewhere else entirely (network bandwidth, S3 list-operation latency, hash-table sizing). Benchmark your own workload before building an architecture decision around a directional claim.
Operational maturity isn't the same as architectural fit. ClickHouse's push architecture is a real advantage for log analytics, but the operational story (cluster management, replica coordination, schema migration) takes investment to run well. Trino's architectural disadvantage on scan-heavy workloads is partially offset by deeper operational tooling and a longer track record at large enterprise scale, so the architecturally cleaner choice isn't always the one that's easier to run once it's in production.
Security-specific benchmarks are scarce. Most of the published push-vs-pull performance numbers come from TPC-H, TPC-DS, or vendor-curated synthetic benchmarks. OCSF-shaped security event data with realistic cardinality, wide schemas, and time-window-heavy predicates isn't well-represented in public benchmark suites. The directional argument from the literature is sound; the precise improvement number on your firewall logs or EDR telemetry is something you'd have to measure, so I went and measured a first slice of it on the MOAR reference stack.
The bench runs three of the scan-heavy filter-aggregate shapes this essay is about over a
1,000,000-row OCSF network_activity table, taking the median of four trials per query,
and the answers were checked equal across the engines before any latency was trusted.
| Query shape | DuckDB | ClickHouse | Trino |
|---|---|---|---|
| count(*) full scan | about 2.4 ms | 18.2 ms | 68.5 ms |
| needle predicate (dst_port=3389) | 5.7 ms | 22.1 ms | 97.5 ms |
| GROUP BY dst_port | 12.1 ms | 30.1 ms | 96.6 ms |
These are single-host figures at one million rows, so read the relative pattern rather than the absolute milliseconds. The relative pattern holds up: ClickHouse's vectorized push pipeline beats Trino's Volcano-style pull by roughly three to four times on these workloads (4.4x on the needle, 3.2x on the group-by, 3.8x on the count), and DuckDB's embedded vectorized engine, with no coordinator and no network hop, is far ahead of both at this single-host scale.
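The ratios quoted above follow directly from the table, and recomputing them is a useful habit when reading any benchmark. The snippet below just divides the published medians; the dictionary layout and the speedup helper are my own, and the DuckDB count figure is the approximate value from the table.

```python
# Median latencies in milliseconds from the one-million-row run above.
latency_ms = {
    "count(*) full scan": {"DuckDB": 2.4, "ClickHouse": 18.2, "Trino": 68.5},
    "needle predicate": {"DuckDB": 5.7, "ClickHouse": 22.1, "Trino": 97.5},
    "GROUP BY dst_port": {"DuckDB": 12.1, "ClickHouse": 30.1, "Trino": 96.6},
}


def speedup(shape, fast, slow):
    """How many times longer `slow` took than `fast` on one query shape."""
    return round(latency_ms[shape][slow] / latency_ms[shape][fast], 1)


trino_vs_clickhouse = {s: speedup(s, "ClickHouse", "Trino") for s in latency_ms}
clickhouse_vs_duckdb = {s: speedup(s, "DuckDB", "ClickHouse") for s in latency_ms}

for shape in latency_ms:
    print(f"{shape}: Trino/ClickHouse {trino_vs_clickhouse[shape]}x, "
          f"ClickHouse/DuckDB {clickhouse_vs_duckdb[shape]}x")
```

Keeping the raw medians next to the derived ratios makes it obvious which comparisons rest on a large absolute gap and which rest on a few milliseconds.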
So the honest number is closer to three or four times than to the ten times the canonical billion-row framing implies, and the gap I measured is smaller because this is a single-host snapshot at a million rows rather than a billion-row scan against a cold cache. What this slice actually isolates is the execution-model difference at one point on the curve, because Trino's pull-and-coordinate model carries distribution overhead that earns its place at federation and at large scale, not at small single-host batches, which is exactly the case the engine wasn't built for and exactly the nuance the rest of this essay keeps insisting on. The lab will widen this out to larger row counts and a real multi-node Trino deployment, where I'd expect the coordinator overhead to amortize and the gap to read differently.
So I reran the same three shapes at a hundred times the volume, a 100,000,000-row OCSF
network_activity table on the same single host, again the median of four trials, and
the picture rearranges in a way that's more interesting than a uniform scaling factor. StarRocks
joined this run, so the field is four engines. On the count(*) full scan ClickHouse
finished first at about 10.5 ms, with DuckDB just behind at 12.4 ms and StarRocks and Trino back at
48.2 and 44.4 ms, which is the throughput-bound scan where the push model's pipeline parallelism
finally pays off enough to overtake the embedded engine that swept everything at a million rows. On
the needle predicate (dst_port=3389) DuckDB still led at 77.7 ms ahead of StarRocks at
95.0, ClickHouse at 182.4, and Trino at 419.2, and on the GROUP BY dst_port DuckDB led
again at 103.1 ahead of StarRocks at 194.4, ClickHouse at 229.1, and Trino at 668.9. Trino was
slowest on the needle and the group-by and only narrowly ahead of StarRocks on the count, which is
what a single-host run does to a coordinator-and-distribution engine that was built for neither a
single host nor small batches.
The crossover is the part I'd flag for an architect, because at a million rows the embedded DuckDB was unambiguously fastest on all three shapes, and at a hundred million that advantage turns out to be scale-bounded: ClickHouse's push-pipeline parallelism caught and passed DuckDB on the full-scan count, where raw throughput is the whole game, while DuckDB held its lead on the selective needle and the group-by, where there's less to parallelize and the embedded engine's lack of a coordinator hop keeps winning. The push-versus-pull and embedded-versus-server patterns this essay is built around are not fixed multipliers but functions of scale and query shape, so the right takeaway from these two runs together is that the engine that wins flips as the row count climbs and as the query moves from throughput-bound to selective. These remain single-host figures, so the relative ordering is the finding and not the absolute milliseconds, and the multi-node Trino run is still the test that would let the pull model show what it was actually designed for.
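Putting the two runs side by side makes the flip easy to see. The numbers below are the medians reported above, in milliseconds; the nested-dictionary layout and the fastest helper are only my way of arranging them.

```python
# Median latencies (ms) from the two single-host runs described above.
runs = {
    "1M rows": {
        "count": {"DuckDB": 2.4, "ClickHouse": 18.2, "Trino": 68.5},
        "needle": {"DuckDB": 5.7, "ClickHouse": 22.1, "Trino": 97.5},
        "group_by": {"DuckDB": 12.1, "ClickHouse": 30.1, "Trino": 96.6},
    },
    "100M rows": {
        "count": {"DuckDB": 12.4, "ClickHouse": 10.5, "StarRocks": 48.2, "Trino": 44.4},
        "needle": {"DuckDB": 77.7, "ClickHouse": 182.4, "StarRocks": 95.0, "Trino": 419.2},
        "group_by": {"DuckDB": 103.1, "ClickHouse": 229.1, "StarRocks": 194.4, "Trino": 668.9},
    },
}


def fastest(latencies):
    """Name of the engine with the lowest median latency for one query shape."""
    return min(latencies, key=latencies.get)


winners = {
    scale: {shape: fastest(lat) for shape, lat in shapes.items()}
    for scale, shapes in runs.items()
}

for scale, by_shape in winners.items():
    print(scale, by_shape)
```

Only the count changes hands between the two scales, which is the crossover the paragraph above describes.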
The single-query latencies above hold one client steady, so to see what happens when a SOC actually
loads an engine I ran a concurrency sweep on the same single host (the lab's
concurrency_sweep.py), pointing C clients at the same scan-aggregate over a shared
10,000,000-row OCSF table and watching throughput and tail latency as C climbed from 1 to 16. The
two push-and-scheduler engines turned added clients into aggregate throughput: ClickHouse's
push-based pipeline model rose from about 20 to 58 queries a second with the gentlest tail (p95 from 59
to 355 ms), and StarRocks climbed from 16 to 42 q/s before plateauing around eight clients (p95 out
to 476 ms). Embedded DuckDB held flat at roughly 46 q/s with its p95 stretching from 57 to 689 ms,
which is what one process on a fixed core budget does once the clients outnumber the cores it has to
spread them across. Trino was the one that struggled under load, sitting flat and low at 9 to 13 q/s
while its p95 ran from 128 ms all the way to 1,736 ms, by far the worst tail in the field, because
on a single host the coordinator-and-worker overhead dominates and there is no cluster to spread the
concurrent load over.
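For readers who want to run a similar sweep against their own engine, the general shape of such a harness is sketched below. This is not the lab's concurrency_sweep.py; it is a minimal stand-in in which run_query is a hypothetical placeholder (a short sleep) that you would replace with a real client call, and the client counts mirror the 1-to-16 range described above.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_query():
    """Hypothetical stand-in for one scan-aggregate round trip to an engine."""
    time.sleep(0.005)


def sweep_point(clients, queries_per_client, query=run_query):
    """Run `clients` concurrent loops; return (queries per second, p95 latency in ms)."""

    def client_loop(client_id):
        samples = []
        for _ in range(queries_per_client):
            t0 = time.perf_counter()
            query()
            samples.append((time.perf_counter() - t0) * 1000)
        return samples

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        latencies = [
            ms
            for samples in pool.map(client_loop, range(clients))
            for ms in samples
        ]
    elapsed = time.perf_counter() - started

    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20)[-1]
    return len(latencies) / elapsed, p95


results = {c: sweep_point(c, queries_per_client=5) for c in (1, 2, 4, 8, 16)}

for clients, (qps, p95) in results.items():
    print(f"C={clients:>2}  throughput={qps:7.1f} q/s  p95={p95:6.1f} ms")
```

With a real query function plugged in, the two columns to watch are the ones discussed above: whether throughput keeps rising as clients are added, and how far the p95 stretches when it stops.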
That is the measured version of the point this essay has been making about where Trino belongs: under concurrency on one box the push and scheduler engines convert each new client into throughput while Trino pays its distribution tax with no nodes to amortize it over, which is the same scope caveat the single-query benches carried, only now visible in the shape of the throughput and tail curves rather than in a single latency. The honest boundary is the same too, since this is still one host, and the multi-node cluster concurrency that Trino's coordinator model was actually designed to win remains a test I'd have to run on real cluster hardware before reading anything into it.
Federation will keep mattering. The clean "push for OLAP, pull for federation" line works as a first approximation, but security operations regularly need both. A SOC team that standardizes on ClickHouse for the operational data plane and Trino for federation queries is running two engines, with the operational complexity that implies, and the cleaner alternative of moving everything into one engine usually means giving up either federation reach or scan throughput, so the decision comes down to which of those trades you're willing to accept. The worked scorecard is where those trades get weighed per environment archetype.
The hybrid future is messier than the architectural framing. Presto-on-Velox, Spark with Photon, and Trino's continued vectorization investments mean the "Volcano vs morsel" line is becoming less sharp at the production edge. The architectural distinction will still matter for understanding why an engine performs the way it does, but vendor selection in 2027 may look less like "pick push or pick pull" and more like "pick which hybrid the vendor has invested in." I'd treat the framing in this essay as a lens for reading new engines, not as a two-bucket sort.
Conclusion
Architecture is destiny, with footnotes.
The choice between push-based and pull-based execution isn't visible in marketing materials, it doesn't show up on a feature checklist, and it doesn't make the slide deck, but it shapes how the engine handles the workload that matters most to a SOC: scan-heavy, filter-aggregate queries on columnar event data, often with streaming ingestion behind them, and for that workload push tends to win, sometimes by an order of magnitude.
The practical guidance is straightforward. For real-time detection and operational SOC workloads, push-based engines (ClickHouse, DuckDB, Flink, Falcon LogScale) are the architecturally appropriate choice, while for federation across heterogeneous sources Trino's pull model is still the right fit and probably will be for some time. For streaming detection, push is the architecture that fits the data flow without contortion, and for ad-hoc threat hunting against a local cache, DuckDB's morsel-driven push execution is hard to beat on a single node.
The lazy answer is "use whatever the SIEM vendor ships." The architecturally informed answer is to match the engine to the workload: push for OLAP, pull for federation, hybrid where the vendor has actually invested in both. Most security teams will end up running two engines for two workload shapes, which is fine, and the mistake to avoid is forcing one engine to cover both and accepting the worst of each.
The next time a benchmark claims a ten-times speedup on the same SQL, the same hardware, and the same data, the architecture story is probably part of the answer. Knowing which part means knowing whether the engine is asking its source for one more row, or being handed a batch and told to deal with it. That distinction is older than most of the security workloads it now serves, and it's still the right place to start a serious engine evaluation.