Technology deep-dive
DuckDB for analyst-driven threat hunting.
The threat hunter's workflow is exploratory rather than operational, because a session typically runs twenty to fifty queries in thirty minutes and each one is shaped by what the last one returned. An embedded analytical database (one that runs inside the analyst's Python process rather than on a remote cluster) fits that workflow in a way a traditional client-server warehouse may not, and DuckDB is the engine that has earned the slot. Jake Thomas's team at Okta showed the pattern at 7.5 trillion records of scale, and the same engine sits on individual analyst laptops doing ad-hoc work against the same S3 lake.
Reading time: about 17 minutes. Evidence tier: A for the Okta-scale numbers (Jake Thomas's Data Council 2024 talk, documented analysis by Julien Hurault). B for the broader analyst-workflow framing, drawn from DuckDB project documentation, community practitioner accounts, and reference architectures rather than peer-reviewed benchmarks. The performance ranges quoted are directional and depend heavily on file layout, column selectivity, and S3 region; treat them as a starting hypothesis to test against your own workload.
Two Okta stories, one engine
This is the analyst angle, not the Lambda angle.
There are two distinct stories about DuckDB at Okta, and conflating them obscures both. The first is the serverless transform pattern: DuckDB inside AWS Lambda doing normalization, deduplication, and hashing across 7.5 trillion records and 130 million files over six months, replacing Snowflake compute for the preprocessing tier. I've written about that pattern as a teardown of its own (the link is at the bottom of this essay), and it's really an infrastructure story about cost and elasticity.
The second story is the one I want to tell here, because the same Okta team that runs serverless DuckDB for preprocessing also uses embedded DuckDB on analyst laptops and notebook servers for ad-hoc threat hunting against the same Iceberg lake. The infrastructure story explains why the data is there, while the analyst story explains why DuckDB earns the slot in the analyst's hand once the data exists, and so it's one engine being put to two different uses.
The analyst angle is the one I find under-told, because vendors and conference talks gravitate toward the Lambda story since it has a dollar figure attached (the often-quoted ninety percent cost reduction on the preprocessing tier), and the notebook story is harder to put in a slide. I'd argue it's the structurally more important shift, though, because a cost win happens once whereas a workflow shift compounds for the lifetime of the SOC.
The shape of a hunt
Threat hunters explore, they don't operate.
A scheduled SIEM rule and a threat hunt look superficially similar (both involve SQL or SQL-like queries against event data) but they're different workloads. The scheduled rule runs the same query on a cadence against pre-aggregated data, and sub-second latency matters because it gates alerting. A hunt is the opposite shape: the analyst doesn't know the right query yet, the hypothesis evolves as results come back, and thirty seconds per query is fine as long as the next query can be formulated and submitted immediately.
A typical hunting session I'd recognize from my own work and from practitioner accounts runs roughly like this. The analyst starts with a hypothesis ("are there failed logins from this IP range that cluster on weekends?") and runs a filter-and-group query against the last week of authentication events. The result shows three subnets with brute-force-shaped patterns. The next query pivots: "which users were targeted by these three subnets, and what's the time window?" The third query crosses engines: "do EDR events from the same window show process activity on those users' endpoints?" Twenty to fifty queries deep, the analyst has either confirmed an active campaign, ruled out the hypothesis, or generated three new hypotheses worth chasing.
Every property of that workflow conflicts with the assumptions baked into a traditional client-server analytical warehouse. The queries are unpredictable, so pre-warming a cache doesn't help. The result sets are small (a few hundred rows usually) but the inputs are huge (billions of events scanned). The analyst owns the iteration speed, so a thirty-second network round-trip is fine but a three-day "submit a ticket to the data engineering team to load this source" is not. Most consequentially, each query is a one-off and won't run a thousand times tomorrow, so whatever caching the warehouse does is mostly wasted on hunting.
What the analyst actually wants is SQL against whatever Parquet or Iceberg data exists, with no warmup, no ticket queue, no warehouse cost per query, and ideally no separate UI from the notebook where the rest of their analysis already lives, which is the slot DuckDB fills.
What "embedded" means
The database runs inside the analyst's process.
DuckDB is an embedded analytical database. "Embedded" means it runs as a library inside the
application that calls it, the same way SQLite runs inside whatever program imports it, rather than
as a separate server process the application talks to over a network. The analyst's Python script,
Jupyter notebook, or CLI tool is the database. There's no server to provision, no port to
open, no cluster to keep warm. The DuckDB binary is roughly thirty megabytes; pip install
duckdb finishes in seconds.
"Analytical" means it's optimized for OLAP (online analytical processing) workloads, where the engine scans many rows, aggregates or filters, and returns a small result, rather than OLTP (online transaction processing) where the unit of work is "find one row, update one field." Internally that means columnar storage (data laid out column-by-column rather than row-by-row, so a query that touches three fields out of twenty only reads those three) and vectorized execution (the engine processes batches of about a thousand values at a time using SIMD CPU instructions, rather than looping one row at a time). Those are the same techniques ClickHouse and Snowflake use, so DuckDB's distinction isn't the techniques but rather that they're packaged into a library you can embed instead of a cluster you have to operate.
Practically, that combination removes three categories of friction that I think have been the main reason threat hunting at most organizations remains less data-driven than it could be:
- No infrastructure provisioning. The analyst doesn't need a database admin, a VPN into a warehouse VPC, or a cluster sized for their workload. Their laptop is sufficient for single-source, GB-scale to low-TB-scale queries. The data engineering team isn't a gatekeeper for exploratory work.
- No ETL step before querying. DuckDB reads Parquet and Iceberg files directly from
S3 with predicate pushdown (it pushes the
WHEREclause down to skip irrelevant files) and column pruning (it reads only the columns the query touches), so data already in the lake is already queryable with no "load this source first" step. - No warehouse cost per query. The cost model is S3 read bandwidth and local compute, not Snowflake credits or BigQuery slot-seconds. For ad-hoc exploration where you may run fifty queries and keep three results, that economics shift matters: analysts stop self-censoring queries because they don't know whether the curiosity is worth the credit.
MotherDuck (the commercial company building managed DuckDB) extends this pattern with a hybrid local
plus cloud execution model and a serverless backend for teams that want to share results without
shipping a .duckdb file around Slack. For purposes of this essay the relevant fact is
that the engine itself is open source (MIT-licensed) and there's no requirement to adopt MotherDuck
to get the workflow benefits. I have no commercial relationship with either DuckDB Labs or MotherDuck.
What the laptop workflow looks like
A hunt in code.
The shape that I recognize as productive (and that I've seen described in Jake Thomas's talks and in community practitioner accounts) is a Jupyter notebook with three things imported: DuckDB, pandas, and a plotting library. The DuckDB connection is in-memory by default (no file specified means the engine is purely a query layer against external Parquet or Iceberg files; nothing is stored locally). The first query establishes the baseline, subsequent queries pivot, and intermediate results land in pandas DataFrames for visualization or statistical work.
import duckdb
import pandas as pd
import matplotlib.pyplot as plt
con = duckdb.connect() # in-memory, no local file
con.execute("INSTALL iceberg; LOAD iceberg;")
# Query 1: failed-login trend, last 30 days
trend = con.query("""
SELECT date_trunc('day', event_time) AS day,
count(*) AS failed_logins
FROM iceberg_scan('s3://security-logs/auth_events')
WHERE failed = true
AND event_time > current_date - INTERVAL 30 DAY
GROUP BY day
ORDER BY day
""").to_df()
plt.plot(trend['day'], trend['failed_logins'])
plt.title('Failed logins, 30 days')
plt.show()
# Query 2: identify spike days (mean + 3 standard deviations)
mean, std = trend['failed_logins'].mean(), trend['failed_logins'].std()
spikes = trend[trend['failed_logins'] > mean + 3 * std]
# Query 3: drill into one spike day
for day in spikes['day']:
con.query(f"""
SELECT src_ip, user_id, count(*) AS attempts
FROM iceberg_scan('s3://security-logs/auth_events')
WHERE failed = true
AND date_trunc('day', event_time) = '{day}'
GROUP BY src_ip, user_id
ORDER BY attempts DESC
LIMIT 10
""").show() Four properties of that code are worth naming because they're the ones that separate productive hunting from frustrated hunting.
The notebook is the audit trail. Every query that ran, in order, with the result that shaped the next query, is preserved in the notebook. When the hunt concludes (confirmed campaign, ruled out, escalated to incident response) the notebook is the documentation, so there's no separate write-up step and nothing to re-create the investigation from memory later.
Pandas is the pivot layer. DuckDB results convert to DataFrames with a single
.to_df() call. Statistical work that's awkward in SQL (anomaly detection, correlation
analysis, smoothing) happens in pandas. The next SQL query incorporates the pandas output. This is what
I mean by composability: neither tool tries to be the other; they hand columnar data back and forth
through Apache Arrow buffers underneath, which means the conversion cost is near zero.
The query reaches S3 directly. No intermediate database had to be loaded with this week's data. The data was written to Iceberg by the upstream pipeline (Spark, Kafka, or Lambda doing the preprocessing) and is queryable the moment it lands. The analyst's time-to-first-query is the time it takes to write the query, not the time it takes the data engineering team to provision a table.
The notebook is shareable. When the analyst hands the hunt to a peer for review, or files it as evidence in an incident, the notebook travels as a single artifact. Re-running it reproduces the investigation. That's a different governance model than "screenshot the SIEM result and paste it into a ticket," which is what I still see far more often than I'd like.
Performance, honestly
What "fast enough" actually means for hunting.
I want to be careful about the performance framing because vendor blog posts in this space routinely overclaim. The honest version: DuckDB on a laptop, querying Parquet files on S3 in the same region as the laptop is running from, is fast enough for threat hunting in a regime that practitioners describe as roughly tens of seconds for billion-row scans with selective filters. That regime overlaps with (but is meaningfully slower than) what a dedicated ClickHouse cluster would do for the same query against pre-loaded data.
Three caveats. First, those numbers depend on the Parquet files being column-sorted and partitioned in a way that supports predicate pushdown; if the upstream pipeline wrote the data badly, DuckDB will be scanning more bytes than it needs to. Second, S3 read latency dominates at smaller scales; pulling a few hundred small files is often slower than pulling a few large files of equivalent total size. Third, I'd previously left the head-to-head as a hypothesis because I hadn't benchmarked DuckDB against ClickHouse on OCSF-shaped security data with methodology I'd publish, and I've now run a first pass of that on the lab's reference stack, so the rest of this section reports what I measured rather than what the project docs claim.
The setup is deliberately narrow, and the caveat carries through everything that follows: this is a
single-host snapshot at one million rows of an OCSF network_activity table, four engines
(DuckDB, ClickHouse, Trino, StarRocks) on the same box, with each number the median of four trials and
the coefficient of variation in parentheses, and the answers checked equal across engines before any
latency was trusted. Each number is the median of four trials with the coefficient of variation in parentheses.
| Query shape | DuckDB | ClickHouse | StarRocks | Trino |
|---|---|---|---|---|
count(*) | 2.4 ms (CV 10%) | 18.2 ms (11%) | 39.9 ms (1%) | 68.5 ms (10%) |
needle filter dst_port=3389 | 5.7 ms (3%) | 22.1 ms (8%) | 45.3 ms (1%) | 97.5 ms (6%) |
group by dst_port | 12.1 ms (7%) | 30.1 ms (5%) | 55.3 ms (11%) | 96.6 ms (7%) |
On a single host at this size DuckDB's embedded engine is fastest on those small-batch workloads precisely because there's no coordinator to talk to and no network hop to pay, which is the same property that makes it fit the analyst's hand. The relative pattern is the finding here, not the absolute milliseconds, since those move with the box.
The interesting result is where that ordering breaks. On a high-cardinality
distinct src_ip (which I scored latency-only and didn't gate on a matching answer, because
ClickHouse's count(distinct) is approximate by default and so isn't comparing the same
computation) StarRocks already overtook the field at 97.7 ms (CV 2%), ahead of DuckDB's
139.7 ms (14%), ClickHouse's 168.7 ms (6%), and Trino's 427.9 ms (17%). That's the first
place a specialized engine pulls ahead even on a single host at a million rows, and it reads to me as
an early sign that the edge those engines hold is a scale-and-concurrency property rather than a
raw-latency one, which this snapshot doesn't isolate. So nothing here contradicts the claim later in
this essay that ClickHouse pulls ahead on concurrency and at larger scale, because the snapshot is a
single host at small scale and says nothing about the territory where that advantage lives. Treat these
as one measured data point, not a verdict; a multi-host, higher-scale, concurrent run is the next pass.
I've since run that larger pass on the same reference stack, still a single host, at one hundred
million rows of the same network_activity table (./moar bench 100000000,
median of four trials), and it gives the first measured crossover. On a count(*) full
scan ClickHouse now comes in at 10.5 ms against DuckDB's 12.4 ms, with StarRocks at
48.2 ms and Trino at 44.4 ms, so ClickHouse has overtaken DuckDB on the pure scan as the
volume grew. The selective work still goes to DuckDB though: the needle filter on
dst_port=3389 ran 77.7 ms on DuckDB ahead of StarRocks at 95.0 ms, ClickHouse
at 182.4 ms, and Trino at 419.2 ms, and the group by dst_port came in at
103.1 ms on DuckDB against StarRocks's 194.4 ms, ClickHouse's 229.1 ms, and Trino's
668.9 ms. That's the measured form of the claim I make elsewhere in this essay that ClickHouse
pulls ahead as scan volume climbs, and it lands exactly where you'd expect it to without undercutting
the hunting case, because the crossover happens on the full-table scan while DuckDB still wins the
selective filter and the grouped working-set query, which are the shapes the interactive hunt actually
runs. The single-host caveat still holds, so the relative pattern is the finding rather than the
absolute numbers.
The concurrency half was the part I'd been calling out as unmeasured, and a sweep on the same reference
stack (lab/concurrency_sweep.py, single host) now puts numbers on it by running C concurrent
clients against a shared ten-million-row OCSF table, all firing the same scan-and-aggregate
(group by dst_port) while I watch throughput in queries per second and p95 latency as C
climbs from one to sixteen. At a single concurrent query DuckDB is still the fastest of the four at
45.8 q/s, but its throughput stays flat near 46 q/s no matter how many clients pile on, because
one process is already using every core and the concurrent clients just share a fixed core budget, so the
contention shows up as latency instead: p95 climbs from 57 ms at one client to 689 ms at
sixteen, roughly twelve-fold. ClickHouse starts slower at 20 q/s but its scheduler turns added
clients into aggregate throughput, scaling about 2.9-fold to 58 q/s and overtaking DuckDB by sixteen
clients while holding the gentlest p95 growth of the field (59 to 355 ms). StarRocks scales from 16
to 42 q/s before plateauing around eight clients (p95 102 to 476 ms), and Trino sits flat and
low at 9 to 13 q/s with the worst p95 by a wide margin (128 to 1736 ms), which is the
single-host coordinator overhead paying for a cluster that isn't there.
So the crossover I'd predicted from the single-query runs is now measured directly: at one concurrent query DuckDB leads, and by sixteen concurrent clients ClickHouse leads on aggregate throughput (58 against 45 q/s) and holds far better tail latency (355 against 689 ms). This is the measured form of the essay's claim that ClickHouse pulls ahead on concurrency, and rather than undercutting the hunting case it confirms where DuckDB's home actually is, because the embedded engine is built around one process saturating the cores for a single interactive hunter's working set, not around a scheduler arbitrating a shared multi-analyst serving tier, so concurrency on DuckDB just queues. The honest scope is the same as before: this measures engine scheduling under contention on one host, not the concurrency a multi-node cluster would distribute, which is still the next pass. That crossover is exactly what reorders an engine ranking when an environment weights concurrent serving over single-hunter latency, which is the kind of trade-off the worked scorecard resolves per archetype.
What matters for hunting is not query speed in isolation; it's time to insight. If a ClickHouse query returns in two seconds but the analyst spent three days waiting for the data engineering team to load the source, time to insight is three days. If a DuckDB query returns in forty-five seconds against data that was already in S3, time to insight is forty-five seconds plus query-writing time. For exploratory work, the second model wins by an order of magnitude on the metric that actually matters.
That argument flips for repeated, latency-sensitive work. A SOC dashboard refreshing every fifteen seconds in front of an analyst on shift is the opposite shape, because it's the same query thousands of times per day with sub-second latency required, and DuckDB is the wrong tool for that while ClickHouse or another dedicated OLAP engine is the right one. So the honest framing is that DuckDB tends to be the hunting tool and ClickHouse the dashboard tool, and a real SOC architecture may use both for different jobs.
Iceberg integration
Querying the security lakehouse directly.
The DuckDB Iceberg extension matters more than the Parquet support for the hunting story, because in a
real lakehouse the analyst doesn't want to know which Parquet file contains the last seven days of
authentication events. They want to query the logical auth_events table and let the
catalog handle the rest.
With the Iceberg extension loaded, DuckDB reads Iceberg metadata from S3 (or a REST catalog like
Polaris or AWS Glue), figures out which manifest files describe data in the analyst's date range, pulls
only the relevant Parquet files, and applies column pruning on the way in. The analyst writes
FROM iceberg_scan('s3://warehouse/security_logs.db/auth_events') and DuckDB handles
snapshot resolution, partition filtering, and predicate pushdown underneath. The analyst doesn't need
to know (and shouldn't need to know) how the data is laid out on disk.
Two practical notes on the current state. Iceberg V3 deletion vectors and row-level lineage are still rolling out across query engines as of early 2026; DuckDB's V3 support is progressing but worth verifying for your specific version before assuming feature parity with Spark or Snowflake on those capabilities. I've written separately about why the V3 features matter for security workloads; the link is below. The good news is that the V2-era functionality DuckDB has had for a while is sufficient for the bulk of read-side hunting workflows; V3 features matter most for the write side and for the compliance-erasure pattern.
DuckDB does not write to Iceberg as fluently as it reads. For analyst notebooks that's fine: the
notebook reads from Iceberg and writes results to a local .duckdb file or back out to
Parquet on S3. Heavy Iceberg write paths (the upstream normalization pipeline that Jake Thomas's team
built in Lambda, for instance) tend to use Spark or DuckDB's bulk write path rather than ad-hoc INSERT
statements. That's a sensible division of labor; I wouldn't try to write a continuous ingest pipeline
in a Jupyter notebook either.
Where it fits, where it doesn't
DuckDB versus the alternatives.
The decision framework I use when an architect asks "should we be running DuckDB?" has three comparators that matter: ClickHouse, Trino, and Spark. Each is the right tool for a different job, and I think the most common mistake is trying to make one engine do all three jobs.
DuckDB versus ClickHouse
ClickHouse is the right tool when the same query runs thousands of times per day: SOC dashboards, alert correlation, anything fronting an on-shift analyst. ClickHouse's cost model (always-on cluster, typically several thousand dollars per month at the low end) makes sense when the cluster amortizes across many users and many repeated queries, whereas DuckDB is the right tool when the queries are one-offs and the cluster would sit idle most of the time. So use both for their different jobs rather than forcing ClickHouse to be a hunting tool or DuckDB to be a dashboard tool.
DuckDB versus Trino
Trino's strength is federation: joining across heterogeneous sources in a single query (Iceberg plus MySQL plus PostgreSQL plus Snowflake). For a hunt that needs to enrich security events with a CMDB asset table in PostgreSQL and a user directory in MySQL, Trino federates across all three in one statement. DuckDB can't natively reach into MySQL or PostgreSQL the way Trino does. If your hunting workflow lives entirely in the Iceberg lake, DuckDB is fine. If you regularly join across operational databases, Trino earns its slot, accepting the cluster-management overhead.
DuckDB versus Spark
Spark is the right tool for scheduled batch transformations: nightly normalization, OCSF mapping, dbt jobs that materialize derived tables. Spark's distributed execution and write-path maturity for Iceberg are the differentiators. DuckDB is the wrong tool for any of those jobs at scale, and Spark is the wrong tool for ad-hoc notebook hunting, so the pattern that works in practice is that Spark writes the tables and DuckDB queries them.
When DuckDB is the wrong call entirely
There are two regimes where I would not reach for DuckDB. The first is anything requiring sub-second latency, because DuckDB on S3 has an S3 latency floor of hundreds of milliseconds before the engine even starts. The second is concurrent heavy workloads, because fifty analysts hammering the same lake simultaneously will saturate S3 read quotas and bottleneck on the slowest analyst's query, and ClickHouse with a shared cache is better shaped for that. So the honest answer is that the DuckDB sweet spot is roughly one to ten concurrent analysts doing exploratory work, rather than the entire SOC running its full workload through it.
Materialized views and hunt acceleration
Pre-aggregated baselines plus raw access.
A pattern that I think will become more common as security lakehouse architectures mature: pair DuckDB for raw exploration with pre-aggregated materialized views maintained by another engine. ClickHouse materialized views are the canonical example: the view incrementally aggregates as events arrive, exports to Parquet on a schedule, and DuckDB queries the exported Parquet for common hunting baselines.
The shape of the win is that a thirty-day "failed logins per hour" query against fifty billion raw events may take minutes to scan in DuckDB, while the same query against an hourly materialized view (720 rows for a thirty-day window) returns in seconds. For common baselines (authentication trends, top targeted users, geographic patterns) pre-aggregation may produce roughly an order-of-magnitude latency improvement.
The trade-off is real and worth stating, because materialized views accelerate known patterns but don't help novel investigations, and a hunter chasing a new hypothesis still needs raw data access. So the architecture that works is "both," with materialized views for the common patterns the SOC runs repeatedly and raw Iceberg-via-DuckDB for the anomaly deep-dive. Treating MV access as the primary hunting interface constrains the hunt to questions the data team thought to pre-aggregate, which defeats the point.
Governance reality check
The hard part isn't the engine.
The technical case for DuckDB-in-notebook is, I think, fairly settled. The harder part (and the part that determines whether this pattern actually works in your organization) is governance. Analysts running arbitrary SQL against the security lake from their laptops is a different access model than analysts running pre-approved queries through a SIEM. Three governance questions need real answers before the workflow scales beyond a small team.
Who has read access to which datasets, and how is that enforced? S3 IAM policies are the most common answer, often combined with table-level grants in the Iceberg catalog (Polaris, Glue, Unity Catalog). The catalog is the right place for fine-grained controls because the engine sitting on the analyst's laptop is downstream of catalog-enforced permissions. Without a catalog, every analyst with read access to the S3 bucket has read access to every table, which may be acceptable for a small SOC and unacceptable for a multi-team or regulated environment.
How are queries audited? A DuckDB-in-Jupyter session running locally produces no server-side audit trail by default. For compliance regimes that require query auditing (financial services regulatory, healthcare HIPAA-adjacent, government FedRAMP), this is a real gap. The workarounds I've seen: run analyst notebooks on a centralized notebook server (JupyterHub, Hex, Deepnote) where session logging is centralized; route queries through a managed DuckDB layer (MotherDuck does this); or accept that hunting happens in a lower-audit tier and only escalated findings move to audited systems.
What happens to results? The notebook-as-audit-trail benefit cuts both ways. The notebook contains query results, which may contain sensitive data the analyst has read access to but shouldn't be persisting in plaintext on their laptop. Notebook hygiene (clear outputs before commit, encrypted disk, no syncing to personal cloud) is an operational discipline the team has to practice rather than a box it can check once, so the honest version of the trade-off is that the workflow benefits come with a more distributed data surface that the SOC has to govern actively.
What I'd actually do
Practical guidance for 2026.
Four moves I'd recommend to a security architect evaluating this pattern, ordered roughly by how cheap they are to execute:
- Pilot with one team and one data source. Pick the team that already lives in notebooks (a detection engineering or threat intel team is the typical fit) and pick a Parquet or Iceberg source that's already in S3. Install DuckDB, write the three queries the team runs most often, and measure time-to-first-result against the current workflow. The pilot will either confirm the pattern in your environment or surface the governance gaps that need addressing first.
- Don't replace your dashboards. The mistake I'd flag is trying to move SOC dashboards off ClickHouse onto DuckDB-on-S3, because the two engines are for different jobs, so the pilot should add a hunting tier alongside the operational one rather than replace it.
- Plan for the audit story before the team scales. If your SOC will eventually have ten or twenty analysts using this pattern, decide now whether queries are running on individual laptops or on a centralized notebook server with session logging. The retrofit cost is meaningful; the upfront decision is cheap.
- Benchmark against your own workloads. The performance ranges I've quoted in this essay are directional. The actual numbers for your data layout, your S3 region, and your typical query shapes will differ. If DuckDB performance is going to be central to your architecture, run the benchmark before committing.
The shorter version is that DuckDB earns the slot in the analyst's hand for the same reasons SQLite earned the slot in many application developers' hands twenty years ago, because it's embedded, needs no infrastructure, and is the right shape for the workflow. The question is no longer whether the engine is good enough for security hunting, since Okta's deployment and the broader community traction settled that, but rather whether your governance model and your team's notebook discipline are good enough to support the workflow at scale, and that's where I'd put the architect's attention.
The edge this sits on
Edge: the single-node ceiling, and it's shape-dependent — at 1B rows over Iceberg, DuckDB still answers selective filters in 407–445 ms while high-cardinality top-N crosses every interactive line at 79–91 s, spilling and finishing rather than dying.
Signal: the edge is graceful, so watch the tail — if your slowest routine hunt's p95 grows faster than your hot table's row count, the ceiling is approaching, and you have quarters rather than days to plan the first server engine.
Caveat: one host, Tier B; what travels is ordering and shape. Locate yourself on the breaking-points map with your own numbers.