Technology deep-dive

One engine in front: StarRocks over shared Iceberg, and what it costs.

For most of this spring I've been running the same OCSF tables through four query engines on one machine, and the finding I trust most from that work is the one I've now graduated at 0.97 confidence: no single engine wins every workload. DuckDB takes most single-query shapes at every scale I've measured, ClickHouse takes the full-scan count at 100M rows and overtakes on concurrent throughput by sixteen parallel clients, StarRocks took the high-cardinality distinct at a million rows, and Trino takes nothing on a single host but can reach systems nobody else can see. The standard conclusion from that finding is engine specialization: run several engines over the same Iceberg tables and pin each to the workload it wins, a pattern I've published myself as a design principle. But every time I walk a security team through it the same objection comes back, and it has nothing to do with performance, because the question a SOC actually asks is who is going to run four engines.

Reading time: about 18 minutes. Evidence tier: B overall — version-pinned StarRocks and Trino documentation, first-party lab runs (single host throughout, so read the relative pattern rather than the absolute milliseconds), and practitioner and community sources, with vendor-channel accounts flagged where they appear. One leg was Tier D at first publication — the Splunk DB Connect stock-MySQL-driver check — and I've since run it (2026-06-10): the stock driver connects and returns rows, so the leg is upgraded in place with its limits stated where it appears.

The manageability question

Who is going to run four engines.

A security operations team is not a data platform organization, though the multi-engine literature assumes it is. Four engines means four upgrade cadences, four authentication stories to wire into the same SSO, four JDBC drivers to keep current inside analyst tooling, four failure modes to page on, and four places the same query can return a subtly different answer, which in security work is the failure that matters most because nobody notices it. The constraint underneath all of this is the talent pool, because people experienced in both security operations and data infrastructure are scarce and expensive, and a team that has hired one of them has not hired four engines' worth of them. The same scarcity is why security teams are right to be conservative about unproven tools, since adopting something new that is hard to understand, build, and run is a bet placed with the team's scarcest resource, and the move only makes sense when the wins are large on both axes at once, technical and operational, which is the bar everything in this essay should be read against.

So the question I keep coming back to is not which engine wins each workload, since I already know the answer is "different ones," but how few engines you can run before what you lose on workloads costs more than what you save on operations. This essay works through the strongest one-engine answer I can currently defend: StarRocks alone, in front of the same Iceberg tables everything else already shares, covers three of the four workloads the multi-engine pattern exists for, from a single MySQL-wire endpoint. Dremio could mount the same one-engine argument through Reflections' algebraic substitution, and my own matrix ranks it ahead of StarRocks as a standalone primary at all three archetypes I've scored, but the case I can defend for a SOC mid-migration runs through the stock-MySQL front door, which Dremio doesn't have, so StarRocks is the version of the premise under test here rather than a verdict against the Dremio version.

I want to make that case honestly, which means leading with the complication: my own lab numbers show StarRocks winning nothing on single-table scans at 100M rows, so the case rests on transparent materialized-view rewrite, native joins, and the front door rather than raw speed. The join leg is no longer a prediction, though, because the benchmark I pre-registered the morning of 2026-06-10 scored the same day: StarRocks took four of the five TPC-H-derived joins, and the result that matters most for this essay is that the whole field compressed — every SOC-shaped join finished in under 1.5 seconds on every engine I ran (one host, 10M–60M-row tables), which is the number the manageability argument has been waiting for. The join bench and its code are in the lab.

Scope

The four workloads.

When I say the multi-engine pattern exists for four workloads, I mean these. First, hot scheduled scans: dashboards, alert-status queries, the recurring aggregations that fire every few minutes and need to come back in well under a second. Second, ad-hoc investigation joins: an analyst pivoting from an IP to an identity to a process across authentication, network, and endpoint tables, where the query shape is multi-table and unknown in advance. Third, cross-source federation: queries that reach systems whose data hasn't landed in the lake yet, like the ticketing database, a Kafka topic, or the asset inventory. Fourth, fast bulk transport: pulling a few hundred thousand to a few million rows into a notebook or a Python hunting script, where the bottleneck is the client protocol rather than the query plan.

The pinning of engines to those workloads is less settled than the pattern implies, and I can prove that from my own published material, because one page of mine deploys "ClickHouse (or StarRocks) for the real-time path, Trino or Dremio for ad-hoc analysis, and Spark for batch transformations," while another pins ClickHouse to scheduled queries and StarRocks to ad-hoc investigations on the strength of its optimizer and joins. Both pairings are mine, and that the assignments swap between my own pages is evidence the per-workload winners are close enough to argue about, which is part of what makes the one-engine question worth asking, and it's a tension the single-front premise would actually resolve, since with one engine in front there is no per-workload assignment left for two pages to disagree about.

The single endpoint

What one endpoint covers.

Start with the front door, because for security teams it decides whether anything else matters. StarRocks speaks the MySQL wire protocol natively; the frontend's query_port defaults to 9030 and the deployment docs connect to it with a stock mysql client, with a floor of MySQL client 5.5.0 or later [B-vendor-doc]. No sentence in the official docs certifies a specific Connector/J version, but StarRocks' own Flink and Spark connector pages use the stock MySQL JDBC driver (com.mysql.cj.jdbc.Driver against jdbc:mysql://fe:9030), so this is MySQL-protocol compatibility demonstrated through the stock driver rather than a formal compatibility matrix [B-vendor-doc, indirect]. The reason I care about this particular protocol is Splunk DB Connect: during a migration the SPL-to-lakehouse bridge is a JDBC connection, and a stock-MySQL-driver endpoint should mean DB Connect can query the lakehouse through StarRocks without a custom driver. When this essay first published, that "should" was Tier-D inference, flagged loudly as the cheapest untested leg on the board, with a promise to update the page either way. I've now run it (2026-06-10): DB Connect 4.1.1's bundled stock driver (mysql-connector-j-8.2.0) connected to a StarRocks 4.1.1 all-in-one container on the FE's 9030 port and returned correct rows through dbxquery, with no handshake or auth-plugin failure — the classic break points for "speaks the MySQL protocol" claims. Two limits ride with that: it's a connectivity smoke test over a three-row table, so it proves the protocol leg and says nothing about performance at migration scale, and DB Connect 4.1.1's own jars require Java 17 while the Splunk container ships no JVM at all, so the Java runtime is on you either way. The front door covers the fourth workload too, with a transport tax I'll price in a moment, which is why I count three of four covered rather than two.

The hot tier is the cleverest part of the design. StarRocks has supported asynchronous materialized views built on external Iceberg catalogs since v2.5 [B-vendor-doc], and the optimizer performs transparent query rewrite against them: "without the need to modify the query statement, StarRocks can automatically rewrite queries against the base tables into queries against the corresponding materialized view," using a rewrite algorithm based on the SPJG (select-project-join-group-by) form [B-vendor-doc]. The covered shapes run wider than I expected, because beyond single-table SPJG there's join rewrite across inner, outer, semi, and anti joins, aggregation rollup, union rewrite, nested MV rewrite to a default depth of 3, and from v3.3.0 a text-based rewrite matching the query's abstract syntax tree against the MV definition [B-vendor-doc]. For Iceberg specifically, rewrite over external-catalog MVs is supported on both of the doc pages that discuss it, and the controlling property, force_external_table_query_rewrite, defaults to true since v3.3 [B-vendor-doc].

What makes this usable for dashboards is the freshness gate. The mv_rewrite_staleness_second property (v3.1+, citing the later of two pins the docs disagree on) works like this: if the MV's last refresh falls within the window, StarRocks uses it for rewrite regardless of whether base tables changed, and if the refresh is older than the window, it checks the base tables before substituting [B-vendor-doc]. That's a one-knob staleness contract per view, which maps cleanly onto how SOC dashboards tolerate staleness. The partition machinery underneath carries version texture: partitioned MVs over Iceberg partition transforms arrived in v3.2.3 and cover identity, year, month, day, and hour transforms (mapped to date_trunc), though bucket and truncate transforms are explicitly unsupported; partition-level refresh for Iceberg is pinned at v3.1.7/v3.2.3 in the feature-support matrix; partition-level change detection arrived at v3.1.4, and the same page caveats that the change-detection path currently supports Iceberg V1 tables only [B-vendor-doc]. Those last two are adjacent features carrying different version pins on two different official pages, which tells you how fast this corner of the product is moving and why I cite versions everywhere.

The fair counter-position on this tier is Dremio, because Reflections are the more ambitious version of the same acceleration idea and the comparison is worth a paragraph. A StarRocks async MV is a declared object: you decide what to materialize, the optimizer rewrites SPJG-shaped queries against it, and the materialization lives in StarRocks' internal storage. A Dremio Reflection is matched algebraically, so the planner can substitute it into queries that don't textually resemble it, including partial sub-tree matches, the newer Autonomous Reflections layer infers candidates from query patterns without anyone declaring them, and the materialization is written as Parquet into your own object store, which keeps the accelerated copy in an open format instead of inside the engine [B-vendor-doc]. That last property is an architectural advantage, and the cost is documented too, because algebraic matching is planner work and Dremio's own tuning guidance treats planning-time growth with reflection count and join depth as a thing to manage [B-vendor-doc]. The two then flip on the other legs: Dremio created Flight SQL and donated the Arrow Flight SQL JDBC driver, so its columnar transport is the most mature in the field, while its front door is its own driver rather than a wire protocol existing tools already speak; StarRocks holds the stock-MySQL front door while its Flight lane is the experimental one. The acceleration story there is the Reflections layer rather than the engine underneath, so the head-to-head that would settle this tier, declared MV rewrite against algebraic Reflection matching at matched freshness, has as far as I can tell never been benchmarked by anyone, including me, and it's first on my follow-up list.

The join tier is the engine's home turf. StarRocks runs a vectorized, pipeline-friendly hash-join core wired into four physical strategies the cost-based optimizer weighs at plan time (Colocate, Bucket Shuffle, Broadcast, and Shuffle, each hintable if you disagree with the planner), and the consumed side of a join generates runtime filters, Bloom or IN, that get pushed back into upstream scans to cut probe input early [B-vendor-doc]. Whether those runtime filters reach Iceberg scans matters for this essay's premise, and no doc page states it in versioned prose, but the fix stream presupposes it: PR #57651 fixes "the bug of the runtime filter cannot be pushed down for iceberg table with equality delete," which can only be a bug if the pushdown path exists [B-community], and the 4.1.1 release notes enable TopN runtime-filter pushdown for Iceberg aggregations alongside a datetime min/max scan optimization for Iceberg [B-vendor-doc]. I read that stream as plan-level behavior being actively extended rather than a marketing claim, and that's the direction of evidence I prefer to bet on.

The missing workload of the four is federation, and it stays missing. StarRocks' external catalog list runs to roughly ten types — Hive, Iceberg, Hudi, Delta Lake, JDBC, Elasticsearch (v3.1+), Paimon (v3.1+), a Unified catalog (v3.2+), plus MaxCompute and Kudu — and every one of them is lake- or database-shaped [B-vendor-doc]. Trino 481 ships 37 connectors, including Kafka, Cassandra, MongoDB, OpenSearch, Pinot, Druid, ClickHouse, Snowflake, BigQuery, Prometheus, Loki, and Redis [B-vendor-doc, comparator]. So the comparison comes out this way: StarRocks federates lakes and JDBC databases, while Trino federates the operational, streaming, and observability systems StarRocks cannot reach, and no amount of MV machinery changes that. I'll weigh what that costs below, because I think it's the giveaway whose weight varies most by team.

I'm not the first to make the fewer-engines call either. SmartNews publicly replaced a Trino-plus-ClickHouse pair with StarRocks citing the dual-engine operational burden, though that account travels through the CelerData channel, so weight it as vendor-adjacent [C-vendor-marketing], and Uber built and then retreated from Neutrino, their Presto-fronting-Pinot translation layer, on maintenance cost (Tier B). The academic record points the same way, because a decade of polystores (BigDAWG, Apache Wayang, CloudMdsQL, Polypheny, all Tier-A papers) tried optimizer-grade cross-engine placement and none survived to production, while what ships everywhere is thin, deterministic rule routing keyed mostly on the time predicate.

Bulk transport

The Arrow Flight lane.

Bulk transport is where the MySQL wire protocol is weakest, because the protocol hands the client rows while analyst tooling wants columns. My lab priced that tax this spring on a DuckDB backend (not StarRocks, so read it as a transport comparison rather than an engine one): pulling 100,000 rows, ADBC took 35 ms where a native-JVM JDBC client took 222 ms, a 6.3× gap, and at 1,000,000 rows it was 158 ms against 1,285 ms, an 8.1× gap [B-first-party, single machine]. The deinflation story belongs next to those numbers, because an earlier pass of the same bench reported ~276×, and that figure measured JDBC through a Python/JPype bridge; a native Java baseline later showed roughly 40–50× of it was the per-row JNI crossing rather than the transport, so the honest columnar-versus-row advantage is single digits, 5–10×, growing with result size. One standing caveat travels with the ratio: the ADBC path is Python-Arrow and the JDBC path is Java-rows, so what's measured is the paradigm difference in each one's natural runtime, not a single-language isolation.

StarRocks grew this lane in v3.5.1, which introduced an Arrow Flight SQL channel described as "a fully columnar data transfer pipeline from the StarRocks columnar execution engine to the client," labeled Experimental in the 3.5 release notes (the label lives in the release notes, not on the how-to page) [B-vendor-doc]. The how-to page reads further along than "experimental" suggests, since normal DDL, DML, and DQL all run over Flight SQL, with a Python ADBC driver (Python 3.9+), both JDBC and ADBC paths in Java, and the Arrow Flight JDBC driver pitched at BI tools [B-vendor-doc]. The operational shape is real though: arrow_flight_port has to be set separately in fe.conf and be.conf with different ports (the examples use FE 9408 and BE 9419), clients normally need direct connectivity to the backends to fetch results, and the proxy mode that relaxes that requirement "incurs a small performance overhead" [B-vendor-doc]. The lane's youth shows in the bug stream too, because the Flight SQL JDBC driver shipped a correctness bug, incorrect rows in the resultset (#64705), which for security data is the disqualifying kind; the fix was confirmed against 4.0.4 and the issue closed in February 2026, but a transport that was still closing out correctness bugs this year is a young one [B-community]. So when I say the ADBC advantage is still being integrated, I mean it precisely: the same endpoint is growing a columnar fast lane that my own numbers say is worth 5–10× on the workload analysts feel most, but today I'd stand on the MySQL wire for production and watch the Flight lane mature. The JRE question answered itself when I ran the DB Connect check: Splunk's container ships no Java at all, and DB Connect 4.1.1's own jars are compiled for Java 17, so whatever JVM you install to run DB Connect already clears the Flight JDBC driver's Java 11+ floor.

Honest giveaways

What you give up.

Before weighing costs I should reconcile this essay with my own graduated hypothesis, because "no single engine wins every workload" sits at 0.97 confidence in my tracker and I'm not walking it back. Specialization is real and I've measured it. The single-front move is the manageability counterweight rather than a refutation, and the question it poses is whether the spread between the per-workload winner and StarRocks, at the single-node scale a SOC runs, is large enough to pay for the second, third, and fourth endpoints. That's a quantitative question, and the join half of it now has a scored answer below; here are the giveaways as I'd weight them today, heaviest first.

The first giveaway is measured single-table performance, and I have to be blunt about my own numbers. On the 100M-row OCSF network_activity workload matrix (median of four trials, single host), StarRocks won no shape: ClickHouse took the full-scan count at 10.5 ms against StarRocks's 48.2; DuckDB took the dst_port=3389 needle at 77.7 ms with StarRocks a close second at 95.0, took the group-by at 103.1 against 194.4, and took the high-cardinality distinct at 4,091 ms against 5,180, with Trino erroring on that shape entirely [B-first-party]. The concurrency sweep tells the same story from a different angle, because StarRocks scales from 16 to 42 queries per second and plateaus around eight concurrent clients, while ClickHouse scales from 20 to 58, overtakes DuckDB by sixteen clients, and shows the gentlest p95 growth in the field at 59 ms to 355 [B-first-party]. Both runs are one host, so they measure scheduling under contention rather than cluster concurrency, but the direction is clear enough that I won't argue with it: if your dominant workload is hot scheduled scans at high concurrency, ClickHouse is the better-measured choice, and the MV rewrite story has to earn the difference back. The single-front case rests on the rewrite, the joins, and the endpoint, and I'd rather say that plainly than let the premise imply a scan-speed story my own benches contradict. The concurrency sweep is published with the rest of the benches, and the worked scorecard shows how this concurrency reweighting flips the ClickHouse-versus-StarRocks ranking from one archetype to the next.

The second giveaway is materialized-view freshness on streaming security data, where the constraint stack is taller than the headline feature suggests. External-catalog MVs refresh on a fixed interval or manually, with no event-driven option [B-vendor-doc]. Partition-level change detection only works against Iceberg V1 tables [B-vendor-doc], an unpartitioned base table forces a whole-MV refresh every cycle [B-vendor-doc], and Anton Borisov's account of running these MVs at Fresha is the best practitioner writeup I've found of what that does in production: partition change tracking where "if 10 records touched 10 partitions, you'd recalculate 10 partitions. The granularity was too coarse, the cost unpredictable, sometimes dangerous," refreshes that can OOM on many group-by keys over large data, and incremental view maintenance that in its first phase covers append-only Iceberg only, with joins still reading full snapshots from one side and retractable changes waiting on Iceberg V3 CDC [B-practitioner, no benchmarks in the piece]. The vendor's own release notes corroborate the in-progress state, because 4.1.0 extended incremental MV refresh to Iceberg append-only tables and then 4.1.1 disabled query rewrite over INCREMENTAL/AUTO MVs entirely while rejecting FORCE and partition refresh on them [B-vendor-doc], so the incremental path exists but isn't yet usable for transparent acceleration. The practical reading: a dashboard tier that tolerates minutes of staleness sits comfortably inside what mv_rewrite_staleness_second can contract for, while sub-minute alerting does not and shouldn't live in this layer anyway; it belongs to the streaming pipeline upstream of the lake.

The third giveaway is Iceberg write and maintenance maturity, which matters because a front engine still sits in an architecture where something has to write. StarRocks can INSERT INTO and INSERT OVERWRITE Parquet-formatted Iceberg tables from v3.1 (Parquet-only on the sink, no ORC writes), can CREATE TABLE and CTAS from v3.1, and only gained DELETE, via position delete files, in 4.1.0 [B-vendor-doc]. MERGE INTO hasn't shipped in any release as of 4.1.1 (a parser-and-analyzer PR landed on main the day I drafted this, so the gap is being worked), and the open feature issue states the consequence better than I would: the workaround chains "DELETE + INSERT or UPDATE + INSERT, which (a) splits the operation across statements so readers can observe the partial state, (b) duplicates the join logic each side, and (c) silently does the wrong thing when a source row matches more than one target row" [B-community, #73684]. That same issue asserts UPDATE works on Iceberg, while the official 4.1 release notes document only DELETE, so I'm not claiming UPDATE in print until I can verify it. Trino's Iceberg connector, for comparison, supports INSERT, UPDATE, DELETE, TRUNCATE, and MERGE [B-vendor-doc, comparator], and StarRocks' Iceberg maintenance tooling was still filling in at 4.1, which added the rewrite_manifests procedure and extended expire_snapshots and remove_orphan_files with finer-grained arguments [B-vendor-doc].

I also owe you a tension from my own paid work here, because my private Capability Matrix scores StarRocks 3 out of 5 on the "Iceberg native vs connector" criterion, Tier C, flagged "less mature," last in the field behind Trino at 5 and ClickHouse and DuckDB at 4, while my public offering page's one-word identity for StarRocks is "Iceberg-native." Both are mine and they pull against each other. The reconciliation I believe is that the read path has filled in fast — position deletes from v3.1.0 on Parquet, equality deletes from roughly v3.2.5 per the catalog page's phrasing, time travel at v3.4.0, Iceberg metadata tables at v3.4.1, hidden partitions on CREATE at v4.0, and Iceberg V3 default values plus row lineage in 4.1 [B-vendor-doc] — while the write path trails Trino and Spark by a clear margin. So the strongest version of this essay's claim is StarRocks as the read-and-serve front, with writes delegated to whatever already owns ingestion, which in the estates I've seen is usually Spark. That framing is also how I square this essay with my own matrix's cross-archetype verdict on StarRocks ("No archetype where it dominates; right home is dual-engine pattern"), because one engine in front was never one engine in the building: Spark keeps writing, the streaming pipeline keeps doing sub-minute detection, and the question this essay argues is the narrower one of how many engines need to sit between an analyst and the tables.

The fourth giveaway is the federation breadth I set aside earlier, and its weight depends almost entirely on your estate. If investigation queries routinely need to reach a Kafka topic, the MongoDB behind an internal app, or a Prometheus instance, then Trino's 37 connectors are not optional and the single-front pattern needs Trino beside it, at which point you're running two engines and should read my dual-engine material instead. But in the security estates I've worked in, federated sources tend either to get ingested into the lake eventually, because retention and correlation demand it, or to stay out of SQL reach entirely, so federation reads to me as a transition-period workload more often than a steady-state one, and giving it up is cheaper at year two than at month two. That's an opinion from operational pattern, not a measurement, and I'd weight it accordingly.

Pre-registered, scored same day

The bench that decided it.

Everything above about joins was doc-tier evidence and plan shape when I drafted it, so I pre-registered a benchmark on the morning of 2026-06-10, committed before any scored run, exactly so this essay couldn't quietly turn into marketing — and the scored run completed the same day: one engine timed at a time on one host, one discarded warmup and seven trials per query, every claim gated on the gap exceeding both arms' run-to-run variation, and every reported answer checked against a DuckDB ground-truth oracle (54 of 55 cells answer-identical; the 55th is a loud timeout, not a wrong answer). The workload is a TPC-H-derived join subset — derived, not TPC-H results, which is a publication requirement of the benchmark and an honest one — plus a SOC-shaped join suite on the pinned 10M-row Zeek conn corpus and a join-tax pair that prices the flattening pressure each engine exerts, with the StarRocks arm running 4.1 all-in-one over an external Iceberg REST catalog. Three pre-registered predictions matter here, stated with the probabilities I committed and the scored results appended:

Prediction #1 (~55% on StarRocks first): overall SOC-suite ordering of StarRocks > ClickHouse ≈ Trino, with the ClickHouse/Trino middle a coin flip. Scored result: partly right. The SOC ordering broke in ClickHouse's favor, because the native arm took the dimension-enrichment and IOC semi-join queries outright, both ClickHouse arms finished ahead of StarRocks on the two-streams aggregation join, and the 53.0-million-pair correlation join landed as a statistical three-way tie at roughly 0.86 s across StarRocks and both ClickHouse arms. "StarRocks first" held on the TPC-H-derived family instead, where it took four of five queries (all gate-claimable), including the heaviest six-table join (q9) at 2.7–2.8× faster than both ClickHouse arms.
Prediction #5 (~75%): the engine spread at this scale is low single-digit ×, not the 3–26× of the marketing literature. StarRocks's 3–5×, ClickHouse's "26× faster than v22.4," and the 20× a semantic-layer vendor markets all come from 100 GB–1 TB+ scale or from materializations, and at 10–60M rows fixed overheads compress differences. Scored result: confirmed, and it is the finding this essay turns on. Across eleven join queries and four arms, the SOC suite never exceeded 1.5 s on any engine, and the TPC-H-derived family stayed within low single-digit × end to end (StarRocks 7.21 s summed). Most cells sat in single digits, mostly under 3×, with a single statistical tie among them, the 53.0M-pair correlation join.
Prediction #6 (~55%): the flattening tax, joined-normalized versus pre-flattened, comes out smallest on StarRocks and largest on ClickHouse-over-Iceberg among the Iceberg arms, and a tax near 1.0× on any engine would mean you no longer need to flatten for that engine, which is the engine-side cousin of the OCSF flattening question. Scored result: wrong. StarRocks and both ClickHouse arms all pay essentially the same ~1.8× tax (on StarRocks that is 77 ms absolute, 88 ms flat against 165 ms joined), and Trino is the outlier at 4.35×, which is the measured version of why federation engines get handed the pre-joined copies. Nobody hit the ~1.0× that would retire flattening, but at sub-second absolute cost the join is not a reason to denormalize on three of the four arms at this scale.

One housekeeping note belongs in public: an old StarRocks-versus-ClickHouse results file in an archived predecessor repo was generated with np.random.uniform as harness test data, so it is simulated, has never been cited as a measurement, and the pre-registration says so explicitly so that nobody, including me, ever mistakes it for one.

Where this leaves the premise: prediction #5 held, so the manageability argument has its number. Engine specialization on joins is real, because StarRocks took four of the five TPC-H-derived joins and was 2.7–2.8× faster than both ClickHouse arms on the heaviest six-table plan while ClickHouse took the aggregation-shaped work, so the role assignments in my dual-engine material point the right way — but the magnitudes are single-digit ×, mostly under 3×, several cells are statistical ties, and no SOC-shaped join took any engine past 1.5 s. At that spread, the engine-assignment criteria in my Capability Matrix shift from raw join speed toward catalog maturity, concurrency behavior, and operational cost, which is where I suspected they belonged and can now say with a measurement behind it. The run also handed me the finding I didn't predict: the only completion failure in 55 cells was ClickHouse's own native table, which timed out past 300 seconds twice on the six-table TPC-H-derived q5 while the same engine ran the same query over Iceberg in 1.35 s. I published that with the mechanism flagged as inferred, and the EXPLAIN diagnostic I owed has since been run (2026-06-10); the inference was half right. The failure is statistics-dependent, but not in the way I guessed, because neither arm carries column statistics. The difference is that the native MergeTree path engages ClickHouse 26.5's greedy join-order optimizer, which had nothing but raw table row counts to plan with and reordered the joins dimension-first through a many-to-many nation-key step, building a measured 1.2-billion-row intermediate before lineitem ever entered the plan; disable the reordering and the same native table answers the same query in 5.3 s with the correct result, while the Iceberg table-function arm keeps the written join order and never saw the cliff. The deepening run that followed closed the loop: one ALTER TABLE … MATERIALIZE STATISTICS ALL pass over the six tables, about 22 seconds of work, flips the plan bushy and the same native table answers q5 in 0.675 s on fully default settings, with the oracle-matching result. And it isn't a 26.5 regression — greedy reordering shipped opt-in in 25.9 (PR #80848) and went default-on in 25.12 (PR #89312) with the stats-blind fallback documented from the start, and ClickHouse's own statistics PR #86822 demonstrates the no-stats pathological plan on TPC-H q5 itself, so the failure mode is vendor-acknowledged and the remediation is one command. The trap worth knowing: 26.5 auto-declares statistics, but bulk-loaded tables that have never merged get the declaration without the materialization, which is exactly the state this bench's loader leaves tables in. All of it is one host at 10M–60M rows, hot reads, engine defaults plus declared accommodations; the TB-scale vendor claims are a different regime this run neither confirms nor refutes. The other check I owed this essay, the one-evening DB Connect run against a StarRocks 4.1.1 all-in-one container, is done too, and the leg held: the front-door section above now carries the result and its limits in place of the Tier-D label this essay first published with.

Evidence tiers in this essay

[B-vendor-doc]: docs.starrocks.io and trino.io reference documentation, version-pinned, verified 2026-06-10 (current StarRocks: 4.1.1, released 2026-05-29; 4.1.0 GA 2026-04-13). [B-practitioner]: Anton Borisov, "StarRocks Incremental MV: A Bridge Over Shifting Ice," Fresha data engineering on Medium, 2025-11-26. [B-community]: StarRocks GitHub primary sources — PRs #57651 and #73707, issues #61789, #64705 (fixed, closed 2026-02-03), #73684. [B-first-party]: SDW lab runs — the ADBC/JDBC transport bench (single machine, DuckDB backend, ODBC pending) and the four-engine 100M workload matrix plus C1–C16 concurrency sweep (single host, Beelink Ryzen 5800H under WSL2, 2026-06-07). [C-vendor-marketing]: the SmartNews migration account via the CelerData channel and the StarRocks engineering blog on join internals, vendor incentive flagged on both. [B-first-party, upgraded from Tier D 2026-06-10]: Splunk DB Connect 4.1.1 against StarRocks 4.1.1 via the bundled stock MySQL driver (mysql-connector-j-8.2.0) — connectivity smoke test: connection and correct rows through dbxquery, not a performance claim. The q5 DNF mechanism was instrumented the same day (EXPLAIN diagnostic, described in the text). Join-bench numbers: SCORED 2026-06-10 [B-first-party] — engine-join-specialization (sdw-lab-benchmarks), pre-registered and committed before any scored run, then scored the same day on the same single host: StarRocks 4.1 allin1, ClickHouse 26.5.1.882 (Iceberg and native arms), Trino 481, all reading byte-identical Nessie-cataloged Iceberg tables on MinIO; 1 discarded warmup + 7 trials per query, claims gated on gap exceeding both arms' coefficient of variation, answers verified against a DuckDB oracle. Predictions appear with their committed probabilities and the scored results appended. Dremio Reflections claims: docs.dremio.com reference documentation [B-vendor-doc].