Cloudflare — ClickHouse + DataFusion on R2 · Public production architecture teardowns

Edge-network analytics at quadrillion-row scale, run on ClickHouse for nearly a decade — and now paired with R2 SQL, a distributed query engine built on Apache DataFusion that reads Parquet directly from R2 object storage via the Iceberg-backed R2 Data Catalog. Two engines, two retention envelopes, one telemetry plane.

1.61 quadrillion events

Queried in under 2 seconds, across a full day of data. A separate query scanned 96 trillion events covering one hour at the same latency. The margin of error was under 1%. Latency held during a simulated North America data-center outage, because the active-active soft-cluster design redistributes load without consensus overhead.
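To put the headline figures in perspective, the arithmetic below converts them into a lower bound on events accounted for per second of query time. It is a rough sketch: the helper name `min_effective_rate` is made up for illustration, and the 2-second figure is treated as an upper bound on latency. Because the reported answers carry a margin of error, these are events represented in the result, not necessarily rows physically read from disk.

```python
# Back-of-the-envelope arithmetic for the headline figures.
QUADRILLION = 10**15
TRILLION = 10**12


def min_effective_rate(events: int, max_latency_s: float) -> float:
    """Lower bound on events accounted for per second of query time.

    Hypothetical helper: the query finished in *under* max_latency_s,
    so the true rate is at least this value.
    """
    return events / max_latency_s


# 1.61 quadrillion events, written with integers to avoid float rounding.
day_events = 161 * QUADRILLION // 100
hour_events = 96 * TRILLION

day_rate = min_effective_rate(day_events, 2.0)
hour_rate = min_effective_rate(hour_events, 2.0)

print(f"full-day query: at least {day_rate:.2e} events per second")
print(f"one-hour query: at least {hour_rate:.2e} events per second")
```

The full-day query works out to at least 8.05e14 events per second of query time, and the one-hour query to at least 4.80e13.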

The pipeline

Sources: 300+ edge data centers (DNS · WAF · bot management · security logs)
→
Ingest: Pipelines + R2 streaming (SQL-transformed in flight; writes Parquet to R2 as Iceberg)
→
Hot store: ClickHouse soft clusters (active-active; dynamic node assembly; minimal coordination)
→
Cold store: R2 + R2 Data Catalog (Parquet on object storage; Iceberg metadata; partition + column stats)
→
Query: R2 SQL on Apache DataFusion (coordinator + workers; range reads; filter pushdown)
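The cold-store and query stages rely on partition and column statistics so that a filter can discard whole files before any data is read. The sketch below shows the general min/max pruning idea only. It is not R2 SQL's or Iceberg's actual code: `FileStats`, `prune`, and the file names are hypothetical, and real catalogs track statistics for many columns and partitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: a path plus min/max of one column."""
    path: str
    min_ts: int
    max_ts: int


def prune(files, lo, hi):
    """Keep only files whose [min_ts, max_ts] range can overlap [lo, hi].

    A file is skipped when its statistics prove no row can match,
    so the planner never issues a read against it.
    """
    return [f for f in files if f.max_ts >= lo and f.min_ts <= hi]


files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# Predicate: ts BETWEEN 150 AND 220
survivors = prune(files, 150, 220)
print([f.path for f in survivors])
```

Only `part-1.parquet` and `part-2.parquet` survive. The first file is eliminated from metadata alone, which is the effect filter pushdown aims for on object storage, where every avoided read saves a network round trip.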

What composes, what’s brittle

~10-year run. ClickHouse in production at Cloudflare since the mid-2010s — among the earliest large-scale adopters.
Active-active by design. No Raft/Paxos in the hot path; nodes can be addressed individually or as a soft cluster.
Why DataFusion on R2. Columnar Parquet, range reads, and Iceberg pruning together allow cold-tier SQL without egress to a separate warehouse.
Composes with. The Iceberg-backed R2 Data Catalog is engine-agnostic: Databricks and ClickHouse can read the same tables.
What's distinctive. Two query engines on one storage plane, each chosen for the latency envelope it actually wins on.
What's brittle. Soft clusters depend on operational discipline rather than a consensus protocol. R2 SQL is new (2025), so its maturity trails the decade of battle-testing behind ClickHouse.
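The range reads mentioned above work because Parquet keeps its metadata at the end of the file: the last 8 bytes hold a 4-byte little-endian footer length followed by the magic bytes `PAR1`. A reader can therefore fetch the footer with two small ranged requests instead of downloading the whole object. The sketch below illustrates this against an in-memory stand-in. `FakeObjectStore` and `read_footer` are hypothetical names, the footer payload is a placeholder rather than real Thrift metadata, and none of this is R2 SQL's implementation.

```python
import struct

MAGIC = b"PAR1"  # Parquet magic bytes, present at both ends of the file


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store with ranged GETs."""

    def __init__(self, blob: bytes):
        self._blob = blob
        self.bytes_fetched = 0

    def size(self) -> int:
        return len(self._blob)

    def read_range(self, start: int, end: int) -> bytes:
        """Return bytes in [start, end) and count how much was transferred."""
        chunk = self._blob[start:end]
        self.bytes_fetched += len(chunk)
        return chunk


def read_footer(store) -> bytes:
    """Fetch only the footer: one 8-byte tail read, then one footer-sized read."""
    size = store.size()
    tail = store.read_range(size - 8, size)
    footer_len = struct.unpack("<I", tail[:4])[0]
    if tail[4:] != MAGIC:
        raise ValueError("not a Parquet file")
    return store.read_range(size - 8 - footer_len, size - 8)


# Toy object with Parquet framing: magic, data pages, footer, length, magic.
footer = b"fake-thrift-metadata"  # placeholder, not real Parquet metadata
blob = MAGIC + b"\x00" * 10_000 + footer + struct.pack("<I", len(footer)) + MAGIC

store = FakeObjectStore(blob)
assert read_footer(store) == footer
print(store.bytes_fetched, "of", store.size(), "bytes fetched")
```

The same pattern extends to column chunks: once the footer is parsed, a reader knows the byte offsets of each column and can request only the columns a query touches.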

Sources: ClickHouse engineering blog, "Trouble will find you: How Cloudflare uses ClickHouse to scale analytics at quadrillion-row scale" (2025) · Cloudflare engineering blog, "R2 SQL: a deep dive into our new distributed query engine" · Cloudflare Pipelines product documentation (SQL transformations + Iceberg, Sep 2025).
