Security Data Works

Public production architecture teardowns

Real, named, validated.

The strongest evidence in the catalog: production deployments at named organizations, reconstructed from the public record (conference talks and engineering blogs with measured outcomes). These are not SDW engagements; they are trustworthy because the facts are independently verifiable, citations and all. The validators wall below summarizes the broader teardown evidence base.

Production validators

Proven at scale, in regulated industries.

These are the production teardowns the catalog is built on — nine named deployments across three tiers (internet/cloud scale, regulated industries, security-specific), running on the engines the catalog details. Real, validated, publicly attributable; citations link out.

Internet & cloud scale

Cloudflare

Quadrillion-row scale. 1.61 quadrillion events queried in < 2 s. DNS analytics, bot management, security logging.

ClickHouse

Comcast

10+ PB security data fabric. Hot retention > 1 year. 50K IOCs swept across 10 PB in < 30 min.

Snowflake

Pinterest

Zero Trust fine-grained access control (FGAC) at the Trino query layer. A Credential Vending Service issues per-user temporary STS tokens to security analysts querying massive S3 telemetry.

Trino / Presto

Regulated industries

Standard Chartered

Global bank replaced a traditional SIEM with a self-managed, distributed multi-cloud lakehouse. 80% faster time-to-detect, 92% faster investigation (bank-reported, Data + AI Summit 2025).

Databricks

Bank Hapoalim

Federated data-lakehouse on Trino/Starburst. Cross-border access governance plus near-real-time AML monitoring on the federated data; analytics-led, not a SOC-telemetry build.

Trino (via Starburst)

DNB

Norway's largest financial group. Its Cyber Defense Center moved off Databricks onto an in-house DuckDB architecture.

DuckDB (via Ibis)

Security-specific deployments

Palo Alto Networks

Cortex XSIAM real-time security monitoring via stream processing. Mitigates threats with minimal delay at extreme event volumes.

RisingWave

RunReveal

Security data platform built natively on ClickHouse for HTTP analytics and massive log aggregations.

ClickHouse

Ziggiz.ai

Cyber Lakehouse-as-a-Service. 30-50% cost reduction vs. three leading SIEMs (Ziggiz-published); onboarding cut from 9 months to 5 days.

Databricks

Sources: Cloudflare, Snowflake, and Databricks customer case studies; published case material from each company.

Public production architecture teardown

$5K/mo

Huntress on ClickHouse

MDR/EDR business operating at fleet scale. Replaced Elasticsearch with ClickHouse Cloud on the same workload — driven by economics, not vendor advocacy…

250 GB/min

Okta on DuckDB-in-Lambda

Security data platform built around serverless OLAP. DuckDB runs inside AWS Lambda for normalization and operational metadata harvesting, eliminating the…

$2.30/GB

Microsoft Sentinel on Azure

Azure-native managed SIEM built on Log Analytics Workspace (columnar Kusto storage) with KQL as the query language. Schema-on-read at the storage layer…

$0.24/GB

Google Chronicle on BigQuery

Managed SIEM layered on BigQuery — separation of storage and compute, columnar Capacitor format on Colossus, schema-on-write normalization to UDM at ingest…

1 PB/day

Falcon LogScale — brute-force scan architecture

CrowdStrike-owned (acquired Humio 2021) log platform built on the inverse of conventional indexing: ~1 MB time-series index per day, compressed segments on…

1.61 Q events

Cloudflare — ClickHouse + DataFusion on R2

Edge-network analytics at quadrillion-row scale, run on ClickHouse for nearly a decade — and now paired with R2 SQL, a distributed query engine built on Apache…

10+ PB

Comcast — Security data fabric on Snowflake

Cybersecurity-at-Comcast moved off siloed, single-tool analytics onto a unified Snowflake-backed security data fabric. Schema normalization across endpoint…

Off Databricks

DNB — DuckDB + Ibis + marimo, off Databricks

Norway's largest financial services group moved its Cyber Defense Center off Databricks notebooks onto an in-house platform built from composable open-source…

17k+ nodes

Pinterest — Zero Trust FGAC at Trino + Gravitino

Pinterest's Monarch big-data platform — 30+ Hadoop YARN clusters, 17k+ nodes on AWS EC2, petabytes processed daily — runs Trino as one of several engines on…

9 mo → 5 days

Ziggiz — Cyber Lakehouse-as-a-Service on Databricks

First public production reference to ship the Databricks-native Cyber Lakehouse pattern as a service — Delta Lake for storage, Unity Catalog for governance…

80%

Standard Chartered — self-managed SIEM on Databricks

A global systemically important bank replaced its traditional SIEM with a self-managed security lakehouse on Databricks — a distributed, multi-cloud Delta Lake…

Federated

Bank Hapoalim — federated lakehouse on Trino/Starburst

Israel's largest bank migrated off Hive onto Starburst (Trino) over a Hadoop-based data lake — federated SQL access that leaves data where it lives. The…

40%

Yale New Haven Health — SIEM modernization with Cribl + Sentinel

A major US health system hit its Splunk license ceiling when a Palo Alto software update added 63 fields to every firewall log, pushing daily ingest from 400…

See how the pattern lands on your workload.

The matrix scoring that justified each reference architecture's tool choices is the paid deliverable. The benchmark behind it is public — reproduce it on your own workload, then book a call to scope the work.