Public production architecture teardown
Google Chronicle on BigQuery
Managed SIEM layered on BigQuery — separation of storage and compute, columnar Capacitor format on Colossus, schema-on-write normalization to UDM at ingest. YARA-L for detection-as-code. Chronicle is the lakehouse-SIEM existence proof: a hyperscaler running a security platform on its own data warehouse.
Annual storage rate: $0.24/GB/year (BigQuery active storage at $0.02/GB/month, times 12). Query compute is billed separately at $6.25/TB scanned (first 1 TB/month free). Ingestion is free. This storage-equivalent cost model unlocks multi-year retention without a SIEM hot-tier premium.
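The cost model above reduces to simple arithmetic. A minimal sketch follows, using only the rates quoted in the text; the workload figures (100 TB retained, 50 TB scanned per month) are hypothetical and chosen only to show the shape of the calculation.

```python
# Rates quoted in the text above (USD). Workload figures below are hypothetical.
STORAGE_USD_PER_GB_MONTH = 0.02
QUERY_USD_PER_TB = 6.25
FREE_TB_PER_MONTH = 1.0


def annual_storage_cost(stored_gb: float) -> float:
    """Cost of keeping `stored_gb` in active storage for 12 months."""
    return stored_gb * STORAGE_USD_PER_GB_MONTH * 12


def monthly_query_cost(scanned_tb: float) -> float:
    """On-demand scan cost for one month, after the free first terabyte."""
    billable_tb = max(0.0, scanned_tb - FREE_TB_PER_MONTH)
    return billable_tb * QUERY_USD_PER_TB


if __name__ == "__main__":
    stored_gb = 100_000  # assumed: 100 TB retained, in decimal units
    scanned_tb = 50      # assumed: 50 TB scanned per month
    print(f"storage/year: ${annual_storage_cost(stored_gb):,.2f}")
    print(f"queries/year: ${12 * monthly_query_cost(scanned_tb):,.2f}")
```

Storage and query cost are kept as separate functions because they scale independently: retention drives the first, analyst and detection query volume drives the second.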
The pipeline
- Sources. Cloud + endpoint + identity: GCP · AWS CloudTrail · CrowdStrike · Okta · firewalls
- Normalize. UDM mapping at ingest: schema-on-write; ~thousands of fields
- Store. BigQuery (Capacitor on Colossus): columnar; partition + cluster; time travel
- Detect. YARA-L 2.0 rules: multi-event correlation; Git-versioned detection content
- Serve. SOAR + downstream: PagerDuty · ServiceNow · Splunk SOAR
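To make the Normalize and Detect stages concrete, here is a rough Python sketch of the two ideas: parse once into a common shape, then correlate several normalized events. It is not Chronicle's implementation and not YARA-L. The `NormalizedEvent` class, the flat Okta-style input dict, and the `brute_force_then_success` rule are all hypothetical, only loosely modeled on UDM-style fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class NormalizedEvent:
    """Tiny stand-in for a normalized record; real UDM has far more fields."""
    timestamp: datetime
    event_type: str  # e.g. "USER_LOGIN"
    user: str
    outcome: str     # "ALLOW" or "BLOCK"


def normalize_okta(raw: dict) -> NormalizedEvent:
    """Map one hypothetical Okta-style log dict to the common shape (done once, at ingest)."""
    return NormalizedEvent(
        timestamp=datetime.fromisoformat(raw["published"]),
        event_type="USER_LOGIN",
        user=raw["actor"].lower(),
        outcome="ALLOW" if raw["result"] == "SUCCESS" else "BLOCK",
    )


def brute_force_then_success(events, threshold=3, window=timedelta(minutes=10)):
    """Return users whose allowed login was preceded by at least `threshold`
    blocked logins within `window` (a simple multi-event correlation)."""
    hits = set()
    failures = {}  # user -> timestamps of recent blocked logins
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Keep only this user's failures that are still inside the window.
        recent = [t for t in failures.get(ev.user, []) if ev.timestamp - t <= window]
        if ev.outcome == "BLOCK":
            recent.append(ev.timestamp)
        elif len(recent) >= threshold:
            hits.add(ev.user)
        failures[ev.user] = recent
    return hits
```

Because the detection function only reads the normalized fields, adding another source would mean writing another `normalize_*` mapper while leaving the rule untouched, which is the practical payoff of schema-on-write.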
What composes, what’s brittle
- Schema-on-write. Parse once at ingest; queries read pre-normalized columns.
- Unlimited retention. Storage is priced like GCS object storage, so there is no SIEM-storage premium.
- Time travel. 7-day default window via FOR SYSTEM_TIME AS OF.
- What composes. Same BigQuery SQL surface; export to GCS; Spark / notebook access.
- What's distinctive. UDM normalization at ingest collapses the AI-detection fragmentation gap inside GCP.
- What's brittle. YARA-L is proprietary; pay-per-scan costs spike under high-frequency queries; GCP lock-in.
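Two of the points above, time travel and pay-per-scan cost spikes, can be sketched together. The first helper builds a BigQuery SQL string using `FOR SYSTEM_TIME AS OF`; the table name and the `event_type` and `event_date` columns are hypothetical. The second estimates the monthly on-demand cost of running a query on a schedule, using the $6.25/TB rate and 1 TB free allowance quoted earlier and treating 1 TB as 1000 GB for simplicity.

```python
def time_travel_query(table: str, start_date: str, hours_ago: int) -> str:
    """Build a SQL string that reads `table` as it stood `hours_ago` hours back.

    Values are interpolated directly for readability; real code should use
    query parameters and validated identifiers instead.
    """
    if not 0 <= hours_ago <= 7 * 24:
        raise ValueError("outside the 7-day default time-travel window")
    return (
        "SELECT event_type, COUNT(*) AS n\n"
        f"FROM `{table}`\n"
        f"  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours_ago} HOUR)\n"
        f"WHERE event_date >= DATE '{start_date}'\n"  # assumed partition column
        "GROUP BY event_type"
    )


def monthly_scan_cost(gb_per_run: float, runs_per_day: int,
                      usd_per_tb: float = 6.25, free_tb: float = 1.0,
                      days: int = 30) -> float:
    """Estimated monthly on-demand cost of one scheduled query."""
    scanned_tb = gb_per_run * runs_per_day * days / 1000  # decimal TB
    return max(0.0, scanned_tb - free_tb) * usd_per_tb


if __name__ == "__main__":
    print(time_travel_query("my-project.siem.events", "2024-01-01", hours_ago=1))
    for label, runs in [("hourly", 24), ("every 5 min", 288)]:
        print(f"{label}: ${monthly_scan_cost(200, runs):,.2f}/month")
```

Under these assumptions, the same 200 GB scan costs twelve times as many billable terabytes when moved from hourly to five-minute scheduling, which is why partition and cluster pruning matter most for frequently run detections.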
Sources: Google Cloud BigQuery pricing & storage docs · Google Cloud "Overview of the Unified Data Model" · Google Cloud "YARA-L 2.0" · Sergey Melnik et al., "Dremel: A Decade of Interactive SQL Analysis at Web Scale" (VLDB 2020)