Security Data Works

Public production architecture teardown

Google Chronicle on BigQuery

Managed SIEM layered on BigQuery — separation of storage and compute, columnar Capacitor format on Colossus, schema-on-write normalization to UDM at ingest. YARA-L for detection-as-code. Chronicle is the lakehouse-SIEM existence proof: a hyperscaler running a security platform on its own data warehouse.

$0.24/GB

Annual storage rate: BigQuery active storage at $0.02/GB/month × 12 months. Query compute is billed separately at $6.25/TB scanned on demand (first 1 TB/month free). Batch-load ingestion is free. Because retained data is billed at plain storage rates, multi-year retention carries no SIEM hot-tier premium.
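
The arithmetic behind that headline number can be sketched in a few lines. This is a rough model, not a billing calculator: the three rates are the ones quoted above and should be checked against current pricing, the function names are made up for illustration, and the 100 TB / 50 TB workload is hypothetical.

```python
# Rates as quoted in this teardown; treat them as assumptions and
# verify against current BigQuery pricing before relying on them.
ACTIVE_STORAGE_USD_PER_GB_MONTH = 0.02
ON_DEMAND_USD_PER_TB_SCANNED = 6.25
FREE_TB_SCANNED_PER_MONTH = 1.0


def annual_storage_cost(stored_gb: float) -> float:
    """Cost of holding `stored_gb` in active storage for 12 months."""
    return stored_gb * ACTIVE_STORAGE_USD_PER_GB_MONTH * 12


def monthly_query_cost(tb_scanned: float) -> float:
    """On-demand query cost for one month, after the free allowance."""
    billable_tb = max(0.0, tb_scanned - FREE_TB_SCANNED_PER_MONTH)
    return billable_tb * ON_DEMAND_USD_PER_TB_SCANNED


if __name__ == "__main__":
    # Hypothetical workload: 100 TB retained, 50 TB scanned per month.
    stored_gb = 100 * 1000
    print(f"storage per year: ${annual_storage_cost(stored_gb):,.2f}")
    print(f"queries per year: ${12 * monthly_query_cost(50):,.2f}")
```

Keeping storage and scan cost in separate functions mirrors the billing split: retention grows with bytes kept, while query spend grows with bytes read, so the two can be tuned independently.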

The pipeline

  1. Sources

    Cloud + endpoint + identity

    GCP · AWS CloudTrail · CrowdStrike · Okta · firewalls

  2. Normalize

    UDM mapping at ingest

    Schema-on-write; thousands of fields

  3. Store

    BigQuery (Capacitor on Colossus)

    Columnar; partition + cluster; time travel

  4. Detect

    YARA-L 2.0 rules

    Multi-event correlation; Git-versioned detection content

  5. Serve

    SOAR + downstream

    PagerDuty · ServiceNow · Splunk SOAR
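
Steps 2 and 4 can be illustrated with a small Python sketch. It is an analogy only: it is neither Chronicle's parser nor YARA-L. The raw record layout (`ts`, `vendor`, `src_ip`, `user`, `ok`) is hypothetical, the output is a simplified UDM-shaped dict using a handful of UDM field names, and the correlation function approximates in plain Python the kind of multi-event pattern a detection rule would express.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def normalize_login(raw: dict) -> dict:
    """Schema-on-write: map a (hypothetical) raw login record to a
    simplified UDM-shaped event once, at ingest time."""
    return {
        "metadata": {
            "event_type": "USER_LOGIN",
            "event_timestamp": datetime.fromisoformat(raw["ts"]),
            "vendor_name": raw["vendor"],
        },
        "principal": {"ip": raw["src_ip"]},
        "target": {"user": {"userid": raw["user"]}},
        "security_result": {"action": "ALLOW" if raw["ok"] else "BLOCK"},
    }


def brute_force_then_success(events, threshold=3,
                             window=timedelta(minutes=10)):
    """Return userids that had at least `threshold` blocked logins
    followed by an allowed login, all within `window`."""
    failures = defaultdict(deque)  # userid -> timestamps of recent failures
    hits = []
    ordered = sorted(events, key=lambda e: e["metadata"]["event_timestamp"])
    for event in ordered:
        if event["metadata"]["event_type"] != "USER_LOGIN":
            continue
        user = event["target"]["user"]["userid"]
        ts = event["metadata"]["event_timestamp"]
        recent = failures[user]
        # Drop failures that have aged out of the correlation window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if event["security_result"]["action"] == "BLOCK":
            recent.append(ts)
        elif len(recent) >= threshold:
            hits.append(user)
            recent.clear()
    return hits
```

Because every source is mapped to the same shape before storage, the correlation logic reads only normalized fields and never needs to know which vendor produced the event.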

What composes, what’s brittle

  • Schema-on-write. Parse once at ingest; queries read pre-normalized columns.
  • Retention at storage prices. Active storage is billed at rates comparable to GCS Standard, so there is no SIEM-storage premium on long retention.
  • Time travel. 7-day default window (configurable from 2 to 7 days) via FOR SYSTEM_TIME AS OF.
  • What composes. Same BigQuery SQL surface; export to GCS; Spark / notebook access.
  • What's distinctive. UDM normalization at ingest gives every detection a single schema across sources, which collapses the AI-detection fragmentation gap inside GCP.
  • What's brittle. YARA-L is proprietary; pay-per-scan costs spike on high-frequency queries; GCP lock-in.
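
Two of the bullets above lend themselves to a quick sketch: the time-travel clause and the pay-per-scan cost spike. The Python below only builds a SQL string and does arithmetic; it does not call BigQuery. The table name, function names, and workload figures are hypothetical, the $6.25/TB rate is the one quoted earlier, and decimal terabytes with no free allowance are used to keep the estimate simple.

```python
ON_DEMAND_USD_PER_TB_SCANNED = 6.25  # rate quoted in this teardown; verify before use
MAX_TIME_TRAVEL_HOURS = 7 * 24       # the 7-day default window


def time_travel_query(table: str, hours_ago: int) -> str:
    """Build a BigQuery SQL string that reads `table` as it existed
    `hours_ago` hours back. The table name is trusted input here."""
    if not 0 < hours_ago <= MAX_TIME_TRAVEL_HOURS:
        raise ValueError("hours_ago must fall inside the time-travel window")
    return (
        f"SELECT * FROM `{table}`\n"
        f"FOR SYSTEM_TIME AS OF "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours_ago} HOUR)"
    )


def monthly_scan_cost(gb_per_run: float, runs_per_hour: float,
                      days: int = 30) -> float:
    """Rough on-demand cost of a scheduled query that scans
    `gb_per_run` gigabytes on every run."""
    tb_scanned = gb_per_run / 1000 * runs_per_hour * 24 * days
    return tb_scanned * ON_DEMAND_USD_PER_TB_SCANNED


if __name__ == "__main__":
    print(time_travel_query("my_project.security.udm_events", 1))
    # Same hypothetical 50 GB scan, run every 5 minutes vs. once an hour.
    print(f"every 5 min: ${monthly_scan_cost(50, 12):,.2f}/month")
    print(f"hourly:      ${monthly_scan_cost(50, 1):,.2f}/month")
```

Under these assumptions the 5-minute schedule costs about $2,700 a month against about $225 for the hourly one. That twelvefold gap is the spike the brittleness bullet warns about, and it is why partition and cluster filters matter for frequently run detections.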

Sources: Google Cloud BigQuery pricing & storage docs · Google Cloud "Overview of the Unified Data Model" · Google Cloud "YARA-L 2.0" · Sergey Melnik et al., "Dremel: A Decade of Interactive SQL Analysis at Web Scale" (VLDB 2020)

See how the pattern lands on your workload.

The matrix scoring that justified each reference architecture's tool choices is the paid deliverable. The benchmark behind it is public — reproduce it on your own workload, then book a call to scope the work.