Security Data Works

Public production architecture teardown

Falcon LogScale — brute-force scan architecture

CrowdStrike-owned log platform (Humio, acquired 2021) that inverts conventional indexing: a ~1 MB time-series index per day, compressed segments on object storage, and a brute-force scan of compressed data kept in CPU cache. Kafka is the streaming backbone. The engineering leans on mechanical sympathy: SIMD, cache-friendly layouts, and bloom-filter pre-screening.

1 PB/day

Vendor-published infrastructure sizing: 63 nodes total (18 Kafka + 45 Humio processing). The throughput claim of 1 GB scanned in 0.0265 s (~37.7 GB/s) is consistent with a cache-resident scan, but it is vendor-reported and not independently replicated. Treat the architecture as durable; treat the multipliers as the vendor's.
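
The sizing numbers can be sanity-checked with a few lines of arithmetic. This is a rough sketch, assuming that 1 PB/day refers to ingest volume, that units are decimal (1 PB = 10^15 bytes), and that load is spread evenly across nodes. None of those assumptions are stated by the vendor, and the variable names are illustrative.

```python
# Back-of-envelope check on the vendor-published figures.
# Assumptions (not from the vendor): decimal units, even load per node.
PB = 10**15
GB = 10**9
SECONDS_PER_DAY = 86_400

KAFKA_NODES = 18
HUMIO_NODES = 45

# Scan claim: 1 GB in 0.0265 s.
scan_rate = 1.0 / 0.0265                       # GB/s

# Ingest claim: 1 PB/day, expressed as a sustained rate.
ingest_rate = 1 * PB / SECONDS_PER_DAY / GB    # GB/s, whole cluster

per_kafka_node = ingest_rate / KAFKA_NODES     # GB/s per Kafka node
per_humio_node = ingest_rate / HUMIO_NODES     # GB/s per processing node

print(f"scan rate:        {scan_rate:.1f} GB/s")
print(f"ingest rate:      {ingest_rate:.2f} GB/s")
print(f"per Kafka node:   {per_kafka_node:.3f} GB/s")
print(f"per Humio node:   {per_humio_node:.3f} GB/s")
```

Under these assumptions the cluster sustains about 11.6 GB/s of ingest, roughly 0.64 GB/s per Kafka node and 0.26 GB/s per processing node. The source does not say whether the 37.7 GB/s scan figure is per core, per node, or per cluster, so it is not compared against the ingest rate here.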

The pipeline

  1. Ingest

    Apache Kafka

    18 nodes at 1 PB/day; persistent disks (Kafka requirement)

  2. Process

    Humio nodes

    45 nodes; compress + segment + ~1 MB time-series index

  3. Store

    Bucket storage

    S3 / GCS / Azure Blob — compressed segments

  4. Filter

    Three-stage pre-scan

    Time range → metadata → bloom filter

  5. Scan

    Brute-force decompress + scan

    SIMD vectorized; data lives in CPU cache
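
The filter and scan stages can be sketched as a toy model. This is a minimal illustration of the pattern, not LogScale's implementation: `Segment`, `BloomFilter`, `build_segment`, and `search` are hypothetical names, zlib stands in for whatever codec the product uses, and SIMD and cache blocking are not modeled. The bloom filter is keyed on whole whitespace-separated tokens, so the scan also matches whole tokens.

```python
import hashlib
import zlib
from dataclasses import dataclass


class BloomFilter:
    """Tiny bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, token: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token: str) -> None:
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


@dataclass
class Segment:
    start: int           # earliest event timestamp in the segment
    end: int             # latest event timestamp in the segment
    tags: dict           # segment-level metadata, e.g. {"host": "web1"}
    bloom: BloomFilter   # tokens present in the segment
    payload: bytes       # compressed newline-delimited events


def build_segment(start, end, tags, lines):
    bloom = BloomFilter()
    for line in lines:
        for token in line.split():
            bloom.add(token)
    return Segment(start, end, tags, bloom, zlib.compress("\n".join(lines).encode()))


def search(segments, t_from, t_to, tags, token):
    hits = []
    for seg in segments:
        if seg.end < t_from or seg.start > t_to:
            continue  # stage 1: time range
        if any(seg.tags.get(k) != v for k, v in tags.items()):
            continue  # stage 2: metadata
        if not seg.bloom.might_contain(token):
            continue  # stage 3: bloom filter
        # Brute-force scan: decompress the survivor and check every line.
        for line in zlib.decompress(seg.payload).decode().splitlines():
            if token in line.split():
                hits.append(line)
    return hits


segments = [
    build_segment(0, 99, {"host": "web1"}, ["login failed user=alice", "login ok user=bob"]),
    build_segment(100, 199, {"host": "web1"}, ["login failed user=carol"]),
    build_segment(100, 199, {"host": "db1"}, ["login failed user=dave"]),
]
print(search(segments, 50, 150, {"host": "web1"}, "failed"))
```

The point of the ordering is that each stage is cheaper than the next: time range and metadata checks touch only segment headers, the bloom filter touches a small bit array, and only the survivors pay for decompression and a full scan. A bloom false positive costs one wasted scan but never a wrong result, because the scan re-checks every line.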

What composes, what’s brittle

  • Index overhead. ~1 MB/day time-series index vs. 50–100% overhead on indexed SIEMs.
  • Why compression wins. CPU cache access is on the order of nanoseconds (~1 ns at L1) vs. ~10,000 ns for SSD; compressed data is smaller, so more of it fits in cache.
  • Best fit. High-volume, time-bounded queries: threat hunting, forensics, long retention.
  • Where indexes still win. Unbounded full-text search; complex multi-table joins; small (< 10 GB/day) datasets.
  • Composes with. Kafka backbone; can sit alongside ClickHouse/StarRocks in a MOAr Engine layer.
  • What's brittle. Benchmarks are vendor-reported; recursive graph traversal and faceted full-text search are not the design center.
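
The index-overhead and latency bullets reduce to two ratios. The sketch below works them out, assuming for illustration that the ~1 MB/day index is measured against the 1 PB/day reference workload and that units are decimal. The source does not state either, and the variable names are hypothetical.

```python
# Ratios implied by the figures quoted above.
# Assumptions (not from the source): decimal units, and the ~1 MB/day
# index measured against the 1 PB/day reference workload.
PB = 10**15
MB = 10**6

daily_ingest_bytes = 1 * PB
time_series_index_bytes = 1 * MB
indexed_overhead_range = (0.5, 1.0)   # 50-100% overhead on indexed SIEMs

# Index size as a fraction of one day's ingest.
time_series_fraction = time_series_index_bytes / daily_ingest_bytes

# Extra bytes per day an indexed system would carry at the same volume.
indexed_bytes = tuple(f * daily_ingest_bytes for f in indexed_overhead_range)

# Latency gap between cache and SSD, using the quoted ~1 ns and ~10,000 ns.
cache_ns, ssd_ns = 1, 10_000
latency_ratio = ssd_ns / cache_ns

print(f"time-series index: {time_series_fraction:.0e} of daily ingest")
print(f"indexed overhead:  {indexed_bytes[0] / PB:.1f} to {indexed_bytes[1] / PB:.1f} PB/day")
print(f"SSD vs. cache:     {latency_ratio:,.0f}x slower")
```

These are order-of-magnitude figures only. They inherit the uncertainty of the vendor's numbers and say nothing about query latency on a real workload.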

Sources: CrowdStrike Engineering, "How Humio Leverages Kafka and Brute-force Search to Get Blazing-fast Search Results" (vendor blog) · Kresten Krab Thorup (Humio CTO), QCon 2019 · CLP (Yu et al., OSDI 2021) for the compression-over-indexing principle

See how the pattern lands on your workload.

The matrix scoring that justified each reference architecture's tool choices is the paid deliverable. The benchmark behind it is public — reproduce it on your own workload, then book a call to scope the work.