Public production architecture teardown
Falcon LogScale — brute-force scan architecture
CrowdStrike-owned log platform (CrowdStrike acquired Humio in 2021) built on the inverse of conventional indexing. Rather than indexing every term, it keeps only a ~1 MB/day time-series index and stores compressed segments on object storage. Queries run as a brute-force scan: data is decompressed into CPU cache and searched there. Kafka is the streaming backbone, and the engineering follows mechanical sympathy: SIMD, cache-friendly layouts, and Bloom-filter pre-screening.
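A toy sketch of why the index stays so small. Events are cut into compressed segments, and the only index kept is one (min_ts, max_ts) pair per segment. For comparison, the same events are also run through a conventional term → postings inverted index. zlib stands in for whatever codec LogScale actually uses. The segment size, the synthetic data, and the helper names `build_time_index` / `build_inverted_index` are hypothetical.

```python
import zlib
from collections import defaultdict

def build_time_index(events, events_per_segment):
    """Split time-sorted (ts, line) events into zlib-compressed segments.

    The only index kept is one (min_ts, max_ts) pair per segment, so its size
    grows with the number of segments, not with the number of distinct terms.
    """
    segments, time_index = [], []
    for start in range(0, len(events), events_per_segment):
        chunk = events[start:start + events_per_segment]
        raw = "\n".join(f"{ts}\t{line}" for ts, line in chunk).encode()
        segments.append(zlib.compress(raw))
        time_index.append((chunk[0][0], chunk[-1][0]))  # events are assumed sorted by ts
    return segments, time_index

def build_inverted_index(events):
    """Conventional alternative for comparison: term -> positions of events containing it."""
    postings = defaultdict(list)
    for pos, (_, line) in enumerate(events):
        for term in set(line.split()):
            postings[term].append(pos)
    return postings

# Synthetic, time-sorted log lines (hypothetical data).
events = [(t, f"user{t % 500} login from 10.0.{t % 7}.{t % 251}") for t in range(20_000)]
segments, time_index = build_time_index(events, events_per_segment=2_000)
postings = build_inverted_index(events)

raw_bytes = sum(len(f"{ts}\t{line}") + 1 for ts, line in events)
print(f"time index: {len(time_index)} entries")
print(f"inverted index: {len(postings)} terms, {sum(map(len, postings.values()))} postings")
print(f"raw ~{raw_bytes} bytes -> compressed {sum(map(len, segments))} bytes")
```

The time index has one entry per segment. The inverted index, by contrast, grows with every term of every event, which is where the large index overhead of conventionally indexed stores comes from.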
Vendor-published infrastructure sizing at 1 PB/day: 63 nodes total (18 Kafka + 45 Humio processing). The throughput claim, 1 GB scanned in 0.0265 s (~37.7 GB/s), is consistent with a cache-resident scan. It is vendor-reported, however, and has not been independently replicated. Treat the architecture as durable; treat the performance numbers as the vendor's claims.
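Dividing out the published figures gives a rough sense of per-node load. This back-of-envelope check assumes decimal units, a perfectly even spread across nodes, and no replication or compression effects. The source states none of these.

```python
# Back-of-envelope checks on the vendor-published figures (decimal units).
PB, GB, MB = 10**15, 10**9, 10**6
SECONDS_PER_DAY = 86_400

def scan_rate_gb_s(gigabytes, seconds):
    return gigabytes / seconds

def per_node_mb_s(bytes_per_day, nodes):
    """Average ingest per node, assuming perfectly even load and ignoring replication."""
    return bytes_per_day / nodes / SECONDS_PER_DAY / MB

scan = scan_rate_gb_s(1, 0.0265)                 # ~37.7 GB/s, matching the quoted figure
aggregate = 1 * PB / SECONDS_PER_DAY / GB        # 1 PB/day expressed as a sustained rate
kafka = per_node_mb_s(1 * PB, 18)
humio = per_node_mb_s(1 * PB, 45)
print(f"scan {scan:.1f} GB/s; ingest {aggregate:.1f} GB/s total, "
      f"{kafka:.0f} MB/s per Kafka node, {humio:.0f} MB/s per Humio node")
```

This prints `scan 37.7 GB/s; ingest 11.6 GB/s total, 643 MB/s per Kafka node, 257 MB/s per Humio node`. Real per-node load would be higher once replication is counted.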
The pipeline
- Ingest (Apache Kafka). 18 nodes at 1 PB/day; persistent disks (a Kafka requirement).
- Process (Humio nodes). 45 nodes; group events into segments, compress them, and maintain the ~1 MB/day time-series index.
- Store (bucket storage). S3, GCS, or Azure Blob, holding the compressed segments.
- Filter (three-stage pre-scan). Time range → metadata → Bloom filter, narrowing the set of segments before any are decompressed.
- Scan (brute-force decompress + scan). SIMD-vectorized search over decompressed data while it is resident in CPU cache.
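A minimal sketch of the filter and scan stages, built on a toy segment layout. Each segment carries its time bounds, one hypothetical metadata field (`host`), a Bloom filter over whitespace-split terms, and a zlib-compressed payload. The class and function names, the SHA-256-based hashing, and the filter sizes are illustrative assumptions, not LogScale's implementation. Python's term matching stands in for the SIMD scan loop.

```python
import hashlib
import zlib
from dataclasses import dataclass

class BloomFilter:
    """Minimal Bloom filter: k bit positions per term, derived from salted SHA-256 digests."""

    def __init__(self, num_bits=4096, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, term):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for p in self._positions(term):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, term):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(term))

@dataclass
class Segment:
    min_ts: int
    max_ts: int
    host: str            # hypothetical metadata field used by stage 2
    bloom: BloomFilter   # built over whitespace-split terms
    blob: bytes          # zlib-compressed "ts<TAB>line" records

def make_segment(host, records):
    """records: time-sorted list of (ts, line) tuples."""
    bloom = BloomFilter()
    for _, line in records:
        for term in line.split():
            bloom.add(term)
    raw = "\n".join(f"{ts}\t{line}" for ts, line in records).encode()
    return Segment(records[0][0], records[-1][0], host, bloom, zlib.compress(raw))

def search(segments, start, end, host, term):
    """Return (ts, line) hits containing `term` as a whole word within [start, end]."""
    hits = []
    for seg in segments:
        if seg.max_ts < start or seg.min_ts > end:   # stage 1: time range
            continue
        if seg.host != host:                          # stage 2: metadata
            continue
        if not seg.bloom.might_contain(term):         # stage 3: Bloom filter
            continue
        # Brute-force scan: decompress the whole segment and check every record.
        for record in zlib.decompress(seg.blob).decode().split("\n"):
            ts_str, line = record.split("\t", 1)
            if start <= int(ts_str) <= end and term in line.split():
                hits.append((int(ts_str), line))
    return hits

segments = [
    make_segment("web-1", [(100, "GET /index 200"), (105, "GET /login 401")]),
    make_segment("web-2", [(110, "GET /login 401")]),
    make_segment("web-1", [(200, "POST /login 500")]),
]
print(search(segments, 100, 150, "web-1", "401"))
```

The cheap checks run in order of cost, and only segments that survive all three are decompressed. The scan matches whole terms because the Bloom filter was built over whole terms. A substring match could miss segments that the filter had already rejected.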
What composes, what’s brittle
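- Index overhead. ~1 MB/day of time-series index, versus index storage of 50–100% of raw data size on conventional indexed SIEMs.
- Why compression wins. A cache hit costs roughly 1 ns (L1) to tens of ns (L3), versus ~10,000 ns (10 µs) for an SSD read. Compressed data is smaller, so more of it fits in cache and less has to come from storage.
- Best fit. High-volume data queried over bounded time ranges: threat hunting, forensics, and long-retention archives.
- Where indexes still win. Full-text search with no time bound (the brute-force scan then has no time range to prune by); complex multi-table joins; small datasets (< 10 GB/day).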
- Composes with. Kafka backbone; can sit alongside ClickHouse/StarRocks in a MOAr Engine layer.
- What's brittle. Benchmarks are vendor-reported; recursive graph traversal and faceted full-text search sit outside the design center.
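To put the list's figures on one scale, here is a small sketch. It assumes the 1 PB/day ingest rate from the pipeline section and that the ~1 MB/day time-index figure holds at that volume, which the source does not state explicitly. The helper name `index_bytes_per_day` is hypothetical.

```python
# Scale comparison using the figures in the list above (decimal units).
PB, TB = 10**15, 10**12

def index_bytes_per_day(raw_bytes_per_day, overhead_fraction):
    """Index storage for a conventional indexed store, as a fraction of raw ingest."""
    return raw_bytes_per_day * overhead_fraction

for frac in (0.5, 1.0):
    tb = index_bytes_per_day(1 * PB, frac) / TB
    print(f"{frac:.0%} index overhead at 1 PB/day: {tb:,.0f} TB/day (vs ~1 MB/day time index)")

ssd_ns, l1_ns = 10_000, 1   # the ~10,000 ns SSD and ~1 ns cache figures from the list
print(f"SSD read vs L1 hit: ~{ssd_ns // l1_ns:,}x slower")
```

At 50–100% overhead, an indexed store would write 500–1,000 TB of index per day at this volume. That gap, together with the roughly four orders of magnitude between cache and SSD latency, is the core of the brute-force argument.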
Sources: CrowdStrike Engineering, "How Humio Leverages Kafka and Brute-force Search to Get Blazing-fast Search Results" (vendor blog) · Kresten Krab Thorup (Humio CTO), QCon 2019 · CLP (Rodrigues et al., OSDI 2021) for the compression-over-indexing principle