Security Data Works

Public production architecture teardown

Okta on DuckDB-in-Lambda

Security data platform built around serverless OLAP. DuckDB runs inside AWS Lambda for normalization and operational metadata harvesting, eliminating the per-query warehouse cost that traditional ETL stacks accumulate. Mini databases per invocation, not one shared engine.

250 GB/min

Peak normalization throughput, sustained by AWS Lambda concurrency with DuckDB embedded in each invocation. Daily volume swings between 1.5 and 50 TB (CloudTrail + VPC Flow); 7.5 trillion records were normalized over six months across 130M files.
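The headline figures can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 1000 GB), and the per-invocation batch size and duration passed to `required_concurrency` are hypothetical placeholders, not numbers from the sources.

```python
# Figures quoted in the text above.
PEAK_GB_PER_MIN = 250
DAILY_TB_LOW, DAILY_TB_HIGH = 1.5, 50
TOTAL_RECORDS = 7.5e12
TOTAL_FILES = 130e6

GB_PER_TB = 1000            # assumption: decimal units
MINUTES_PER_DAY = 24 * 60


def avg_gb_per_min(tb_per_day: float) -> float:
    """Average ingest rate implied by a daily volume."""
    return tb_per_day * GB_PER_TB / MINUTES_PER_DAY


def peak_as_tb_per_day(gb_per_min: float) -> float:
    """Daily volume if a per-minute rate were held for 24 hours."""
    return gb_per_min * MINUTES_PER_DAY / GB_PER_TB


def records_per_file(records: float, files: float) -> float:
    """Mean number of records per input file."""
    return records / files


def required_concurrency(gb_per_min: float,
                         gb_per_invocation: float,
                         seconds_per_invocation: float) -> float:
    """Little's law: in-flight invocations = arrival rate x time in flight."""
    invocations_per_second = gb_per_min / 60 / gb_per_invocation
    return invocations_per_second * seconds_per_invocation


if __name__ == "__main__":
    print("peak held for a day (TB):", peak_as_tb_per_day(PEAK_GB_PER_MIN))
    print("average GB/min at low volume:", avg_gb_per_min(DAILY_TB_LOW))
    print("average GB/min at high volume:", avg_gb_per_min(DAILY_TB_HIGH))
    print("mean records per file:", records_per_file(TOTAL_RECORDS, TOTAL_FILES))
    # Hypothetical workload shape: 0.1 GB per invocation, 6 s each.
    print("concurrency at peak:", required_concurrency(PEAK_GB_PER_MIN, 0.1, 6))
```

Under these assumptions, 250 GB/min held for a full day would be 360 TB, about seven times the 50 TB/day upper bound, so the peak figure describes bursts rather than a daily average. The record and file totals work out to roughly 57,700 records per file.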

The pipeline

  1. Sources

    AWS logs

    CloudTrail · VPC Flow

  2. Ingest

    Kinesis / S3 raw

    Buffered event streams

  3. Transform

    Lambda + DuckDB

    SQL normalization in-function

  4. Store

    S3 normalized

    Durable result set

  5. Serve

    Downstream engines

    Detection + investigation

What composes, what’s brittle

  • 7.5T records / 6 months. Cumulative production scale across 130M files.
  • Why DuckDB. Embedded OLAP with full SQL; no cluster to operate.
  • Why serverless. Auto-scales with event volume; pay-per-invocation.
  • What composes. The normalized result set feeds other engines for query-time work.
  • What's distinctive. Mini databases per invocation, not one shared engine.
  • What's brittle. Lambda cold-start tail; DuckDB version pinning across deploys.
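Both brittle points can be made visible inside the function itself. The sketch below is one hypothetical way to do it, not something described in the sources: a module-level flag distinguishes cold from warm invocations, and a startup check fails loudly when the installed engine version differs from the pinned one. The pinned version string is a placeholder. In a real function the installed version would come from `duckdb.__version__`; here it is passed in as a parameter so the sketch runs without the dependency.

```python
# Hypothetical pin; use whatever version the deploy artifact actually ships.
PINNED_ENGINE_VERSION = "1.1.3"

# Module state survives across warm invocations of the same execution
# environment, so this flag is True only on the first (cold) call.
_cold = True


def check_engine_version(installed, pinned=PINNED_ENGINE_VERSION):
    """Raise if the runtime engine version drifted from the pinned one."""
    if installed != pinned:
        raise RuntimeError(
            f"engine version drift: installed {installed}, pinned {pinned}"
        )


def handler(event, context=None, installed_version=PINNED_ENGINE_VERSION):
    """Lambda-style entry point that reports whether this call was cold."""
    global _cold
    was_cold, _cold = _cold, False
    check_engine_version(installed_version)
    return {"cold_start": was_cold}
```

Emitting the `cold_start` flag as a metric would let the cold-start tail be measured rather than guessed, and the version check turns a silent behavior change into a failed deploy.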

Sources: Data Council talk "Processing Trillions of Records at Okta with Mini Serverless Databases" · MotherDuck case study · Julien Hurault, "Okta's Multi-Engine Data Stack"

See how the pattern lands on your workload.

The scoring matrix that justifies each reference architecture's tool choices is the paid deliverable. The benchmark behind it is public: reproduce it on your own workload, then book a call to scope the work.