Security Data Works

Project 1 · MOAR

Modular Open Architecture for security data.

A traditional SIEM bundles everything — collection, storage, search, detection, dashboards — into one vendor's stack. MOAR un-bundles it. Each layer is a deliberate choice from open formats and best-of-breed tools, connected by open standards (Iceberg, OCSF, Polaris). You can swap any single component without re-platforming the others. Costs drop because the data platform stops paying SIEM-grade prices for storage and historical search. Performance rises because each layer is purpose-built. Vendor lock-in becomes a deliberate trade-off, not a default.

The LIGER stack

Five layers. Each one a concrete component, not an abstraction.

A canonical MOAR deployment has five layers, each replaceable independently. The point isn't the acronym — the point is that the layers have clean interfaces, so the architecture survives any single component being swapped out.

L — Lakehouse.

Open table format on object storage. The source of truth for all retained telemetry. Apache Iceberg on S3, MinIO, or Wasabi. Snapshot-based ACID semantics, time-travel for audit, schema evolution that doesn't break downstream consumers. The foundation that makes every layer above it portable: replace the engine without touching the data, replace the catalog without re-ingesting.
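
What time-travel looks like from code, as a minimal PyIceberg sketch. The catalog name and table identifier are placeholders, and the snippet assumes a catalog already configured for the environment (e.g., in ~/.pyiceberg.yaml).

```python
from pyiceberg.catalog import load_catalog

# Assumes a catalog named "lake" is configured; the table name is a placeholder.
catalog = load_catalog("lake")
table = catalog.load_table("security.ocsf_events")

# Every commit is a snapshot; this log is what audit time-travel runs against.
for entry in table.history():
    print(entry.snapshot_id, entry.timestamp_ms)

# Read the table as it existed at the earliest retained snapshot:
# no restore job, no copy; the files are still there.
earliest = table.history()[0].snapshot_id
batch = table.scan(snapshot_id=earliest).to_arrow()
print(batch.num_rows)
```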

I — Index.

The catalog and metadata service. What makes the lake queryable from many engines simultaneously. Hive Metastore at the legacy end, Polaris (Iceberg-native, Snowflake-led OSS) and Nessie (Git-style table versioning) at the modern end, Unity Catalog when fine-grained governance is the load-bearing requirement. The catalog choice is where the actual lock-in surface lives — the Databricks/Iceberg analysis walks through why.
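
A sketch of the swap surface, assuming PyIceberg as the client: moving between a Hive Metastore and an Iceberg REST catalog such as Polaris is a connection-property change, not a data migration. Endpoints below are placeholders.

```python
from pyiceberg.catalog import load_catalog

# Legacy end: Hive Metastore (thrift endpoint is a placeholder).
legacy = load_catalog("legacy", **{
    "type": "hive",
    "uri": "thrift://metastore.example.internal:9083",
})

# Modern end: any Iceberg REST catalog; Polaris speaks this natively.
modern = load_catalog("modern", **{
    "type": "rest",
    "uri": "https://catalog.example.internal/api/catalog",
})

# Same identifier, same Parquet and metadata files in object storage underneath.
# The swap is re-registration, not re-ingestion.
table = modern.load_table("security.ocsf_events")
```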

G — Graph / visualization.

The presentation surface analysts actually live in. Grafana for sub-second SOC dashboards, Apache Superset for ad-hoc analytical exploration, custom React or Streamlit for the threat-hunt surface, the existing vendor SOC UIs (Splunk, Elastic, Sentinel) preserved for federated read access during transition. The choice tracks team familiarity more than feature set — analysts don't relearn the analytical tooling lightly, and the engagement design respects that.

E — Engine.

Query execution against the lake. ClickHouse for raw speed on dashboard-driving subsets (sub-second P95 on the published Zeek benchmark, 145× faster than the dominant schema-on-read SIEM on the same workload). Dremio for the semantic layer plus Reflections-driven acceleration on shared datasets. StarRocks for Iceberg-native columnar analytics. Trino for federation breadth across heterogeneous sources. DuckDB for embedded or serverless analyst-laptop workloads. Most production deployments use two engines split by use case, occasionally three; the framework documents this on the decision-framework page.
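
A sketch of the embedded end of that spectrum: DuckDB on an analyst laptop querying the shared Iceberg table in place. The S3 path is a placeholder, and credentials are assumed to be configured separately (e.g., via DuckDB's CREATE SECRET).

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")
con.execute("INSTALL httpfs")  # S3 access for the scan below
con.execute("LOAD httpfs")

# iceberg_scan reads the table's own metadata: no export, no engine-local copy.
events = con.execute("""
    SELECT count(*) AS events
    FROM iceberg_scan('s3://security-lake/ocsf_events')
""").fetchone()
print(events)
```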

R — Route.

The ingestion and routing tier. Pull from sources, normalize to OCSF, land in the lake. Tenzir, Vector, Cribl Stream, Kafka Connect, native shippers — the choice tracks source count and operational complexity tolerance. At 100+ sources, the routing tier stops being optional. At 500+ it dominates the operational surface. The DSPM infrastructure analysis covers why the routing layer is the prerequisite for everything the policy-layer marketing claims.
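
A minimal sketch of the normalization step itself, independent of which router performs it. The raw flow-record shape is hypothetical; the target fields follow the OCSF Network Activity class (class_uid 4001).

```python
import json
import time

def to_ocsf_network_activity(raw: dict) -> dict:
    """Map a hypothetical raw flow record onto OCSF Network Activity fields."""
    return {
        "category_uid": 4,    # Network Activity category
        "class_uid": 4001,    # Network Activity class
        "activity_id": 6,     # Traffic, per the OCSF activity enum
        "severity_id": 1,     # Informational
        "time": raw.get("ts_ms", int(time.time() * 1000)),
        "src_endpoint": {"ip": raw["src_ip"], "port": raw["src_port"]},
        "dst_endpoint": {"ip": raw["dst_ip"], "port": raw["dst_port"]},
        "metadata": {"version": "1.1.0",
                     "product": {"name": raw.get("sensor", "unknown")}},
        # OCSF convention: source fields with no schema home land in "unmapped".
        "unmapped": {k: v for k, v in raw.items()
                     if k not in {"ts_ms", "src_ip", "src_port",
                                  "dst_ip", "dst_port", "sensor"}},
    }

event = to_ocsf_network_activity(
    {"ts_ms": 1700000000000, "src_ip": "10.0.0.5", "src_port": 51812,
     "dst_ip": "10.0.0.9", "dst_port": 443, "sensor": "fw-edge-01"}
)
print(json.dumps(event, indent=2))
```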

Why this works for security data specifically

Five properties that change the economics, not the marketing.

Retention economics flip.

Security data needs long retention for incident investigation and threat hunting. SIEM pricing punishes long retention — the per-GB-ingested model multiplies storage cost across the retention horizon. Object storage is cheap (S3 Standard at $0.023/GB-month, Glacier at a fraction of that), and queries run on demand against compressed columnar data. The per-query cost replaces the per-byte-stored cost. For a 5-year retention requirement, the economics aren't competitive — they're a different category.
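
Illustrative arithmetic, not a quote. The S3 price is the one above; the daily volume, the SIEM per-GB-ingested price, and the compression ratio are assumptions chosen to make the shape of the comparison concrete.

```python
# Illustrative only. S3 price is from the text; everything else is an assumption.
DAILY_INGEST_GB = 1_000           # assumed: ~1 TB/day of raw telemetry
RETENTION_DAYS = 5 * 365
SIEM_PRICE_PER_GB = 2.00          # assumed: all-in $/GB-ingested, licensing folded in
S3_STANDARD_GB_MONTH = 0.023
COMPRESSION = 5                   # assumed: columnar compression vs raw

# Ingest-priced model: every byte pays full freight once, for the whole horizon.
siem_cost = DAILY_INGEST_GB * RETENTION_DAYS * SIEM_PRICE_PER_GB

# Storage-priced model: day N's data pays rent only for the days it stays stored.
gb_months_stored = sum(
    (DAILY_INGEST_GB / COMPRESSION) * (RETENTION_DAYS - day) / 30
    for day in range(RETENTION_DAYS)
)
lake_cost = gb_months_stored * S3_STANDARD_GB_MONTH

print(f"SIEM, ingest-priced, 5 yr: ${siem_cost:,.0f}")
print(f"Lake storage, 5 yr:        ${lake_cost:,.0f}")
# Query compute is omitted: it's on-demand and workload-shaped, which is the point.
```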

OCSF-shaped data is portable across engines.

Once telemetry is normalized to OCSF (Open Cybersecurity Schema Framework — the multi-vendor schema standard for security data) and stored in Iceberg, multiple engines can query the same table simultaneously with no copy-out. The published benchmark validates this with four engines reading one Iceberg table on the same workload. The portability isn't theoretical; it's a property the lab verifies on every engine update.

Detection content survives engine changes.

Detection logic written in standard SQL ports cleanly between Trino, ClickHouse, StarRocks, and Dremio. SPL (Splunk's query language) and KQL (Microsoft Sentinel's) don't. The choice of detection-content language is a multi-year commitment that survives most other architecture decisions; standard SQL is the one that travels.
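
A sketch of what portable detection content looks like: a failed-logon burst detection in ANSI-leaning SQL, run here under DuckDB against synthetic rows. Table and column names assume an OCSF Authentication (class_uid 3002) layout; the 10-per-minute threshold is illustrative.

```python
import duckdb

# The detection itself sticks to portable constructs: GROUP BY, HAVING,
# date_trunc. No vendor-specific pipeline syntax to rewrite on engine swap.
DETECTION = """
SELECT
    user_name,
    date_trunc('minute', event_time) AS minute_bucket,
    count(*) AS failed_logons
FROM auth_events
WHERE class_uid = 3002
  AND status_id = 2            -- Failure, per the OCSF status enum
GROUP BY user_name, date_trunc('minute', event_time)
HAVING count(*) >= 10
"""

con = duckdb.connect()
# Synthetic fixture: 25 failed logons for one account inside a single minute.
con.execute("""
    CREATE TABLE auth_events AS
    SELECT 3002 AS class_uid, 2 AS status_id,
           'svc-backup' AS user_name,
           TIMESTAMP '2024-01-01 00:00:30'
               + (i % 60) * INTERVAL 1 SECOND AS event_time
    FROM range(25) t(i)
""")
print(con.execute(DETECTION).fetchall())
```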

The SIEM doesn't have to die in the migration.

A common transition pattern: keep Splunk Search Head plus Enterprise Security for the SOC analyst experience; move historical retention and ad-hoc analytical workloads to Iceberg plus the chosen engine; federate. Splunk DB Connect is one bridge; native Iceberg integrations are emerging. Skill transition runs 6–18 months as a parallel run rather than a flag-day cutover. The regional bank case study walks through one shape this transition takes.

Vendor swap-out is real, not a slogan.

Because each layer is bound by open formats — Parquet on disk, OCSF in schema, Iceberg in table format, OIDC in identity — replacing the query engine doesn't touch the data. Replacing the catalog doesn't touch the engine. This is the structural difference between modular and integrated. It's not free; the operational discipline (infrastructure-as-code, monitoring, runbooks) becomes mandatory rather than optional. But the optionality it produces is what justifies the operational work.

Common objections, honestly

The trade-offs are real. Here are the ones that come up most.

"Five components is more operational complexity than one SIEM."

True. The trade is cost, performance, and portability against operational complexity. Operational discipline — infrastructure-as-code, monitoring, runbooks — becomes mandatory rather than optional. For teams that already operate at this discipline level, the complexity is a known cost; for teams that don't, the framework's organizational-constraints phase explicitly flags this as a precondition rather than a graceful learning curve.

"Our SOC analysts are trained on Splunk SPL."

Federated approaches preserve the SPL UX during migration — Splunk Search Head over Iceberg via DB Connect or native connectors keeps the analyst-facing surface intact. The skill transition runs 6–18 months as a parallel run, not as a flag-day. Detection content migration is the harder problem; it gets scoped explicitly in the engagement.

"We don't have data engineering capacity."

Then the migration assessment is the first step, sized to surface the gap. If the engineering capacity gap is too large to bridge inside the available budget and timeline, the assessment will say so. The framework's organizational-constraints phase is designed to fail prospects out at this question rather than under-scope the engagement.

"What about compliance, RBAC, and audit?"

Iceberg plus a real catalog (Polaris, Nessie, Unity) provides table-level RBAC, time-travel for audit, and column-level masking. The patterns are documented; the operational work to set them up is real. For regulated environments — financial services, healthcare, multi-region with sovereignty requirements — the catalog choice is the load-bearing decision. Unity Catalog's fine-grained governance is mandatory for the shared-corporate isolation pattern; Polaris is sufficient for the isolated-dedicated pattern. The component-criteria page covers the catalog scoring in detail.

When MOAR is not the right answer

Three environments where the operational overhead doesn't pay back.

The migration assessment is designed to surface "no-go" answers before money is spent on implementation, and three patterns reliably produce one.

A "no-go" finding is a successful engagement; under-scoping a deployment that lands the team in operational debt is the failure mode the framework is designed to avoid.

The engagement shape

Three entry points, scoped to risk tolerance.

POV — Benchmark on your data ($15K, 1–2 weeks).

The lightest entry point. The published benchmark methodology run against a representative slice (1–10M events) of the prospect's actual workload, with anonymization handled at the data layer. Outputs: a one-page TCO and performance readout suitable for the next CISO/CFO conversation, plus a benchmark replication script the team can re-run with more data before committing capex. The POV fee credits toward the full migration assessment if the engagement proceeds within 90 days.

Splunk-to-MOAR Migration Assessment ($30K–$50K, 2–3 weeks).

The flagship engagement. Quantified TCO comparison (current Splunk spend versus modeled MOAR stack); workload classification (which queries port cleanly, which need redesign, which should stay on Splunk); engine recommendation matrix (ClickHouse for raw speed, Dremio for semantic layer ergonomics, StarRocks for Iceberg-native, Trino for federation breadth, with the specific trade-offs called out); migration risk register; 6–18 month phased roadmap with named decision gates; executive deck for CISO and CFO. Pricing scales for environments above 2 PB ingest or 50+ detection use cases.

Security Data Architecture Assessment ($40K–$80K, 2–4 weeks).

The clean-sheet design engagement when the prospect isn't Splunk-bound or wants vendor-neutral architecture from the start. Covers a 5-step audit (sources → ingest → storage → catalog → query), a 12-scenario storage decision framework matched to regulatory profile (HIPAA, SOX, PCI-DSS, GDPR residency, multi-region SOC), MOAR component selection, 3-year TCO with cost-optimization roadmap, and the migration roadmap. Higher-end pricing applies to multi-region, regulated, or DoD/IC-adjacent environments. Implementation support is available as a follow-on retainer once the architecture lands.

The data platform that survives every other choice.

The benchmark supplies the cost-and-performance receipts; the matrix supplies the working decision math. DetectFlow and MLOps-hunting both sit on top of MOAR.