Service Offering 3 · Design principles

Five principles, each with a test you can run.

The MOAR Architecture Design engagement applies a reference architecture, not a vendor opinion. That architecture rests on five design principles. I treat each one as a falsifiable claim: every principle carries a validation test, so the design we land on can be checked against your data rather than taken on faith. This page is the methodological backbone of the engagement: what I'm optimizing for, and how I prove I got there.

Principle 1

A vendor-neutral data layer.

Why

You don't want to be locked into a single analytics platform. The platform you pick today is the platform whose pricing, roadmap, and acquisition risk you inherit for the next several years. The defensible position is to keep the data independent of any one engine, so that switching the engine is a configuration change rather than a migration project.

How

Store data in open table formats (Apache Iceberg, Delta Lake, or Hudi) behind a vendor-neutral catalog, so no proprietary engine owns the canonical copy. In this practice the default is Iceberg, and the engine choices that sit on top of it (the MOAR components) are broken out in detail on the MOAR thesis page. I won't reproduce the component-by-component breakdown here, because this page is about the principle, not the parts list.
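Here is a minimal sketch of what "switching the engine is a configuration change" can look like. Two engines register the same Iceberg REST catalog, so the only per-engine artifact is a handful of settings. The catalog name `lake` and the endpoint URL are hypothetical. The property keys are the ones documented for Trino's Iceberg connector and for Spark's Iceberg catalog, but verify them against the versions you actually run.

```python
# Minimal sketch: one Iceberg REST catalog, registered by two engines.
# Hypothetical: the catalog name "lake" and the endpoint URL.
CATALOG_NAME = "lake"
REST_URI = "http://iceberg-catalog.internal:8181"


def trino_catalog_properties(uri: str) -> dict[str, str]:
    """Settings for a Trino catalog file such as etc/catalog/lake.properties."""
    return {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": uri,
    }


def spark_catalog_conf(name: str, uri: str) -> dict[str, str]:
    """Spark settings registering the same REST catalog under `name`."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }


def render_properties(props: dict[str, str]) -> str:
    """Render settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    print(render_properties(trino_catalog_properties(REST_URI)))
    print()
    print(render_properties(spark_catalog_conf(CATALOG_NAME, REST_URI)))
```

Adding or swapping an engine means adding another block of settings that points at the same catalog URI. The tables themselves don't move. Spark also needs the Iceberg runtime JAR on its classpath.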

Validation

If you can swap Trino for Dremio without rewriting queries or migrating data, you've succeeded. That's the test, and it's the exact substitution the published benchmark exercises against the same Iceberg tables. Engine portability is something I've measured, not merely asserted.
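A minimal sketch of how that substitution test can be scripted: run one query set through both engines and compare the results while ignoring row order. The sketch assumes each engine is wrapped as a callable that takes SQL and returns row tuples. The adapters and the query are hypothetical. A real harness would also normalize types, for example decimals versus floats, before comparing.

```python
# Minimal sketch of the engine-substitution test.
# Assumption: each engine is wrapped as a callable taking SQL and returning
# row tuples; the adapters below are hypothetical stubs.
from typing import Callable, Iterable, Sequence

Engine = Callable[[str], Iterable[tuple]]


def normalize(rows: Iterable[tuple]) -> list[tuple]:
    """Sort rows (by their repr, so mixed types sort safely) to ignore order."""
    return sorted(rows, key=repr)


def portability_report(queries: Sequence[str], engine_a: Engine, engine_b: Engine) -> dict[str, bool]:
    """Map each query to True when both engines return the same rows."""
    return {q: normalize(engine_a(q)) == normalize(engine_b(q)) for q in queries}


if __name__ == "__main__":
    def trino_stub(sql: str) -> list[tuple]:    # hypothetical Trino adapter
        return [("10.0.0.5", 42), ("10.0.0.9", 7)]

    def dremio_stub(sql: str) -> list[tuple]:   # hypothetical Dremio adapter
        return [("10.0.0.9", 7), ("10.0.0.5", 42)]

    sql = "SELECT src_ip, count(*) FROM auth_events GROUP BY src_ip"
    print(portability_report([sql], trino_stub, dremio_stub))
```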

Principle 2

Separation of storage and compute.

Why

Storage is cheap, roughly $0.023/GB/month on S3. Compute is expensive, at $0.10 to $1.00+ per query depending on the engine and the scan. When storage and compute are coupled (the schema-on-read SIEM model), you pay compute prices to retain data you may query once a year. Decoupling them lets the cost of retaining data fall to the cost of object storage.

How

Keep the canonical data in object storage (S3, ADLS, or GCS) at ~$0.023/GB/month, and scale compute independently against workload demand rather than against data volume. Retention then becomes a storage decision and query throughput a compute decision, so the two stop fighting each other.
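A toy cost model makes the decoupling visible. Stored gigabytes drive only the storage term, and query count drives only the compute term, so doubling retention leaves the compute bill untouched. The prices are the illustrative figures from the Why paragraph, not quotes. The workload numbers in the usage lines are hypothetical.

```python
# Toy decoupled cost model: storage and compute are separate terms with
# separate inputs. Prices are the illustrative figures from the text.
STORAGE_USD_PER_GB_MONTH = 0.023


def monthly_cost(stored_gb: float, queries_per_month: int, usd_per_query: float) -> dict[str, float]:
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH   # depends only on retention
    compute = queries_per_month * usd_per_query      # depends only on workload
    return {"storage": storage, "compute": compute, "total": storage + compute}


if __name__ == "__main__":
    # Hypothetical: doubling retention at a constant query load.
    one_year = monthly_cost(stored_gb=30_000, queries_per_month=5_000, usd_per_query=0.25)
    two_years = monthly_cost(stored_gb=60_000, queries_per_month=5_000, usd_per_query=0.25)
    print(one_year)
    print(two_years)
```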

Validation

You can turn off every query engine and still reach your data in object storage with different tools. If shutting down every query engine makes the data inaccessible, storage and compute are still coupled and the principle has failed, regardless of how the vendor describes the architecture.
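One way to run this test by hand is sketched here under stated assumptions. An Iceberg table's current state lives in a JSON metadata file next to the data, so a few lines of standard-library code can recover the schema and the current snapshot's manifest list without any engine running. The sample metadata is hypothetical and trimmed to the fields the sketch reads (format version 2 layout). In practice you would fetch the file with any object-store client. The manifest list and the data files it leads to are Avro and Parquet, and both formats are readable by many independent tools.

```python
# Minimal sketch: read an Iceberg table's current state from its metadata
# JSON with no query engine running. SAMPLE_METADATA is hypothetical and
# trimmed to the fields used here (format-version 2 layout).
import json

SAMPLE_METADATA = json.dumps({
    "format-version": 2,
    "location": "s3://security-lake/ocsf/auth_events",
    "current-schema-id": 0,
    "schemas": [{
        "type": "struct",
        "schema-id": 0,
        "fields": [
            {"id": 1, "name": "time", "required": True, "type": "timestamptz"},
            {"id": 2, "name": "user", "required": False, "type": "string"},
        ],
    }],
    "current-snapshot-id": 101,
    "snapshots": [{
        "snapshot-id": 101,
        "timestamp-ms": 1714564800000,
        "manifest-list": "s3://security-lake/ocsf/auth_events/metadata/snap-101.avro",
    }],
})


def describe_table(metadata_json: str) -> dict:
    meta = json.loads(metadata_json)
    # The current schema is the one whose schema-id matches current-schema-id.
    schema = next(s for s in meta["schemas"] if s["schema-id"] == meta["current-schema-id"])
    # A table with no snapshots yet has no manifest list to follow.
    snap_id = meta.get("current-snapshot-id")
    snapshot = next((s for s in meta.get("snapshots", []) if s["snapshot-id"] == snap_id), None)
    return {
        "location": meta["location"],
        "columns": [field["name"] for field in schema["fields"]],
        "manifest_list": snapshot["manifest-list"] if snapshot else None,
    }


if __name__ == "__main__":
    print(describe_table(SAMPLE_METADATA))
```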

Principle 3

Compression-first design.

Why

Security logs achieve 10–12× compression with proper encoding. That ratio is the difference between an affordable retention horizon and an unaffordable one: at the 12× end of the range, 1 TB/day of ingest costs just under $700/month in S3 storage at 365-day retention (closer to $840/month at 10×). Compression isn't a tuning afterthought; it's the economic precondition for keeping a year of data online.
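The arithmetic behind that figure assumes decimal units (1 TB = 1,000 GB), the ~$0.023/GB/month S3 price from Principle 2, and a full 365-day window already on disk:

```latex
\begin{aligned}
\text{raw retained} &= 1\ \text{TB/day} \times 365\ \text{days} = 365{,}000\ \text{GB} \\
\text{stored at } 12\times &= 365{,}000 / 12 \approx 30{,}417\ \text{GB} \\
\text{monthly bill at } 12\times &\approx 30{,}417 \times \$0.023 \approx \$699.6 \\
\text{monthly bill at } 10\times &= 36{,}500 \times \$0.023 = \$839.5
\end{aligned}
```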

How

Use Parquet with ZSTD or Snappy compression for the Iceberg tables, and per-column ClickHouse codecs (for example, Delta or DoubleDelta chained with ZSTD on timestamp columns) for the real-time path. The encoding choices are made up front and sized against the actual source mix, rather than left at engine defaults.
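A rough way to start sizing against your own source mix is to measure the compression ratio on a sample from each source. This sketch uses `zlib` from Python's standard library as a stand-in for ZSTD or Snappy. It also compresses raw lines rather than Parquet's columnar, dictionary-encoded pages, so its numbers are a first look, not the ratio the tables will achieve. The sample sources are synthetic and hypothetical.

```python
# Rough per-source compression sizing. zlib (standard library) stands in for
# ZSTD/Snappy, and raw lines stand in for Parquet's columnar pages, so the
# ratios are a first look only.
import zlib


def compression_ratio(sample_lines: list[str], level: int = 6) -> float:
    """Raw bytes divided by zlib-compressed bytes for one source sample."""
    raw = "\n".join(sample_lines).encode("utf-8")
    return len(raw) / len(zlib.compress(raw, level))


def ratios_by_source(samples: dict[str, list[str]]) -> dict[str, float]:
    """Measured ratio per source, rounded for reporting."""
    return {source: round(compression_ratio(lines), 1) for source, lines in samples.items()}


if __name__ == "__main__":
    # Hypothetical synthetic samples; replace with real exports per source.
    firewall = [
        f"2024-05-01T12:00:{i % 60:02d}Z action=allow src=10.0.0.{i % 8} dst=10.0.1.1 dport=443"
        for i in range(2_000)
    ]
    auth = [
        f'{{"time": {1714564800 + i}, "user": "svc-{i % 5}", "result": "success"}}'
        for i in range(2_000)
    ]
    print(ratios_by_source({"firewall": firewall, "auth": auth}))
```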

Validation

The falsifiable check is a number, not an adjective.

1 TB/day of raw logs should cost under $700/month for S3 storage at 365-day retention.
The comparison point is $20,000–$200,000/month for the equivalent retention on schema-on-read SIEM platforms (Splunk Cloud / Enterprise).

If the achieved ratio doesn't bring the storage bill under that $700 line, the design has failed this test, and I say so in the deliverable rather than averaging it away.
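Here is the same test as a pass/fail function. The sketch assumes decimal units, the ~$0.023/GB/month S3 price, and steady state, meaning a full retention window is already stored. The function and parameter names are illustrative.

```python
# Pass/fail sketch for the Principle 3 storage test. Assumptions: decimal
# units (1 TB = 1,000 GB), S3 at ~$0.023/GB/month, and steady state with a
# full retention window stored.

def storage_check(raw_tb_per_day: float, compression_ratio: float,
                  retention_days: int = 365, usd_per_gb_month: float = 0.023,
                  target_usd: float = 700.0) -> tuple[float, bool]:
    """Return (projected monthly S3 bill, whether it is under target)."""
    stored_gb = raw_tb_per_day * 1_000 * retention_days / compression_ratio
    monthly = stored_gb * usd_per_gb_month
    return round(monthly, 2), monthly < target_usd


if __name__ == "__main__":
    print(storage_check(1, 12))   # (699.58, True)
    print(storage_check(1, 10))   # (839.5, False)
```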

Principle 4

Schema evolution without breaking changes.

Why

Security tools change. Log formats evolve. You can't stop the world to update schemas, and a platform that requires downtime or a data migration every time a vendor changes a field accumulates operational debt with every onboarding.

How

Use Iceberg's schema evolution together with OCSF normalization so new log sources land in the lakehouse without rewriting existing data and without taking the platform offline. The normalization layer absorbs format drift; the table format absorbs structural change.
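Iceberg can absorb structural change without rewriting files because it tracks columns by field ID rather than by name or position. The toy model below uses plain dictionaries and hypothetical field names, and it is not Iceberg's actual reader. It shows the effect: a file written before a rename and a column add still reads correctly under the current schema, and the new column comes back empty.

```python
# Toy model of field-ID schema evolution (not Iceberg's reader).
# Current schema: field ID -> column name. Field 2 was renamed from "src"
# to "src_ip" and field 4 was added after OLD_FILE was written.
CURRENT_SCHEMA = {1: "time", 2: "src_ip", 3: "user", 4: "mfa_used"}

# Data files store values by field ID, so neither file needs rewriting.
OLD_FILE = [{1: 1714564800, 2: "10.0.0.5", 3: "alice"}]
NEW_FILE = [{1: 1714564860, 2: "10.0.0.9", 3: "bob", 4: True}]


def read_rows(file_rows: list[dict[int, object]], schema: dict[int, str]) -> list[dict[str, object]]:
    """Project stored rows onto the current schema by field ID.

    IDs missing from a file come back as None; IDs the schema no longer
    contains (dropped columns) are simply not read.
    """
    return [{name: row.get(field_id) for field_id, name in schema.items()} for row in file_rows]


if __name__ == "__main__":
    for row in read_rows(OLD_FILE + NEW_FILE, CURRENT_SCHEMA):
        print(row)
```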

Validation

Adding a new log source doesn't require downtime or data migration. If onboarding a source forces a maintenance window or a backfill, the principle has failed, and that's a test you can run on a single new source before trusting it at scale.
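Here is a minimal pre-flight version of that test, which classifies each schema change a new source requires as in-place or not. The sketch covers only the common cases: new optional columns, dropped columns, and the widening promotions int to long, float to double, and decimal precision increases. Schemas are hypothetical `{column: type}` dictionaries. Because the sketch compares columns by name, it cannot see renames, which Iceberg handles by field ID.

```python
# Pre-flight sketch: which schema changes for a new source are in-place?
# Covers common cases only; schemas are hypothetical {column: type} dicts.
import re

SAFE_PROMOTIONS = {("int", "long"), ("float", "double")}
DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def is_safe_promotion(old: str, new: str) -> bool:
    """Unchanged type, int->long, float->double, or wider decimal precision."""
    if old == new or (old, new) in SAFE_PROMOTIONS:
        return True
    a, b = DECIMAL.fullmatch(old), DECIMAL.fullmatch(new)
    # Decimal precision may grow; the scale must stay the same.
    return bool(a and b and int(b[1]) >= int(a[1]) and int(b[2]) == int(a[2]))


def breaking_changes(old: dict[str, str], new: dict[str, str],
                     required_new: frozenset[str] = frozenset()) -> list[str]:
    """List changes that would need a rewrite or backfill; [] means in-place."""
    problems = []
    for col, old_type in old.items():
        # Columns missing from `new` count as drops, which are metadata-only.
        if col in new and not is_safe_promotion(old_type, new[col]):
            problems.append(f"{col}: {old_type} -> {new[col]} is not an in-place promotion")
    for col in sorted(new.keys() - old.keys()):
        if col in required_new:
            problems.append(f"{col}: existing rows have no value for a new required column")
    return problems


if __name__ == "__main__":
    old = {"time": "timestamptz", "bytes_out": "int", "amount": "decimal(10,2)"}
    new = {"time": "timestamptz", "bytes_out": "long", "amount": "decimal(12,2)", "mfa_used": "boolean"}
    print(breaking_changes(old, new))   # []
    print(breaking_changes(old, dict(new, bytes_out="string")))
```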

Principle 5

Query engine specialization.

Why

No single engine is optimal for real-time alerting, ad-hoc investigations, and scheduled reporting at once. An engine tuned for sub-second alerting is the wrong tool for a multi-table threat hunt, and a batch transformation engine is the wrong tool for a SOC dashboard. Insisting on one engine for all three means accepting that it will be mediocre at two of them.

How

Deploy ClickHouse (or StarRocks) for the real-time path, Trino or Dremio for ad-hoc analysis, and Spark for batch transformations, all reading the same Iceberg tables so that specialization doesn't reintroduce silos. Which engines I recommend for a given environment depends on the workload, and how those recommendations shift as the evidence changes is tracked openly in research.
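This sketch routes by workload rather than by data location. Each workload class maps to an engine, and every engine resolves the same table. The thresholds, workload examples, and table name are hypothetical. Real routing would follow whatever the design engagement recommends for your environment.

```python
# Sketch: route each workload class to an engine; every engine reads the
# same Iceberg table. Thresholds and the table name are hypothetical.
from dataclasses import dataclass

SHARED_TABLE = "lake.ocsf.network_activity"   # hypothetical identifier


@dataclass(frozen=True)
class Workload:
    name: str
    latency_budget_s: float   # how quickly an answer is needed
    scheduled: bool           # runs on a schedule rather than on demand


def route(w: Workload) -> str:
    if w.scheduled:
        return "spark"        # batch transformations and reporting jobs
    if w.latency_budget_s < 1.0:
        return "clickhouse"   # real-time alerting and dashboards
    return "trino"            # ad-hoc investigation and threat hunting


def plan(w: Workload) -> tuple[str, str]:
    """The engine varies by workload; the table never does."""
    return route(w), SHARED_TABLE


if __name__ == "__main__":
    for w in [Workload("brute-force alert", 0.5, False),
              Workload("lateral-movement hunt", 300.0, False),
              Workload("nightly OCSF rollup", 3_600.0, True)]:
        print(w.name, plan(w))
```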

Validation

You can choose the right tool for each workload without data silos. If specializing engines means forking the data into per-engine copies, the architecture has traded one problem for another and the principle has failed.
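Here is a sketch of the fork check. Given an inventory of which storage location each engine reads for each logical dataset, it flags any dataset backed by more than one location. It assumes you can assemble that inventory from catalog configurations or query logs. The entries in the usage lines are hypothetical.

```python
# Sketch of the "no silos" check: flag logical datasets that different
# engines read from different physical locations.
from collections import defaultdict


def find_forks(inventory: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Return datasets read from more than one storage location.

    Each inventory row is (engine, dataset, storage_location).
    """
    locations: dict[str, set[str]] = defaultdict(set)
    for _engine, dataset, location in inventory:
        locations[dataset].add(location.rstrip("/"))   # ignore trailing slashes
    return {dataset: locs for dataset, locs in locations.items() if len(locs) > 1}


if __name__ == "__main__":
    inventory = [
        ("trino", "network_activity", "s3://lake/ocsf/network_activity"),
        ("spark", "network_activity", "s3://lake/ocsf/network_activity/"),
        ("clickhouse", "network_activity", "s3://lake/ocsf/network_activity"),
        ("trino", "authentication", "s3://lake/ocsf/authentication"),
        ("clickhouse", "authentication", "s3://clickhouse-local/auth_copy"),
    ]
    print(find_forks(inventory))
```

An empty result is the pass condition. Any dataset that comes back has been forked into a per-engine copy.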

Principles you can test are principles you can defend in procurement.

The design engagement applies these five principles to your data and reports each validation test by result, including the ones that fail. An intro call confirms whether the design track is the right shape for your environment.
