Writing · Architecture decision
Schema-on-read vs schema-on-write. The $250K question.
Splunk stores raw events with minimal processing and applies structure only at query time, through regex-based field extraction. Every search re-runs that extraction. A dashboard or correlation rule that hits the same data a few hundred times a day pays the parse cost a few hundred times. I have watched that arithmetic turn a single authentication-failure panel into a five-figure monthly line item on a Tier-1 financial-services SOC, and the team that owned it could not see the cost because it was buried in aggregate license spend. At 1 TB/day the gap runs roughly $31,000/month against schema-on-write alternatives at $8,000 to $12,000 for comparable query latency (the schema-on-read figure anchored to Splunk's published G-Cloud 14 list price rather than a quoted street rate, so read it as a list-anchored directional gap, not a guaranteed invoice). The flexibility you buy with that premium is the part most write-ups skip. Across 100+ data sources with inconsistent field names, new formats arriving unannounced, and investigations that need fields nobody indexed in advance, schema-on-read keeps doing work the cheaper architectures cannot.
The repeated parsing problem
How Splunk's schema-on-read actually works.
Splunk stores raw events essentially unprocessed in compressed journal files within time-bounded buckets
(hot/warm/cold storage tiers). During indexing, Splunk extracts only five default fields:
_time, host, source, sourcetype, and
index.
TSIDX files associate keywords with event locations, enabling fast keyword-based filtering. But
structured field queries (user_id=alice, status_code=404,
source_ip=192.168.1.1) must parse raw text on every search using regex-based field
extraction rules.
An authentication log event, raw storage:
2024-01-15T14:23:47Z auth_server user=alice@company.com src_ip=203.0.113.45 result=failed reason=invalid_password attempt=3 Field extraction, applied at search time:
| rex field=_raw "user=(?<user_id>[^\s]+)"
| rex field=_raw "src_ip=(?<source_ip>[^\s]+)"
| rex field=_raw "result=(?<auth_result>[^\s]+)" This regex parsing executes every time you query these fields, not once at ingestion.
The compute cost compounds.
For a single ad-hoc search, parsing overhead is negligible. For dashboards refreshing every 5 minutes and correlation rules running continuously, costs compound dramatically. An authentication failure dashboard counting failed logins per user across 24 hours scans 50 GB of logs daily, runs regex operations across 50 million events, and refreshes 288 times per day. At Splunk Cloud's $0.50–1.50/query pricing, that single dashboard costs about $288/day, or $8,640/month.
Multiply across a typical SOC and the arithmetic runs away: 20 dashboards × $8,640 = $172,800/month, and 100 correlation rules running continuously adds another $200,000+/month, which puts the upper bound around $370,000/month in repeated parsing costs once you count the whole estate rather than one panel.
Splunk's answer: Data Model Acceleration.
Splunk's answer to repeated parsing is Data Model Acceleration (DMA), essentially materialized views
for TSIDX summaries. Define a data model specifying field extractions and structure, enable
acceleration (Splunk pre-computes TSIDX summaries in background), and query using
tstats instead of search.
Performance impact documented by Hurricane Labs: unaccelerated search 10.232 seconds, accelerated
tstats query 0.038 seconds, a 270× speedup. Splunk Enterprise Security relies heavily on
accelerated data models for correlation searches; without acceleration, ES correlation rules would
timeout or consume excessive resources.
But DMA carries its own costs. Storage overhead consumes 10–50% of raw data size, the CPU overhead for background summarization competes with production workloads, acceleration gaps appear when summarization jobs timeout or fail, and the schema rigidity means changes to a data model require full re-acceleration. Splunk's own documentation warns that "index-time custom field extraction can degrade performance at both index time and search time by enlarging indexes," which is to say that pre-extracting fields at ingestion (moving toward schema-on-write) creates its own performance challenges inside Splunk's architecture.
Schema-on-write
Pre-structure everything.
Elasticsearch, Microsoft Sentinel, and Google Chronicle take the opposite approach: define schema at ingestion, store data in structured format, query structured fields directly without runtime parsing.
An Elasticsearch index mapping:
{
"mappings": {
"properties": {
"timestamp": { "type": "date" },
"user_id": { "type": "keyword" },
"source_ip": { "type": "ip" },
"auth_result": { "type": "keyword" },
"attempt_count": { "type": "integer" }
}
}
} The same authentication event, structured storage:
{
"timestamp": "2024-01-15T14:23:47Z",
"user_id": "alice@company.com",
"source_ip": "203.0.113.45",
"auth_result": "failed",
"attempt_count": 3
} Field queries access pre-indexed fields directly, with no regex parsing at query time. Comparing across platforms on the authentication-failure dashboard query:
| Approach | Query time | Cost / query | Queries / day | Daily cost |
|---|---|---|---|---|
| Schema-on-read SIEM (unaccelerated) | 30–90 sec | $1.00–1.50 | 288 | $288–432 |
| Schema-on-read SIEM (acceleration-layer) | 1–5 sec | $0.10–0.30 | 288 | $29–86 |
| Elasticsearch | 1–3 sec | $0.05–0.15 | 288 | $14–43 |
Monthly cost projection at 1 TB/day ingestion:
| Approach | Monthly cost @ 1 TB/day |
|---|---|
| Schema-on-read SIEM (unaccelerated) | $31,000/month |
| Schema-on-read SIEM (with acceleration) | $12,000–18,000/month (acceleration overhead reduces savings) |
| Elasticsearch | $8,000–12,000/month |
Cost savings: 61–74% versus the unaccelerated SIEM, 33–50% versus the accelerated SIEM.
The reindexing problem.
Schema-on-write's major limitation: every schema change potentially requires reindexing historical data.
Ingest authentication logs for 6 months with source_ip stored as keyword, then
realize you need IP range queries, which require ip field type, not keyword.
Elasticsearch now needs you to create a new index with corrected mapping, reindex 6 months of data
(potentially terabytes), update all queries and dashboards to reference the new index, then delete the
old. Reindexing 500 GB at 50 MB/sec is 2.8 hours best case; at petabyte scale with cluster overhead, days
to weeks.
Every new log source also requires parser development before ingestion. Unknown formats cannot be ingested "as-is" and explored later; they must be mapped to schema first. This is the operational tax that schema-on-write architectures take on in exchange for the query-performance and cost wins.
Why schema-on-read still matters
The 100+ data source problem.
Enterprise security operations ingest logs from 100–200 different sources. Endpoint vendors (CrowdStrike, SentinelOne, Defender, Carbon Black) each invent their own field names. Network vendors (Palo Alto, Cisco, Fortinet, Check Point) have zero standardization. Cloud providers (AWS CloudTrail, Azure Activity, GCP Audit) have three different schema patterns. SaaS vendors (Okta, Office 365, Salesforce, Workday) each invent their own structure.
Field name variations for "source IP address":
| Vendor | Field name |
|---|---|
| CrowdStrike | RemoteIP |
| SentinelOne | SrcIpAddr |
| Palo Alto | src |
| AWS CloudTrail | sourceIPAddress |
| Azure | caller_ip_address |
| Okta | client.ipAddress |
Schema-on-write approach: map every vendor's field to a standardized schema at ingestion (OCSF, ECS, CIM). Schema-on-read approach: ingest as-is, handle field variations at query time:
| eval source_ip=coalesce(RemoteIP, SrcIpAddr, src, sourceIPAddress, caller_ip_address, 'client.ipAddress') Operational flexibility: schema-on-read allows ingesting new log sources immediately and exploring data before committing to schema design. Schema-on-write requires 2–3 days of parser development per source before ingestion.
Retroactive field extraction during investigations.
Threat hunting scenario: investigating suspected data exfiltration, you hypothesize DNS tunneling is the
method. You need to extract the query_length field from DNS logs to identify unusually long
queries (>100 characters).
Schema-on-read (Splunk):
index=dns
| rex field=_raw "query=(?<dns_query>[^\s]+)"
| eval query_length=len(dns_query)
| where query_length > 100 Result: extract new field retroactively from 90 days of DNS logs, no reindexing required.
Schema-on-write (Elasticsearch): if query_length wasn't in your original index mapping,
you have two options. Runtime field (Elasticsearch 7.11+) calculates the field at query time using
Painless script (performance penalty similar to schema-on-read). Or reindex 90 days of DNS logs with
new field mapping (hours to days of work).
The operational reality is that investigations evolve unpredictably, so the fields you end up needing tend to emerge from patterns you find during analysis rather than from a schema you defined in advance, and that flexibility is much of why a schema-on-read SIEM stays dominant in threat hunting even though it runs 3–10x more expensive than schema-on-write alternatives.
CIM as middle ground.
Splunk's Common Information Model provides standardized field names across data sources while
maintaining schema-on-read flexibility. Vendors provide "add-ons" that map their raw formats to CIM
fields at search time rather than index time, so searches reference CIM field names
(src_ip, dest_ip, user) regardless of vendor while the underlying
raw data stays unmodified, which is what lets vendor-neutral detection rules travel across SIEMs.
Limitations: field extraction still happens at search time (repeated parsing overhead remains). Not all vendors provide CIM-compliant add-ons. CIM coverage is incomplete (~40 data models, doesn't cover all log types). Elastic Common Schema (ECS) and OCSF provide similar standardization but require schema-on-write normalization at ingestion, not search-time mapping.
The hybrid approach
Raw retention plus normalized projection.
Production security architectures increasingly adopt dual storage: raw data (schema-on-read) plus normalized data (schema-on-write). The pattern:
Architecture pattern · dual storage
The raw tier stores original events without transformation. ZSTD gets you 30–50× compression, DuckDB, Trino, and Dremio read it directly with schema-on-read, and at $0.023/GB/month on S3 standard, 30 TB lands around $690/month. This is the tier threat hunters and forensic analysts live in, where the field you need is the one you didn't think to extract three months ago.
The normalized tier is OCSF/ECS/CIM-mapped at ingestion and tuned for query speed: columnar format, sorted by timestamp, fronted by ClickHouse, StarRocks, or Elasticsearch. It runs $3,000–8,000/month depending on platform and carries the predictable load, real-time detection, dashboards, and compliance reporting, the workloads where you already know which fields matter.
Cost-benefit analysis.
1 TB/day ingestion, 90-day retention:
| Component | Storage | Monthly cost | Use case |
|---|---|---|---|
| Raw (S3 + Iceberg) | 90 TB after 30× compression | $2,070 | Hunting |
| Normalized (ClickHouse) | 15 TB hot (7d) + 75 TB warm/cold | $5,500 | Detection |
| Total hybrid | 105 TB | $7,570/month | Both |
| Schema-on-read SIEM equivalent | 90 TB | $31,000/month | Both |
Cost savings: 76% ($23,430/month). Capability comparison: hybrid matches the schema-on-read SIEM on real-time detection (normalized tier), threat hunting (raw tier), retroactive field extraction (raw tier), and beats the incumbent SIEM on dashboard performance (no DMA gaps) while delivering $7.6K versus $31K monthly cost.
Production validation: named organizations already run this.
The hybrid pattern is in production at named organizations on the public teardown catalog: OCSF-normalized lakehouse security at banks (Standard Chartered, DNB, Bank Hapoalim), an MSSP (Ziggiz), healthcare (Yale New Haven Health), and SaaS (Cloudflare, Okta, Huntress). The closest analyzed case is Atlassian's Project Banyan on Databricks: security telemetry normalized to OCSF on the Silver layer, more than 21 billion events queryable in under a minute, with an 80% ingest-cost reduction reported in their public Databricks story.
I have also built OCSF-normalized lakehouse security on a large public-sector platform: raw retention on object storage with a normalized projection on warm storage, new log sources added without disrupting detection rules because the OCSF abstraction holds the contract. The architecture shape is consistent across these: a raw tier on cheap object storage for retention and hunting, a normalized OCSF tier for real-time detection, and a routing layer that does the normalization in flight.
Migration approach: 6-month parallel operation (Splunk + lakehouse), gradual workload migration per business unit, zero forced cutover dates. The parallel-operation pattern is much of what makes the migration tractable, because the cutover-style migrations are where I have watched most of the failures cluster, when a team flips a hard date and discovers a detection gap after the old system is already gone.
Iceberg V3 Variant: the third option
Schema-on-read semantics inside a schema-on-write format.
This essay frames schema-on-read vs schema-on-write as a binary architectural choice and resolves it
with a hybrid pattern. Iceberg V3 introduces a third option that partially dissolves the binary: the
Variant type (semi-structured/nested JSON, ratified in Parquet August 2025) lets you have
schema-on-read semantics inside a schema-on-write storage format. Concretely: you can store a
CloudTrail event as (eventTime TIMESTAMP, eventName STRING, raw_event VARIANT) in an
Iceberg table, query the structured fields with predicate pushdown, AND retain the full nested JSON for
retroactive field extraction during investigations. See the
Iceberg V3/V4 recalibration essay for the longer take.
Variant doesn't eliminate the "$250K remapping" problem this essay describes. It doesn't eliminate the need to maintain OCSF-normalized projections for cross-vendor query consistency. But it does substantially change the cost calculation. The raw-retention layer no longer requires a separate Splunk-style schema-on-read system, and the normalized projection no longer has to be exhaustive; you can OCSF-project the top-50 fields and leave the long tail in the Variant column for hunting-tier access.
Adoption caveat: V3 Variant engine support is still rolling out across Spark, Trino, DuckDB, and
Snowflake through 2026. The hybrid architecture this essay recommends is still the right answer if
your engine doesn't support variant_get() yet.
The second read/write axis: schema is upstream of commit.
Everything above is about one read/write question, the schema one: do you parse at ingest or parse at query. Once you put the data in a lakehouse, a second read/write question shows up underneath it, and the two are easy to conflate. I borrow the cleaner version of this distinction from Roman Kolesnev at Streambased, whose post "Your Iceberg table doesn't need to exist" splits the table interface into two contracts. The first is a read contract. Hand an engine a table identifier and it gets back a schema and a set of scannable bytes, and it does not care whether a Spark job wrote those bytes, a database inlined them, or they were synthesized on demand from a Kafka topic. Every realization preserves that read contract; preserving it is the point of the interface, and it is why the raw-Variant column above works as the hot/virtual realization point behind a single read contract. The second is a write contract, or commit contract: how data actually enters the table.
The commit contract is where the realizations diverge, and where streaming-ingest economics ride. Classic Iceberg writes Parquet files and registers them through the catalog, which is durable and engine-agnostic but slow, and it write-amplifies badly at high commit rates because each commit drops its own metadata footprint. DuckLake commits a SQL transaction instead, gets ACID from the database's existing concurrency control, and inlines small batches rather than spawning a file per write. Streambased's ISK never writes at all, because producing to the Kafka topic is the write. Committing every few seconds is exactly where Iceberg's per-commit file footprint turns into a $/GB problem, so for a high-frequency detection-ingest path the commit contract, not the schema question, is the axis that sets the bill. The schema read/write choice this essay spends most of its length on sits upstream of, and separate from, the commit-contract choice. You can decide parse-at-ingest versus parse-at-query without yet deciding file-write versus SQL-transaction versus never-write, and the cost surprises tend to hide in the second decision.
The honest null on all of this: a V4-efficient materialized Iceberg backend, single-file commits and Parquet-for-metadata landing cleanly, might be good-enough from the hot detection tier through the multi-year audit tier, in which case a tiered or polyglot substrate is complexity I talked myself into rather than complexity the workload demanded. I lean toward tiering being real for security data, where sub-second streaming detection and seven-year retention pull in genuinely different directions, but that is a lean, not a proof, and the null deserves a fair test rather than a rhetorical dismissal.
So I ran the part of that test I could run cleanly on a single host. On a MOAR reference stack against
real MinIO object storage (the bench script is lab/commit_tax.py), I wrote the same 100,000
rows two ways, once as a single batch commit and once as a 100-commit stream, which is the shape a
streaming detection-ingest path actually has. Classic Iceberg held the prediction almost exactly: the
metadata footprint went from 8.9 KB to 4,579 KB across the streamed version (about 515×), planning time
rose from 8.7 ms to 181 ms (about 21×), ingest stretched from 0.44 s to 16.3 s (about 37×), and the data
file count went from one to a hundred, because every commit drops its own file and its own metadata
whether the batch is large or tiny. DuckLake ran the identical 100-commit stream as SQL transactions,
kept its metadata in the catalog database, held planning flat at roughly 7 ms, and inlined the small
commits so they produced zero Parquet files. The absolute milliseconds are a single-host artifact and I
wouldn't quote them as a benchmark, but the shape is the finding, and the shape is Iceberg's
file-per-commit floor against DuckLake's catalog-transaction commit, which is the commit contract moving
the streaming bill rather than the schema-on-read-versus-write choice that sits upstream of it.
Cloud-native schema-on-write
Sentinel and Chronicle as proof points.
Microsoft Sentinel: ingestion-time normalization.
Microsoft Sentinel performs schema-on-write normalization at ingestion using Data Collection Rules and ASIM (Advanced Security Information Model). Logs ingested via Azure Monitor Agent or API; Data Collection Rules apply transformations (field mapping, filtering, enrichment); data stored in Log Analytics workspace (KQL-queryable); ASIM provides normalized schema layer.
Cost advantage: Azure-native logs (Azure AD, Defender, Office 365) ingest free for E5 licensed users. Only third-party logs incur ingestion costs ($2.30/GB). For 200 GB/day with 80% Azure-native and 20% third-party: Azure-native (160 GB/day) $0/month; third-party (40 GB/day) $2,760/month. Schema-on-read SIEM equivalent at 200 GB/day × $150/GB/year = $30,000/month. Cost savings: 91% for Azure-heavy environments.
Limitations: vendor lock-in (KQL not portable, data export expensive). 90-day retention ceiling for hot tier (Archive tier requires extra queries). Cost explosion at scale; 500 GB/day third-party = $414K/year.
Google Chronicle: BigQuery-powered security.
Google Chronicle (SecOps) stores security data in BigQuery with Universal Data Model (UDM) normalization at ingestion. Chronicle connectors ingest from 200+ sources; UDM normalization applied at ingestion (similar to OCSF); data stored in BigQuery columnar format; SQL queries via BigQuery engine (distributed, 1,000+ workers).
Cost advantage: storage at $0.020/GB/month, S3-equivalent, versus a schema-on-read SIEM at $12.50/GB/month for indexed storage. For 500 GB/day with 5-year retention: storage 912 TB × $0.020 = $18,240/month; query compute ~$5,000/month estimated. Total: ~$23,000/month. The schema-on-read SIEM equivalent for 912 TB indexed: $11.4M/month, not realistic, hence the common SIEM archives to reduce cost.
Chronicle is worth dwelling on because Google would be unlikely to bet SecOps on BigQuery if lakehouse patterns failed for security at scale, and Chronicle is a lakehouse in every structural respect that matters here (separation of storage and compute, columnar format, SQL queries, schema-on-write normalization), which is about as close to a production existence proof as the architecture has.
Decision framework
When each approach wins.
Choose schema-on-read when:
- 100+ diverse data sources with inconsistent field naming
- Exploratory threat hunting is a core operational requirement
- Retroactive field extraction during investigations is common
- New log sources appear frequently (M&A, vendor changes)
- Small-to-medium scale (<500 GB/day where costs are manageable)
- Splunk expertise already exists and transition costs are prohibitive
Cost tolerance: accept 3–10× higher costs for operational flexibility.
Choose schema-on-write when:
- Known detection patterns dominate (dashboards, compliance, scheduled queries)
- Schema stability exists (mature security stack, infrequent log source changes)
- Cost optimization is the priority (budget-constrained, high volumes)
- Cloud-native environment (Azure/GCP with free ingestion for native logs)
- Data engineering expertise available (to build normalization pipelines)
Cost benefit: 60–75% savings versus schema-on-read at scale.
Choose hybrid (raw + normalized) when:
- Petabyte-scale volumes make pure schema-on-read cost-prohibitive
- Both hunting and detection are critical operational requirements
- OCSF/ECS/ASIM adoption provides a stable normalization target
- Data engineering team can build and maintain dual storage
- 1–2 year timeline acceptable for migration with parallel operation
Sweet spot: 1 TB/day to 10 TB/day organizations with mature security operations.
Practical recommendations
For Splunk-resident teams and greenfield builders.
For organizations currently on Splunk.
Don't rush the migration, because Splunk's schema-on-read flexibility isn't easily replaced, and the 6-month parallel-operation pattern that mature migrations use exists for good reasons. Detection-rule migration means translating your SPL rules to SQL with validation rather than blind conversion; analyst training runs several months of upskilling on SQL and data-lakehouse concepts; schema normalization is a couple of days per source to OCSF-map your sources; and the workflow adjustments matter too, since hunting patterns differ between schema-on-read and normalized lakehouses.
Pilot approach: start with 2–3 high-volume, low-complexity sources (VPC Flow, DNS, CloudTrail). Build OCSF normalization pipeline with validation against Splunk results. Run parallel queries in both systems for 90 days, compare detection coverage. Expand to additional sources only after validation success.
For organizations building new architecture.
Start with hybrid from day one. Raw tier: S3 + Iceberg catalog + DuckDB/Trino query engines. Normalized tier: OCSF schema-on-write in ClickHouse/StarRocks. Routing layer: Cribl or Tenzir with dual-write to both tiers.
Storage ratio: 90% of data volume in raw tier (compressed), 10% in normalized hot tier (7–30 days). Query routing: known patterns (dashboards, detection) → normalized tier; exploratory hunting → raw tier; compliance/forensics → raw tier (legal requires original unmodified logs).
Cost: $7,000–15,000/month for 1 TB/day (versus $31,000 for a schema-on-read SIEM).
For OCSF normalization specifically.
LLM-assisted mapping can cut field-mapping time substantially per source: provide the model with the vendor schema plus the OCSF spec, it proposes a mapping, a human reviews the low-confidence fields, and you test the mapping against sample data before trusting it. The savings compound across dozens of sources, which is where the manual approach hurts most, though the accuracy still has to be checked per field rather than assumed.
Avoid over-mapping: map only fields actually used in detection rules and dashboards (typically 15–30 fields per source), not all 100+ available fields. The over-mapping trap converts a reasonable schema-on-write architecture into a schema-on-write-everything architecture that loses most of the cost benefit and adds operational drag.
There is no universal answer
Both approaches solve different problems.
Schema-on-read buys operational flexibility and pays for it in query performance and license spend. That is the right bargain for threat hunting, where the investigation path is unpredictable by definition and the field you need is the one nobody indexed. Schema-on-write inverts the bargain. It is faster and cheaper per query but rigid against schema change, which is exactly what you want for known detection patterns and compliance reporting and exactly what you don't want the morning a new log source lands mid-incident.
The honest answer for most teams over 1 TB/day with the data-engineering staff to run it is to stop choosing. Keep both: raw retention on cheap object storage for the hunting tier, OCSF-normalized projection on warm for detection. The framing has two layers, though, and that is the part I would send a reader away holding. The schema question, parse-at-ingest versus parse-at-query, is the one this essay measures. Underneath it sits the commit-contract question, file-write versus SQL-transaction versus never-write, and on a high-frequency streaming-ingest path that second layer is where the $/GB bill is actually set. Decide them separately. Most cost surprises I have seen came from teams that solved the schema question carefully and then let the commit contract default to per-commit file writes at a few-second cadence.
Chronicle's BigQuery backend, Sentinel's ASIM normalization, OCSF's vendor-neutral schema, and Iceberg V3's Variant type are all converging on the same shape: keep the raw bytes, project the fields you query, and don't pretend one paradigm covers every workload. I'd build for both layers and resist the tidy single-substrate story until V4's commit-footprint work has been tested against a real seven-year-retention, sub-second-detection security estate rather than a slide. The first slice of that test is now on the board: the commit-tax bench above measures classic Iceberg paying a roughly 515× metadata and 37× ingest penalty across a 100-commit stream while a catalog-transaction commit stays flat, so the commit-contract prediction is no longer a lean, it is observed on a real object store. What is still untested is whether a V4-efficient materialized Iceberg backend closes that gap well enough to carry the whole range from hot detection to multi-year audit on one substrate, and until that larger test runs, tiering is the bet I'm making, and I'm telling you it's a bet.
The $250K question is not abstract.
The Splunk-to-MOAR migration assessment quantifies the trade-off against your specific workload (your ingest volume, your detection patterns, your retention regime) and produces a TCO comparison plus phased migration plan you can execute or shelve. Two to three weeks. Fixed price.