Security Data Works

Service Offering 3 · Implementation roadmap

The build sequence, and how it fails if unmanaged.

This is the reference sequence: four phases over roughly twelve months, from a single log source in Iceberg to a self-service semantic layer. The timelines, staffing estimates, and performance targets below are based on my own experience and a limited number of production deployments. Actual implementation timelines vary significantly with team size, existing infrastructure, organizational complexity, and security requirements. Treat these as a starting frame rather than a guarantee, and validate them against your environment.

How to read this

The reference sequence is not the engagement deliverable.

What follows is the generic phased path. It is useful as a shared mental model, but it is not the artifact the engagement produces. The Architecture Assessment produces the prospect-specific version of this roadmap: the same four-phase shape, but sequenced against your actual source inventory, your regulatory profile, and your team's capacity, with named decision gates at each phase boundary so the program can stop, re-scope, or proceed on evidence rather than on momentum. If the prospect is Splunk-anchored and the question is migration economics, the Migration Assessment is the wedge that scopes the exit before this sequence begins.

The phases below build the MOAR components: the lakehouse, real-time engine, and transformation layer described in the MOAR thesis. I link to the thesis rather than reproduce it here; this page is the build order, not the architecture.

Phase 4 is optional. The first three phases stand on their own; the semantic layer is an enhancement that some programs never need. The five pitfalls at the end of this page are the reason the engagement front-loads the audit: most of them are failures of sequencing and scope rather than of technology, and they are far cheaper to prevent in the assessment than to unwind in production.

Phase 1 · Months 1–3

Lakehouse foundation.

The goal of Phase 1 is narrow on purpose: store all security logs in Iceberg with 365-day retention, and prove you can query them. There is no real-time engine and no OCSF normalization yet. The phase starts by landing one source end-to-end so the team learns the operational shape before scaling to the rest.

Tasks

Success criteria

Staffing

Phase 2 · Months 4–6

Real-time engine.

Phase 2 replaces SIEM dashboards with Grafana backed by StarRocks reading from Iceberg. This is the phase where the analysts feel the change, which is why it runs in parallel with the existing SIEM for validation rather than as a hard cutover. This is one of the highest-risk phases, and the one I would carry under Implementation Support if the program does not have deep StarRocks or Grafana experience in-house.

Tasks

  • Deploy StarRocks, or ClickHouse, pointing to Iceberg via Polaris.
  • Migrate 5 SIEM dashboards to Grafana.
  • Set up alerting with PagerDuty or Slack integration.
  • Create materialized views for the top queries.
  • Run in parallel with the existing SIEM for validation.

Success criteria

  • Dashboard query latency under 5 seconds.
  • Alerts trigger within 2 minutes of the event.
  • Analysts prefer Grafana over the SIEM UI.
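
The 2-minute alerting criterion is easy to check mechanically during the parallel run. A minimal sketch, assuming you can export pairs of event time and alert time from the alerting pipeline; the function name and the sample timestamps are hypothetical:

```python
from datetime import datetime, timedelta

# Threshold taken from the success criterion above.
MAX_ALERT_DELAY = timedelta(minutes=2)

def late_alerts(pairs, max_delay=MAX_ALERT_DELAY):
    """Return the (event_time, alert_time) pairs whose delay exceeds max_delay."""
    return [(event, alert) for event, alert in pairs if alert - event > max_delay]

# Hypothetical sample: one alert fired after 45 seconds, one after 3.5 minutes.
sample = [
    (datetime(2024, 1, 1, 12, 0, 0), datetime(2024, 1, 1, 12, 0, 45)),
    (datetime(2024, 1, 1, 12, 5, 0), datetime(2024, 1, 1, 12, 8, 30)),
]
print(len(late_alerts(sample)))  # prints 1
```

Running a check like this daily over the parallel-run window gives the phase gate a count of late alerts rather than an impression.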

Staffing

  • 1 security data engineer plus 2 SOC analysts for validation.

Phase 3 · Months 7–9

Transformation pipeline.

Phase 3 standardizes all logs on OCSF. It is the most often underestimated phase: the transformation work is where schedules slip, which is what Pitfall 2 below calls out. Tenzir handles the routing and normalization; dbt or Tenzir-native mappings carry the OCSF logic, version-controlled like any other code.
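
To make the shape of one mapping concrete, here is a minimal sketch in plain Python rather than in Tenzir or dbt. It assumes a CloudTrail record as input and OCSF API Activity (class_uid 6003) as the target, and it covers only a small subset of fields; the function name is hypothetical, and a production mapping would handle many more attributes and edge cases:

```python
from datetime import datetime, timezone

def cloudtrail_to_ocsf(event: dict) -> dict:
    """Map a raw CloudTrail record to a small subset of OCSF API Activity fields."""
    # CloudTrail timestamps look like 2024-01-01T00:00:00Z (UTC).
    ts = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "class_uid": 6003,  # assumed target class: API Activity
        "time": int(ts.timestamp() * 1000),  # epoch milliseconds
        "api": {
            "operation": event["eventName"],
            "service": {"name": event["eventSource"]},
        },
        "actor": {"user": {"uid": event.get("userIdentity", {}).get("arn")}},
        "src_endpoint": {"ip": event.get("sourceIPAddress")},
        "cloud": {"provider": "AWS", "region": event.get("awsRegion")},
    }
```

Even this toy version shows where the effort goes: every source needs its own decisions about timestamps, identity fields, and missing values, and each decision has to be tested and versioned.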

Tasks

Success criteria

Staffing

Phase 4 · Months 10–12 · Optional

Semantic layer.

Phase 4 is optional. The goal is self-service analytics through Dremio: business-friendly views that hide OCSF complexity so analysts can build datasets without engineering help. Some programs run this phase; others stop at Phase 3 because the value of self-service does not justify the additional operational overhead in their environment. The Architecture Assessment is where that decision gets made on evidence rather than by default.

Tasks

  • Deploy Dremio, or a similar semantic layer.
  • Create 20 business-friendly views that hide OCSF complexity.
  • Enable reflections for automatic query acceleration.
  • Train analysts on self-service dataset creation.
  • Deprecate the SIEM where possible.

Success criteria

  • Analysts create 10+ datasets without engineering help.
  • Query performance under 5 seconds for 95% of queries.
  • SIEM license reduced or eliminated.
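
The "95% of queries under 5 seconds" criterion is a percentile check. A minimal sketch, assuming query latencies in seconds can be exported from the query engine's history; the function names are hypothetical, and the nearest-rank percentile method is my choice, not something the tooling prescribes:

```python
def p95(latencies_s):
    """Nearest-rank 95th percentile of a non-empty list of latencies in seconds."""
    ordered = sorted(latencies_s)
    # 1-based nearest-rank index, ceil(0.95 * n), computed in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def meets_target(latencies_s, target_s=5.0):
    """True when at least 95% of the queries finished under target_s seconds."""
    return p95(latencies_s) < target_s

# Hypothetical sample: 19 fast queries and one slow outlier still pass.
print(meets_target([1.0] * 19 + [30.0]))  # prints True
```

Measuring the percentile rather than the average keeps a handful of very fast dashboard queries from hiding a slow tail.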

Staffing

  • 1 security data engineer plus an analyst enablement program.

Five pitfalls

Why the engagement front-loads the audit.

These are the failure modes I see repeatedly. Most of them are failures of sequencing, scope, and operations rather than of technology. That is why the assessment runs before the build, and why the prospect-specific roadmap carries named decision gates instead of an open-ended schedule.

Pitfall 1 · Over-engineering the first iteration

Symptom. Spending 6 months designing the perfect architecture before ingesting a single log.

Fix. Start with one log source, Windows events or CloudTrail. Get it into Iceberg. Query it. Learn. Then expand. This is why Phase 1 is deliberately narrow.

Pitfall 2 · Underestimating OCSF transformation complexity

Symptom. "We'll just map logs to OCSF," and 6 months later the team is still working through 700+ mappings.

Fix. Budget 2–3 days per log source for transformation development. Use LLM-assisted mapping where possible. The assessment sizes this against your actual source inventory rather than assuming it away.
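
The budget is simple arithmetic, but writing it down early makes the size of the commitment visible. A minimal sketch; the 40-source inventory is a hypothetical figure, and the per-source range is the 2–3 days quoted above:

```python
def transformation_effort_days(source_count, low_per_source=2, high_per_source=3):
    """Return the (low, high) range of engineer-days for OCSF transformation work."""
    return source_count * low_per_source, source_count * high_per_source

# Hypothetical inventory of 40 log sources.
low, high = transformation_effort_days(40)
print(f"{low}-{high} engineer-days")  # prints 80-120 engineer-days
```

At one engineer, a 40-source inventory already consumes most of a three-month phase, which is why the inventory is sized before the schedule is set.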

Pitfall 3 · Ignoring query performance until production

Symptom. Queries that ran in 5 seconds on sample data take 10 minutes on production data.

Fix. Load representative data volumes (1 TB or more) during the POC. Test at scale before go-live, not after.

Pitfall 4 · No Iceberg table maintenance plan

Symptom. Storage costs balloon because old snapshots are never expired. Queries slow down because small files accumulate.

Fix. Schedule weekly table maintenance: expire snapshots, compact files, update partition statistics. This is an operational commitment, not a one-time setup task.
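
A minimal sketch of what the weekly job might issue, assuming maintenance runs through Spark with the Iceberg SQL extensions enabled. The catalog name `lake`, the table name `security.cloudtrail`, and the 7-day snapshot retention are hypothetical; the function only builds the statements, so a scheduler of your choice would pass each one to `spark.sql(...)`. The statistics step is left out because how it is done depends on the engine and Iceberg version in use:

```python
from datetime import datetime, timedelta, timezone

def maintenance_statements(catalog, table, retain_days=7, now=None):
    """Build Iceberg Spark procedure calls for one table's weekly maintenance."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        # Drop snapshots older than the cutoff so storage stops growing unbounded.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # Compact small data files into larger ones.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # Rewrite manifests to keep scan planning fast.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]

for statement in maintenance_statements("lake", "security.cloudtrail"):
    print(statement)
```

Keeping the statements in code, next to the table inventory, makes it obvious when a newly landed source has no maintenance scheduled.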

Pitfall 5 · Choosing tools on hype, not requirements

Symptom. "We're using ClickHouse because it's fast," but the workload is 90% ad-hoc queries, where Trino or Dremio would be the better fit.

Fix. Match workload characteristics to engine strengths:

Workload · Engine fit
Real-time alerting (under 1 second) · ClickHouse, StarRocks
Ad-hoc investigation (flexible schema) · Trino, Dremio
Batch ETL (large-scale transforms) · Spark
Federation (multi-source joins) · Dremio, or Trino with connectors
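
The same matching can be written as a lookup, which forces the team to state the workload mix before naming an engine. A minimal sketch; the workload keys and the 90/10 query split are hypothetical, and the engine lists simply restate the workload-to-engine pairings above:

```python
# Workload-to-engine pairings, restated from the list above.
ENGINE_FIT = {
    "real_time_alerting": ["ClickHouse", "StarRocks"],
    "ad_hoc_investigation": ["Trino", "Dremio"],
    "batch_etl": ["Spark"],
    "federation": ["Dremio", "Trino with connectors"],
}

def dominant_workload(query_share):
    """Return the workload with the largest share of queries."""
    return max(query_share, key=query_share.get)

def candidate_engines(query_share):
    """Return the engines that fit the dominant workload."""
    return ENGINE_FIT[dominant_workload(query_share)]

# Hypothetical mix matching the symptom: 90% ad-hoc queries.
mix = {"ad_hoc_investigation": 0.9, "real_time_alerting": 0.1}
print(candidate_engines(mix))  # prints ['Trino', 'Dremio']
```

A real selection weighs more than the dominant workload, but starting from a measured mix is what keeps the choice tied to requirements.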

The reference sequence is the map. The engagement draws yours.

The Architecture Assessment produces the prospect-specific phased roadmap with named decision gates; Implementation Support carries the highest-risk phases. A 30-minute intro call confirms which shape fits.