Security Data Works

H3-INTEGRATION-03 · Tier B · 4/5 (extension)

AWS Security Lake. Useful, qualified, AWS-shaped.

OCSF (the Open Cybersecurity Schema Framework, the multi-vendor schema standard for security data) was the right idea with the wrong adoption economics: mapping vendor logs to OCSF was tedious manual work that didn't survive procurement. AWS Security Lake changed the math for AWS-heavy shops. The change is real, but it's also narrower than the marketing suggests.

Why it matters

The OCSF adoption economics changed.

Before Security Lake, deploying OCSF in production meant a do-it-yourself data lake. S3 buckets with partitioning, lifecycle policies, encryption. A Glue Data Catalog with OCSF schema definitions. Lake Formation governance, row-level security, column masking. Lambda or Firehose pipelines per source. Custom OCSF mapping code per vendor (Okta, CrowdStrike, Netskope, the long tail). Plus the maintenance burden when OCSF evolves or vendor schemas change. Realistic shape: 3–6 months of build time for two to three engineers, 20–40% of an FTE on ongoing maintenance, $300K–$600K in loaded year-one cost. Most security teams can't justify that as a line item.

AWS Security Lake (generally available since May 2023) is a managed service that does the build for you, for the AWS-native sources at least. Storage on managed S3 with OCSF partitioning. Auto-managed Glue catalog. Lake Formation integration tied to IAM. Automatic ingestion for AWS sources: CloudTrail (API activity), VPC Flow Logs (network traffic), Security Hub (finding aggregation), Route 53 Resolver (DNS queries), EKS audit logs. OCSF mapping handled by AWS for the native sources, and by participating vendors for the third-party integrations.

The cost shape moves accordingly. Setup measured in 1–2 weeks rather than months. Maintenance closer to 5–10% of an FTE. Infrastructure $3K–$7K per month. Year-one totals in the $50K–$100K range for a comparable mid-sized organization. That is roughly a 60–80% reduction in cost versus DIY and roughly a 90% reduction in time-to-value. The OCSF adoption barrier got significantly lower for AWS-heavy shops, and that part of the claim survives scrutiny.
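The reduction figures can be checked against the quoted ranges. The sketch below is only arithmetic on the numbers in this section; the one assumption is that "3–6 months" is taken as 13–26 weeks. It pairs the range endpoints both ways to show the spread the headline percentages sit inside.

```python
# Figures quoted in the text (USD, loaded year-one cost; build time in weeks).
DIY_COST = (300_000, 600_000)
MANAGED_COST = (50_000, 100_000)
DIY_WEEKS = (13, 26)       # assumption: 3-6 months taken as 13-26 weeks
MANAGED_WEEKS = (1, 2)


def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return 1 - after / before


def reduction_range(before: tuple, after: tuple) -> tuple:
    """Least and most favorable reduction across two (low, high) ranges."""
    least = reduction(before[0], after[1])   # cheapest DIY vs priciest managed
    most = reduction(before[1], after[0])    # priciest DIY vs cheapest managed
    return least, most


cost_worst, cost_best = reduction_range(DIY_COST, MANAGED_COST)
time_worst, time_best = reduction_range(DIY_WEEKS, MANAGED_WEEKS)

print(f"cost reduction: {cost_worst:.0%} to {cost_best:.0%}")   # 67% to 92%
print(f"time reduction: {time_worst:.0%} to {time_best:.0%}")   # 85% to 96%
```

On these endpoints the 60–80% cost figure reads as a conservative rounding, and the roughly 90% time-to-value figure falls in the middle of the computed range.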

How open is it, actually

Open at some layers. Locked at others. The boundary matters.

Three things are genuinely open. The OCSF schema itself is Apache 2.0 with community governance and multi-vendor participation (Splunk, Cloudflare, Palo Alto, AWS, others). The data format on disk is Parquet, the open columnar storage standard, with OCSF-compliant JSON structure on top, readable by any Parquet-compatible tool (DuckDB, Trino, Spark, Dremio). Query access goes through Athena (AWS-managed Trino), direct S3 access, or third-party integrations with standard SQL, so the data is exportable and you can leave with it.

Three things are locked to AWS. Automatic ingestion is AWS-only. CloudTrail, VPC Flow, and the rest of the native sources auto-normalize to OCSF, but non-AWS sources require third-party integration or custom code, and leaving AWS means losing the automatic normalization layer. Governance runs through AWS IAM and Lake Formation; access policies don't migrate to non-AWS platforms without manual recreation. The managed service itself runs in your AWS account but isn't portable to Azure, GCP, or on-premises environments.

The honest summary is that Security Lake is more open than most vendor platforms and less open than fully DIY, because the data is exportable while the operational surface around it isn't. The framework that helps here is multi-layer openness, where the table format and schema can be open while the catalog, identity, and authorization layers remain proprietary. Security Lake is open at the storage and format layers, qualified-open at query, and AWS-locked at governance and ingestion. The governance layer is where the lock-in actually lives, though one piece of it is starting to migrate down into the open format: Iceberg V3 row lineage (the per-row _row_id and _last_updated_sequence_number fields, which shipped in Apache Iceberg 1.9.0, April 2025) is a catalog-agnostic audit primitive, so part of the chain-of-custody story no longer has to ride on Lake Formation. If your strategy assumes you stay in AWS, that's a clean tradeoff. If it assumes multi-cloud or AWS exit, the lock-in surface needs to be priced into the decision.

The "70+ integrations" claim, decomposed

Four tiers of integration. Only one of them is automatic.

Tier 1, native AWS sources.

CloudTrail, VPC Flow, Security Hub, Route 53 Resolver, EKS audit logs. Enable Security Lake and AWS auto-normalizes to OCSF with no mapping code and no maintenance burden, so the promise of "managed OCSF" applies here cleanly.

Tier 2, partner direct-write integrations.

CrowdStrike (EDR), Okta (identity), Palo Alto Networks (firewall), Cisco (network), Zscaler (SSE), and others. The vendor writes OCSF-formatted data directly into your Security Lake, handling the OCSF mapping on their side. The promise applies here too, with a catch, because the vendor decides which OCSF event classes they support and how accurately they map fields, so integration quality varies a lot across this tier.

Tier 3, "we support Security Lake" (marketing, not technical).

Some vendors claim Security Lake support but mean something thinner. They read from Security Lake (they query your data, they don't write to it), they have a Lambda function that drops files in S3 (not actually OCSF-formatted), or they're "planning to integrate" (vaporware). Before relying on a Tier 3 claim, ask for a sample OCSF output, check field-mapping completeness, and confirm the vendor uses the correct OCSF event classes. The pattern shows up often enough that it's worth checking before you commit.
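The verification step can be partly mechanized once a vendor hands over a sample. This is a minimal sketch, not a validator: the REQUIRED tuple is an illustrative subset of OCSF base-event attributes, the check_sample helper and the sample event are hypothetical, and the authoritative attribute list depends on the OCSF version the vendor targets. It relies on two OCSF conventions: Authentication events use class_uid 3002, and type_uid is class_uid * 100 + activity_id.

```python
# Illustrative subset of OCSF base-event attributes (assumption: check the
# schema version your vendor targets for the authoritative required list).
REQUIRED = ("activity_id", "category_uid", "class_uid", "metadata",
            "severity_id", "time", "type_uid")


def check_sample(event: dict, expected_class_uid: int) -> list:
    """Return human-readable problems found in one vendor sample event."""
    problems = []
    for name in REQUIRED:
        if event.get(name) is None:
            problems.append(f"missing required attribute: {name}")

    class_uid = event.get("class_uid")
    if class_uid is not None and class_uid != expected_class_uid:
        problems.append(f"class_uid {class_uid}, expected {expected_class_uid}")

    activity_id = event.get("activity_id")
    type_uid = event.get("type_uid")
    if None not in (class_uid, activity_id, type_uid):
        if type_uid != class_uid * 100 + activity_id:
            problems.append("type_uid is not class_uid * 100 + activity_id")
    return problems


# Hypothetical vendor sample: a login event filed under Network Activity
# (4001) instead of Authentication (3002), with no metadata object.
sample = {"class_uid": 4001, "category_uid": 4, "activity_id": 1,
          "type_uid": 400101, "severity_id": 1, "time": 1700000000000}

for problem in check_sample(sample, expected_class_uid=3002):
    print(problem)
```

A sample that passes this kind of check is not proof of a good integration, but one that fails it answers the Tier 3 question quickly.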

Tier 4, everything else (DIY required).

Internal applications. Niche security vendors without a Security Lake integration. On-premises infrastructure outside AWS. Legacy systems whose log format you can't change. For Tier 4 sources, the engineering effort returns to roughly DIY-OCSF-data-lake territory: custom mapping code, write-to-Security-Lake via API, ongoing maintenance. Security Lake doesn't solve the long-tail integration problem for non-participating vendors. Cribl, Tenzir, or custom code is still on the menu. The "70+ integrations" headline has to be read against your specific vendor stack: how much of it lands in Tiers 1–2, how much falls into 3–4.
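For a sense of what "custom mapping code" means for a Tier 4 source, here is a minimal sketch that maps one login record from a hypothetical internal application into an OCSF-shaped Authentication event. The input field names (ts, user, ip, result), the product name, and the schema version string are all assumptions made for illustration. The sketch stops at producing the event; writing it into Security Lake and keeping the mapping current are the parts that carry the ongoing cost.

```python
from datetime import datetime, timezone

MAPPED_KEYS = {"ts", "user", "ip", "result"}   # hypothetical source fields


def map_login_to_ocsf(record: dict) -> dict:
    """Map one internal-app login record to an OCSF-shaped Authentication event."""
    class_uid = 3002    # OCSF Authentication
    activity_id = 1     # Logon
    ts = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)       # assumption: source logs in UTC
    return {
        "class_uid": class_uid,
        "category_uid": 3,                     # Identity & Access Management
        "activity_id": activity_id,
        "type_uid": class_uid * 100 + activity_id,
        "severity_id": 1,                      # Informational
        "status_id": 1 if record["result"] == "OK" else 2,  # Success / Failure
        "time": int(ts.timestamp() * 1000),    # epoch milliseconds
        "metadata": {
            "version": "1.1.0",                # assumed target schema version
            "product": {"name": "internal-app", "vendor_name": "in-house"},
        },
        "user": {"name": record["user"]},
        "src_endpoint": {"ip": record["ip"]},
        # Keep anything without an OCSF home instead of silently dropping it.
        "unmapped": {k: v for k, v in record.items() if k not in MAPPED_KEYS},
    }


event = map_login_to_ocsf({
    "ts": "2024-01-01T00:00:00", "user": "alice", "ip": "10.0.0.5",
    "result": "OK", "app_session": "abc",
})
print(event["type_uid"], event["time"], event["unmapped"])
```

Multiply this by every event type per source, and by every schema change on either side, and the return to DIY territory is easy to see.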

Hidden costs

20–40% above the advertised pricing, on a recurring basis.

Three categories of cost don't show up on the Security Lake pricing page and tend to surprise security teams once production starts.

Third-party integration charges. Some vendors charge extra for Security Lake connectivity: "Security Lake connector is enterprise-tier only" (mandatory upgrade), "Security Lake write integration: $X per GB" (double-charging), "custom OCSF mapping is a professional services engagement" (consulting fees). The vendor lock-in pattern migrates to the integration layer, where you either pay or lose Security Lake compatibility.

Data egress for non-AWS query engines. Athena queries inside AWS pay $5 per TB scanned with no egress. Querying Security Lake from Dremio, Trino-on-prem, or a hybrid analytics layer adds $0.09 per GB egress on top of the scan cost. Multi-cloud or hybrid architectures pay an egress penalty for the privilege of accessing their own data.
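To put the egress penalty in proportion, the sketch below applies the two rates quoted above to a hypothetical monthly workload. It follows the text's framing (egress added on top of the scan cost), uses decimal terabytes, and assumes in the second case that every scanned byte leaves AWS. That is a worst case: column pruning and predicate pushdown on Parquet would normally move less.

```python
SCAN_PER_TB = 5.00      # USD per TB scanned, as quoted in the text
EGRESS_PER_GB = 0.09    # USD per GB transferred out, as quoted in the text
GB_PER_TB = 1000        # assumption: decimal terabytes


def monthly_query_cost(tb_scanned: float, tb_egressed: float = 0.0) -> float:
    """Scan cost plus egress cost in USD for one month of queries."""
    scan = tb_scanned * SCAN_PER_TB
    egress = tb_egressed * GB_PER_TB * EGRESS_PER_GB
    return scan + egress


in_aws = monthly_query_cost(tb_scanned=10)
external = monthly_query_cost(tb_scanned=10, tb_egressed=10)   # worst case

print(f"inside AWS:     ${in_aws:,.2f}")      # $50.00
print(f"external engine: ${external:,.2f}")   # $950.00
```

At these rates a terabyte of egress costs $90 against $5 of scan, so in a hybrid architecture the egress term dominates the query bill.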

OCSF mapping quality gaps. A vendor writes to Security Lake but maps only 60% of OCSF fields, leaving the rest null or in vendor-specific extensions. Or it maps to the wrong event class, or buries data in undocumented extensions that can't be queried reliably. Either way, you pay for Security Lake storage and still need a custom enrichment layer to make the data usable for the workloads you bought it for.
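Mapping completeness is measurable on a batch of delivered events. A minimal sketch, assuming the events are already loaded as Python dicts and that you supply the list of field paths your detections depend on; the field_fill_rates helper, the paths, and the sample events are hypothetical.

```python
def lookup(event: dict, path: str):
    """Follow a dotted path such as 'src_endpoint.ip'; None if any step is absent."""
    node = event
    for part in path.split("."):
        if not isinstance(node, dict):
            return None
        node = node.get(part)
    return node


def field_fill_rates(events: list, fields: list) -> dict:
    """Fraction of events in which each field is present and non-null."""
    total = len(events)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(lookup(e, f) is not None for e in events) / total
            for f in fields}


# Hypothetical batch from one vendor integration.
events = [
    {"user": {"name": "alice"}, "src_endpoint": {"ip": "10.0.0.5"}, "status_id": 1},
    {"user": {"name": "bob"}, "src_endpoint": None, "status_id": 1},
    {"user": {"name": None}, "status_id": 2},
    {"user": {"name": "carol"}, "src_endpoint": {"ip": "10.0.0.9"}},
]
rates = field_fill_rates(events, ["user.name", "src_endpoint.ip", "status_id"])

for field, rate in rates.items():
    print(f"{field}: {rate:.0%}")
```

Fill rate is only a lower bar. A field can be populated and still carry the wrong value or belong to the wrong event class, which is why this complements a review of sample events.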

None of these are unique to AWS. They're the predictable surface where any managed-service economics meet the long tail of vendor heterogeneity, which is why the pricing-page calculation should be read as a floor rather than a ceiling.

What this extends

H3-INTEGRATION-03, with the AWS-specific qualifications.

The anchor hypothesis on the research page reads: OCSF is gathering substantial multi-vendor momentum, with 180+ organizations participating and an active ITU-T Study Group 17 standardization track (Recommendation X.icd-schemas, backed for ratification in December 2025, with ratification targeted for mid-2026). The standardization trajectory is Tier B. The petabyte-scale production claims attached to it are weaker (Tier C–D in places), which is why the page currently carries a confidence reduction on the production-scale dimension while keeping the standardization-trajectory dimension at full confidence.

Security Lake sits inside that picture as the strongest production deployment vector for AWS-heavy environments. It validates a specific claim: that managed OCSF infrastructure can drop the adoption cost by roughly 60–80% versus DIY for the AWS-native portion of an organization's telemetry. It does not validate the broader claim that OCSF runs cleanly at petabyte scale across heterogeneous vendor sources; the integration-quality variance across Tiers 2–4 is exactly the place where the broader claim has its evidence gap.

What would change the answer. The first thing is independent measurement of OCSF mapping quality across the major Tier 2 vendor integrations (CrowdStrike, Okta, Palo Alto, Cisco, Zscaler), since the gap between marketing compliance claims and actual field-mapping completeness is the next evidence question. The second is a managed multi-cloud OCSF equivalent (Azure Sentinel OCSF, GCP Chronicle OCSF, or a vendor-neutral platform) shipping at production quality.

The Tier 4 question deserves more than the "AI-generated OCSF mapping" shorthand I used to carry, because there are two mechanisms here and they fail differently. I think this page is the natural place to run the head-to-head, since Tier 4 is exactly where the long tail forces the choice. One path is formal: ontology-based data access (OBDA, the Ontop family) puts a concept-to-schema mapping in the query planner and rewrites a concept-level query to SQL at execution time. It is provably correct on what it covers. The catch is expressiveness. OBDA runs on the OWL 2 QL profile, which is first-order-rewritable with no recursion, so the queries that matter most in security (lateral-movement traversal, recursive identity and asset closure) fall outside what it can push down. Those queries force engine-side recursive CTEs or materialization. Ontop already ships native dialects for Trino, Snowflake, Databricks, DuckDB, and Spark, so OBDA over an OCSF columnar lakehouse is feasible today, but there is no published latency benchmark at security scale yet (Ontop / Ontopic VKG-over-Iceberg demonstrations, 2026, Tier C).
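The recursion limit is easier to see with a concrete query. Lateral-movement traversal is a transitive closure over "host A logged on to host B" edges, and the number of hops is not known in advance, which is what a first-order rewrite cannot express as a fixed SQL query. The sketch below runs the engine-side alternative, a recursive CTE, using SQLite from the Python standard library as a stand-in for the lakehouse engine. The logon table, host names, and hop limit are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logon (src_host TEXT, dst_host TEXT)")
conn.executemany("INSERT INTO logon VALUES (?, ?)", [
    ("wks-1", "srv-a"), ("srv-a", "srv-b"), ("srv-b", "dc-1"),
    ("srv-b", "srv-a"),      # a cycle, common in real logon graphs
    ("wks-2", "srv-c"),      # unrelated activity
])

# Hosts reachable from a starting host, with the fewest hops to each.
# The hop bound keeps the recursion finite in the presence of cycles.
REACHABLE = """
WITH RECURSIVE reach(host, hops) AS (
    SELECT ?, 0
    UNION
    SELECT l.dst_host, r.hops + 1
    FROM logon AS l JOIN reach AS r ON l.src_host = r.host
    WHERE r.hops < ?
)
SELECT host, MIN(hops) AS min_hops
FROM reach
GROUP BY host
ORDER BY min_hops, host
"""

rows = conn.execute(REACHABLE, ("wks-1", 5)).fetchall()
for host, hops in rows:
    print(host, hops)
```

From wks-1 this reaches srv-a, srv-b, and dc-1 at one, two, and three hops, and never touches srv-c. The same question asked through an OWL 2 QL rewrite would have to be unrolled to a fixed depth or answered from a materialized closure.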

The other path is probabilistic. GraphRAG or plain text-to-SQL lets a model compose the query, which buys unbounded expressiveness but no correctness guarantee. The cost shows up as a silent-error tax: SQL that executes, returns a plausible answer, and is semantically wrong, in a way validation does not reliably catch. Current text-to-SQL clusters near 81–82% execution accuracy on the BIRD benchmark against 92.96% for human analysts (BIRD leaderboard accessed June 2026 and the NeurIPS 2023 paper, Tier B; the benchmark's own ~32% annotation-error rate, CIDR 2026, means the real reliability is lower still). That gap lands on exactly the rare, adversary-tail event types where a detection has to be right. So the open research question for Tier 4 mapping quality is not "does AI do it" but the -02-vs-03 trade: provable but OWL 2 QL-limited rewrite against unbounded but unverified composition, decided on the adversary tail. It isn't a solved feature yet.

Each of these moves the research-page anchor in a specific direction.

Useful, when its shape matches yours.

The full anchor hypothesis on OCSF and the contradiction tracking the petabyte-scale claim are on the research page. The matrix offering applies these qualifications to the platform decision in your specific environment.