Security Data Works

Public production architecture teardown

Pinterest — Zero Trust FGAC at Trino + Gravitino

Pinterest's Monarch big-data platform (30+ Hadoop YARN clusters, 17k+ nodes on AWS EC2, petabytes processed daily) runs Trino as one of several engines on top of a Credential Vending Service (CVS) that issues per-user STS tokens scoped by IAM Managed Policies. Apache Gravitino serves as the Iceberg REST catalog; it is production-validated at Pinterest, whose Director of Big Data Platform is quoted in the ASF TLP graduation announcement (June 2025).

17k+ nodes

Monarch scale: 30+ YARN clusters processing petabytes and hundreds of thousands of jobs daily on EC2. CVS dynamically assembles AWS STS tokens at job-launch time — the base IAM role is never returned without at least one modifying Managed Policy attached. Zero standing privilege at the storage tier.
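The "never without a Managed Policy" rule can be illustrated with a minimal sketch. This is not Pinterest's implementation: the function name `build_assume_role_request`, the ARNs, and the user name are hypothetical. The returned keys follow the parameter names of the AWS STS AssumeRole API (`RoleArn`, `RoleSessionName`, `PolicyArns`, `DurationSeconds`), and the limits shown are the documented STS limits as far as we know them.

```python
import re

MAX_SESSION_POLICY_ARNS = 10  # STS accepts at most 10 managed policy ARNs per call
MIN_DURATION_SECONDS = 900    # STS minimum session duration (15 minutes)


def build_assume_role_request(base_role_arn, user, policy_arns, duration_seconds=900):
    """Build AssumeRole parameters, refusing to vend the unscoped base role.

    Hypothetical helper: it only assembles and validates the request.
    """
    # Deduplicate the policy ARNs while preserving their order.
    arns = list(dict.fromkeys(policy_arns))
    if not arns:
        # Core rule from the text: the base role is never returned on its own.
        raise ValueError("refusing to vend the base role without a Managed Policy")
    if len(arns) > MAX_SESSION_POLICY_ARNS:
        raise ValueError("too many session Managed Policies for one AssumeRole call")
    if duration_seconds < MIN_DURATION_SECONDS:
        raise ValueError("session duration is below the STS minimum")

    # RoleSessionName allows only [A-Za-z0-9_+=,.@-], 2 to 64 characters.
    session_name = re.sub(r"[^\w+=,.@-]", "-", user, flags=re.ASCII)[:64]
    if len(session_name) < 2:
        raise ValueError("session name is too short")

    return {
        "RoleArn": base_role_arn,
        "RoleSessionName": session_name,
        "PolicyArns": [{"arn": arn} for arn in arns],
        "DurationSeconds": duration_seconds,
    }
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("sts").assume_role(**request)`; the sketch stops short of that call so it stays runnable offline.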

The pipeline

  1. Identity

    mTLS · OAuth · Kerberos

    User authn; LDAP group membership drives policy selection

  2. Broker

    Credential Vending Service

    AssumeRole with session Managed Policies; intersection-scoped permissions

  3. Catalog

    Apache Gravitino (Iceberg REST)

    First open-source Iceberg REST catalog; production at Pinterest pre-TLP

  4. Engine

    Trino + Spark + Flink

    Multi-engine query layer; Querybook + Jupyter for analyst surfaces

  5. Store

    S3 (Iceberg + raw)

    Hadoop S3A custom credentials provider presents user-specific tokens
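Step 1 says LDAP group membership drives policy selection. A minimal sketch of that lookup follows, assuming a static group-to-policy table; the group names, ARNs, and the `select_policy_arns` function are hypothetical and only illustrate the shape of the mapping, not how Pinterest stores it.

```python
# Hypothetical mapping from LDAP group name to IAM Managed Policy ARNs.
GROUP_POLICIES = {
    "ads-analysts": ["arn:aws:iam::123456789012:policy/ads-read"],
    "finance-analysts": [
        "arn:aws:iam::123456789012:policy/finance-read",
        "arn:aws:iam::123456789012:policy/ads-read",
    ],
}


def select_policy_arns(ldap_groups, group_policies=GROUP_POLICIES):
    """Return the sorted, de-duplicated policy ARNs for a user's LDAP groups.

    Groups with no entry in the table contribute nothing, so a user with
    no recognised group gets an empty list and nothing can be vended.
    """
    selected = set()
    for group in ldap_groups:
        selected.update(group_policies.get(group, []))
    return sorted(selected)
```

For example, `select_policy_arns(["ads-analysts", "finance-analysts"])` yields the two distinct ARNs once each, and an empty result is the signal for the broker to refuse the request.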

What composes, what’s brittle

  • Per-job STS tokens. CVS issues short-lived credentials; effective permissions are the intersection of the base role's permissions and the attached session Managed Policies.
  • Why Trino here. ANSI SQL across federated lake + catalog without copying data; analyst access is the dominant query pattern.
  • Why Gravitino. Open Iceberg REST catalog; ASF TLP June 2025; Pinterest named in the graduation announcement as a production user.
  • What composes. Same CVS-vended credentials flow through Spark, Flink, Hive, and Trino — one trust model, many engines.
  • What's distinctive. Zero standing privilege at the S3 layer for an analyst population at petabyte scale — not aspirational, in production.
  • What's brittle. Kerberos host/service-name discipline; CVS single-instance failure modes; LDAP group sync latency on permission changes; cache-expiry windows of minutes before new policies take effect.
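The intersection semantics behind the per-job tokens can be modelled in a few lines. This is a deliberately simplified sketch: permissions are plain (action, resource) pairs, and explicit Deny statements, wildcards, and conditions are ignored. The function name and the sample permissions are hypothetical.

```python
def effective_permissions(base_role_allows, session_policy_allows):
    """Model a session's permissions as base-role allows ∩ session-policy allows.

    base_role_allows: set of (action, resource) pairs the base role permits.
    session_policy_allows: list of such sets, one per attached Managed Policy.
    """
    if not session_policy_allows:
        # Mirrors the broker rule: no session without at least one policy.
        raise ValueError("at least one session Managed Policy is required")
    # Multiple session policies combine as a union of their allows...
    session_union = set().union(*session_policy_allows)
    # ...and the session can never exceed what the base role itself permits.
    return set(base_role_allows) & session_union


base = {
    ("s3:GetObject", "lake/ads/*"),
    ("s3:GetObject", "lake/finance/*"),
    ("s3:PutObject", "lake/ads/*"),
}
ads_read = {("s3:GetObject", "lake/ads/*")}
print(effective_permissions(base, [ads_read]))
```

In this model a session policy that allows something the base role does not (say, `s3:DeleteObject`) adds nothing, which is why the base role can stay broad while each job's token stays narrow.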

Sources: Pinterest Engineering, "Securely Scaling Big Data Access Controls At Pinterest" (engineering blog, primary source) · Apache Gravitino TLP graduation announcement (June 2025) — quote from Ang Zhang, Director of Big Data Platform, Pinterest · Apache Software Foundation press release.
