Public production architecture teardown
Pinterest — Zero Trust FGAC at Trino + Gravitino
Pinterest's Monarch big-data platform (30+ Hadoop YARN clusters, 17k+ nodes on AWS EC2, petabytes processed daily) runs Trino as one of several engines on top of a Credential Vending Service (CVS) that issues per-user STS tokens scoped by IAM Managed Policies. Apache Gravitino serves as the Iceberg REST catalog; it runs in production at Pinterest, and Pinterest's Director of Big Data Platform is quoted in the ASF TLP graduation announcement (June 2025).
Monarch scale: 30+ YARN clusters processing petabytes of data and hundreds of thousands of jobs daily on EC2. CVS assembles AWS STS tokens dynamically at job-launch time, and the base IAM role is never returned without at least one modifying Managed Policy attached. The result is zero standing privilege at the storage tier.
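The fail-closed rule above (no token unless at least one Managed Policy is attached) can be sketched as a request builder. This is a minimal illustration, not Pinterest's implementation: the group-to-policy table, the ARNs, the account ID, and the function and exception names are all hypothetical. The returned dictionary uses the parameter names of the STS AssumeRole API (`RoleArn`, `RoleSessionName`, `PolicyArns`, `DurationSeconds`), so it could be passed to a boto3 STS client as `assume_role(**request)`.

```python
# Hypothetical mapping from LDAP group to IAM Managed Policy ARN.
GROUP_POLICIES = {
    "ads-analysts": "arn:aws:iam::123456789012:policy/ads-read",
    "growth-analysts": "arn:aws:iam::123456789012:policy/growth-read",
}

# Hypothetical base role that every vended session assumes.
BASE_ROLE_ARN = "arn:aws:iam::123456789012:role/monarch-base"


class NoPolicyError(Exception):
    """Raised when a user maps to no Managed Policy, so nothing is vended."""


def build_assume_role_request(user, ldap_groups, duration_seconds=900):
    """Build AssumeRole keyword arguments for one job launch.

    Groups with no entry in GROUP_POLICIES are ignored. If no policy
    remains, the function raises instead of returning a request for the
    unrestricted base role.
    """
    policy_arns = sorted(
        {GROUP_POLICIES[group] for group in ldap_groups if group in GROUP_POLICIES}
    )
    if not policy_arns:
        raise NoPolicyError(f"no Managed Policy selected for user {user!r}")
    return {
        "RoleArn": BASE_ROLE_ARN,
        "RoleSessionName": f"cvs-{user}",
        "PolicyArns": [{"arn": arn} for arn in policy_arns],
        "DurationSeconds": duration_seconds,  # 900 is the STS minimum
    }
```

Raising on an empty policy list matters because STS itself would accept such a request and return credentials carrying the full permissions of the base role.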
The pipeline
- Identity. mTLS · OAuth · Kerberos: user authn; LDAP group membership drives policy selection.
- Broker. Credential Vending Service: AssumeRole with session Managed Policies; intersection-scoped permissions.
- Catalog. Apache Gravitino (Iceberg REST): first open-source Iceberg REST catalog; production at Pinterest pre-TLP.
- Engine. Trino + Spark + Flink: multi-engine query layer; Querybook + Jupyter for analyst surfaces.
- Store. S3 (Iceberg + raw): Hadoop S3A custom credentials provider presents user-specific tokens.
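The "intersection-scoped permissions" at the broker stage can be pictured with a toy set model. This is a deliberate simplification under stated assumptions: permissions are flat `action:prefix` strings invented for the example, and real IAM evaluation also involves wildcards, explicit denies, and resource policies, none of which are modeled here. The function name is hypothetical.

```python
def effective_permissions(base_role_permissions, session_policies):
    """Toy model of session-policy scoping.

    A session may do only what the base role allows AND what at least one
    attached session policy allows. An empty policy list is rejected,
    because real STS would then hand back the full base role.
    """
    if not session_policies:
        raise ValueError("at least one session policy is required")
    allowed_by_session = set().union(*session_policies)
    return set(base_role_permissions) & allowed_by_session


# Hypothetical permission strings for illustration only.
base_role = {"s3:GetObject:ads/", "s3:GetObject:growth/", "s3:PutObject:ads/"}
ads_read = {"s3:GetObject:ads/"}
finance_read = {"s3:GetObject:finance/"}

session = effective_permissions(base_role, [ads_read, finance_read])
```

In this model `finance_read` contributes nothing, because the base role never held that permission: a session policy can narrow the base role but cannot widen it.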
What composes, what’s brittle
- Per-job STS tokens. CVS issues short-lived credentials; effective permissions are the intersection of the base role's permissions and the attached session Managed Policies.
- Why Trino here. ANSI SQL across federated lake + catalog without copying data; analyst access is the dominant query pattern.
- Why Gravitino. Open Iceberg REST catalog; ASF TLP June 2025; Pinterest named in the graduation announcement as a production user.
- What composes. Same CVS-vended credentials flow through Spark, Flink, Hive, and Trino — one trust model, many engines.
- What's distinctive. Zero standing privilege at the S3 layer for an analyst population at petabyte scale — not aspirational, in production.
- What's brittle. Kerberos host/service-name discipline; CVS single-instance failure modes; LDAP group sync latency on permission changes; cache-expiry windows of minutes before new policies take effect.
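The cache-expiry brittleness noted in the last bullet can be made concrete with a small time-to-live cache. This is a sketch under assumptions, not a description of how CVS caches: the class name, the loader callback, and the injectable clock are hypothetical. It shows that a permission change in the directory only becomes visible once the cached entry expires.

```python
class TtlPolicyCache:
    """Caches each user's policy list for a fixed number of seconds."""

    def __init__(self, loader, ttl_seconds, clock):
        self._loader = loader      # callable: user -> list of policy names
        self._ttl = ttl_seconds
        self._clock = clock        # callable returning the current time in seconds
        self._entries = {}         # user -> (expires_at, policies)

    def get(self, user):
        now = self._clock()
        entry = self._entries.get(user)
        if entry is not None and now < entry[0]:
            return entry[1]        # still fresh: may lag behind the directory
        policies = self._loader(user)
        self._entries[user] = (now + self._ttl, policies)
        return policies
```

With a five-minute TTL, a revocation made just after a lookup stays invisible for up to five minutes. Shortening the TTL narrows that window at the cost of more directory lookups.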
Sources: Pinterest Engineering, "Securely Scaling Big Data Access Controls At Pinterest" (engineering blog, primary source) · Apache Gravitino TLP graduation announcement (June 2025) — quote from Ang Zhang, Director of Big Data Platform, Pinterest · Apache Software Foundation press release.