The Build Integrity Problem Provenance Solves
You deploy a container image to production. Later, you discover a vulnerability in a logging library inside it. You check the source code, and the logging library isn't in your dependency list. How did it get into the image? Was the build system compromised? Was the image replaced in the registry? Without provenance, you can't trace an image back to its source or confirm that the build was legitimate. Provenance addresses this. It is a cryptographically signed record stating that a specific artifact came from a specific git commit, built by a specific (authorized) build system at a specific time. An image that was built outside the authorized pipeline or swapped in the registry will have either no valid provenance or provenance that points to an unexpected source or builder.
Build provenance is a record of how an artifact was created: the source code (commit hash), builder identity, build environment, build process, and timestamps. The record is cryptographically signed, so any tampering with it can be detected.
```mermaid
graph TB
    Source["Source Code<br/>git: abc123..."]
    Source --> Build["Build System<br/>Authorized CI"]
    Build --> Artifact["Create Artifact<br/>hash: xyz789..."]
    Artifact --> Sign["Sign Record<br/>with Private Key"]
    Sign --> Record["Provenance<br/>git commit<br/>builder<br/>timestamp<br/>signature"]
    Record --> Deploy["Deploy with<br/>Provenance"]
    Deploy --> Verify["Verify Signature<br/>with Public Key"]
    Verify --> Result{{"Valid<br/>Provenance?"}}
    Result -->|Yes| Accept["✅ Accept<br/>Trusted Build"]
    Result -->|No| Reject["❌ Reject<br/>Unknown Origin"]
    style Accept fill:#ccffcc
    style Reject fill:#ffcccc
```

Why Provenance Matters
1. Detection of Tampering
The provenance record contains the artifact's cryptographic digest and is signed by the build system, so it shows whether the artifact was modified after it was built, for example in transit or in the registry. If an attacker modifies the artifact, its digest changes and no longer matches the digest in the signed provenance. If the attacker also edits the provenance to match, the signature no longer verifies. Either way, the tampering is detected at verification time.
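A minimal Python sketch of the digest half of that check: it hashes the artifact bytes and compares the result with the sha256 digests in the statement's subject list. It assumes the statement's signature has already been verified and that the statement was parsed into a dict shaped like the in-toto example later in this article; the function name `artifact_matches_provenance` is hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Lowercase hex SHA-256 digest, the form used in in-toto subject digests."""
    return hashlib.sha256(data).hexdigest()


def artifact_matches_provenance(artifact: bytes, statement: dict) -> bool:
    """Hypothetical helper: True if the artifact's SHA-256 digest appears in the
    statement's subject list.

    Assumes the statement's signature was already verified; this check only
    ties the artifact bytes to what the build system signed.
    """
    actual = sha256_hex(artifact)
    for subject in statement.get("subject", []):
        expected = subject.get("digest", {}).get("sha256")
        if expected is not None and expected.lower() == actual:
            return True
    return False


if __name__ == "__main__":
    built = b"bytes produced by the build"
    statement = {"subject": [{"name": "myapp", "digest": {"sha256": sha256_hex(built)}}]}
    print(artifact_matches_provenance(built, statement))              # True
    print(artifact_matches_provenance(b"modified bytes", statement))  # False
```

For a container image, the subject digest is the digest of the image manifest, so in practice the bytes being hashed are the manifest rather than the image layers.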
2. Proof of Legitimate Source
You can verify an artifact came from your repository and your build infrastructure. A container image in a registry claiming to be "myapp" can have provenance that proves it was built from github.com/myorg/myapp commit abc123, built by cloud.google.com/cloud-build project X, and signed with the build system's key as proof of authenticity. The result is that users can trust the image came from legitimate sources.
3. Incident Response and Forensics
When a security incident occurs, provenance supports timeline reconstruction (when exactly was the artifact created?), scope determination (which source code was in it?), root cause analysis (which build environment produced it?), and impact assessment (which deployments are affected?). Suppose an application is compromised at 14:00 UTC. The artifact's provenance shows it was built at 13:45 UTC from source commit abc123. You can then check whether that commit was modified and by whom, review whether the build environment was compromised, and correlate with logs to find when the exploit first appeared. The result: the root cause is identified and the timeline established.
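The first two forensic steps can be sketched in Python as a filter over provenance records: keep those built from a suspect commit, or those that finished within a window before the incident. This is a sketch under assumptions. The records are presumed already verified and parsed into dicts, the field paths follow the example statement in the next section, and the function name `builds_of_interest` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


def _parse_time(value: str) -> datetime:
    # SLSA timestamps are RFC 3339 strings; normalize "Z" for Python < 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def builds_of_interest(statements: Iterable[dict], incident_time: datetime,
                       lookback: timedelta, commit: Optional[str] = None) -> List[dict]:
    """Return the statements built from `commit` (if given) or that finished
    within `lookback` before `incident_time` (which must be timezone-aware)."""
    window_start = incident_time - lookback
    hits = []
    for st in statements:
        predicate = st.get("predicate", {})
        source = (predicate.get("buildDefinition", {})
                  .get("externalParameters", {})
                  .get("source", {}))
        finished = predicate.get("runDetails", {}).get("metadata", {}).get("finishedOn")
        same_commit = commit is not None and source.get("digest", {}).get("gitCommit") == commit
        in_window = (finished is not None
                     and window_start <= _parse_time(finished) <= incident_time)
        if same_commit or in_window:
            hits.append(st)
    return hits


if __name__ == "__main__":
    record = {"predicate": {
        "buildDefinition": {"externalParameters": {"source": {"digest": {"gitCommit": "abc123"}}}},
        "runDetails": {"metadata": {"finishedOn": "2024-03-15T13:45:00Z"}},
    }}
    incident = datetime(2024, 3, 15, 14, 0, tzinfo=timezone.utc)
    print(len(builds_of_interest([record], incident, timedelta(hours=1))))  # 1
```

Matching by commit catches every artifact built from suspect source regardless of when, while the time window catches anything built around the incident even if the commit looks clean. The two are combined with "or" so that neither signal is missed.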
4. Compliance and Auditing
Regulations and industry frameworks increasingly call for supply chain transparency. U.S. Executive Order 14028 calls for maintaining provenance data for the code and components in software sold to federal agencies. The SLSA framework (from the OpenSSF) requires provenance starting at Build Level 1 and signed, platform-generated provenance at Level 2 and above. The EU Cyber Resilience Act requires manufacturers to document the components in their products. SOC 2 Type II audits expect evidence of change-management and integrity controls. Provenance helps meet these requirements by making artifacts traceable and verifiable.
The In-Toto Framework
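In-toto is an open-source framework for protecting the integrity of software supply chains, and its attestation format is the standard envelope for provenance. It defines link metadata (signed records of individual build steps), a layout (the expected overall build workflow, signed by the project owner), and signatures (cryptographic proof of each step's authenticity).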
In-Toto Provenance Structure
```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "myapp:latest",
      "digest": { "sha256": "deadbeefdeadbeefdeadbeefdeadbeef..." }
    }
  ],
  "predicateType": "https://slsa.dev/provenance/v1",
  "predicate": {
    "buildDefinition": {
      "buildType": "https://cloud.google.com/build",
      "externalParameters": {
        "source": {
          "uri": "git+https://github.com/myorg/myapp.git",
          "digest": { "gitCommit": "abc123def456..." }
        },
        "entrypoint": "Dockerfile"
      }
    },
    "runDetails": {
      "builder": { "id": "https://cloud.google.com/cloud-build/abc123" },
      "metadata": {
        "invocationId": "build-xyz789",
        "startedOn": "2024-03-15T10:00:00Z",
        "finishedOn": "2024-03-15T10:15:00Z"
      }
    }
  }
}
```

Key fields: subject is what was built (name and digest); predicateType identifies the kind of statement (here SLSA provenance v1, while SBOMs and scan results use other types); externalParameters are the user-controlled inputs a verifier should check (source, configuration); internalParameters (not shown) are builder implementation details; runDetails records which builder ran the build, and when.
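Before relying on any of these fields, a consumer typically checks that the statement has the expected shape. Below is a minimal Python sketch that rejects malformed statements and pulls out the key fields. It assumes the statement has already been extracted from a verified signature envelope. The `ProvenanceSummary` type and `summarize` function are hypothetical. Accepting both Statement v0.1 and v1 for `_type` is an assumption made for compatibility, and only the first subject is summarized, to keep the sketch brief.

```python
import json
from dataclasses import dataclass
from typing import Optional

ACCEPTED_STATEMENT_TYPES = {
    "https://in-toto.io/Statement/v0.1",
    "https://in-toto.io/Statement/v1",
}
SLSA_V1 = "https://slsa.dev/provenance/v1"


@dataclass
class ProvenanceSummary:
    subject_name: str
    subject_sha256: str
    builder_id: str
    source_uri: Optional[str]
    git_commit: Optional[str]


def summarize(raw: str) -> ProvenanceSummary:
    """Parse a statement and extract key fields; raise ValueError if required
    parts are missing or have unexpected values."""
    st = json.loads(raw)
    if st.get("_type") not in ACCEPTED_STATEMENT_TYPES:
        raise ValueError(f"unexpected _type: {st.get('_type')!r}")
    if st.get("predicateType") != SLSA_V1:
        raise ValueError(f"not SLSA v1 provenance: {st.get('predicateType')!r}")
    subjects = st.get("subject") or []
    if not subjects or "sha256" not in subjects[0].get("digest", {}):
        raise ValueError("statement has no subject with a sha256 digest")
    predicate = st.get("predicate", {})
    builder_id = predicate.get("runDetails", {}).get("builder", {}).get("id")
    if not builder_id:
        raise ValueError("runDetails.builder.id is required")
    source = predicate.get("buildDefinition", {}).get("externalParameters", {}).get("source", {})
    return ProvenanceSummary(
        subject_name=subjects[0].get("name", ""),
        subject_sha256=subjects[0]["digest"]["sha256"],
        builder_id=builder_id,
        source_uri=source.get("uri"),
        git_commit=source.get("digest", {}).get("gitCommit"),
    )
```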
Provenance vs Attestation
These terms are sometimes used interchangeably but have subtle differences.
| Aspect | Provenance | Attestation |
|---|---|---|
| Definition | History of how the artifact was created | Signed statement about artifact properties |
| Focus | Build process and origin | Verification results |
| Use case | "Where did this come from?" | "Does this meet security requirements?" |
| Example | Built from commit abc123 by Cloud Build | Scanned by a vulnerability scanner: 0 critical vulnerabilities |
In practice, provenance is one type of attestation: an attestation whose predicate (predicateType https://slsa.dev/provenance/v1 in the example above) records the build history.
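Because provenance is an attestation with a particular predicateType, tooling that fetches every attestation for an image usually has to pick out the provenance from the others. A minimal Python sketch, assuming the attestations have already been verified and decoded into statement dicts. The name `provenance_for_digest` is hypothetical, and the scan predicate URI used in the test data is only illustrative.

```python
from typing import Iterable, List

SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"


def provenance_for_digest(statements: Iterable[dict], sha256: str) -> List[dict]:
    """From all attestations about an image, keep only the SLSA v1 provenance
    statements whose subject list includes the given sha256 digest."""
    result = []
    for st in statements:
        if st.get("predicateType") != SLSA_PROVENANCE_V1:
            continue  # e.g. an SBOM or vulnerability-scan attestation
        digests = {s.get("digest", {}).get("sha256") for s in st.get("subject", [])}
        if sha256 in digests:
            result.append(st)
    return result
```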
Generating Provenance
Many modern CI/CD systems can generate provenance automatically or with a small amount of configuration.
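Whichever system generates it, the output has the same shape as the in-toto example above. The Python sketch below makes that concrete by assembling an unsigned in-toto Statement v1 with a SLSA v1 predicate for a local file, mirroring the field layout of that example. It is a teaching aid, not a substitute for CI-generated provenance. Only the digest is computed, and every other value is supplied by the caller. The buildType URI is a placeholder, and the function names are hypothetical. The result means nothing until it is signed. Note that `cosign attest --predicate` expects only the predicate object (`statement["predicate"]`) and builds the surrounding statement itself.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone

RFC3339 = "%Y-%m-%dT%H:%M:%SZ"


def file_sha256(path: str) -> str:
    """Stream the file so large artifacts don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_statement(path: str, name: str, repo_uri: str, commit: str,
                   builder_id: str, started: datetime, finished: datetime) -> dict:
    """Assemble an UNSIGNED in-toto Statement v1 with a SLSA v1 predicate.
    `started` and `finished` must be timezone-aware."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": name, "digest": {"sha256": file_sha256(path)}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Placeholder: real builders publish their own buildType URI
                "buildType": "https://example.com/hypothetical-build-type/v1",
                "externalParameters": {
                    "source": {"uri": repo_uri, "digest": {"gitCommit": commit}},
                },
            },
            "runDetails": {
                "builder": {"id": builder_id},
                "metadata": {
                    "startedOn": started.astimezone(timezone.utc).strftime(RFC3339),
                    "finishedOn": finished.astimezone(timezone.utc).strftime(RFC3339),
                },
            },
        },
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = os.path.join(tmp, "app.bin")
        with open(artifact, "wb") as f:
            f.write(b"example artifact")
        stmt = make_statement(
            artifact, "app.bin", "git+https://github.com/myorg/myapp.git", "abc123",
            "https://example.com/hypothetical-builder",
            datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc),
            datetime(2024, 3, 15, 10, 15, tzinfo=timezone.utc),
        )
        # The predicate alone is what a signing tool's --predicate flag takes
        print(json.dumps(stmt["predicate"], indent=2))
```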
Google Cloud Build
```bash
# Cloud Build generates provenance for the images it pushes to Artifact
# Registry (those listed in the build config's "images" field).
gcloud builds submit --config cloudbuild.yaml .

# The provenance is stored in Artifact Registry with the image:
gcloud artifacts docker images describe <image> --show-provenance
```

GitHub Actions
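```yaml
name: Build with Provenance
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write      # push the image to GHCR
      id-token: write      # OIDC token used for Sigstore signing
      attestations: write  # store the attestation
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push image
        id: build
        uses: docker/build-push-action@v6  # exposes outputs.digest
        with:
          context: .
          push: true
          tags: ghcr.io/myorg/myapp:latest
      - name: Generate and sign provenance
        uses: actions/attest-build-provenance@v1
        with:
          subject-name: ghcr.io/myorg/myapp
          subject-digest: ${{ steps.build.outputs.digest }}
          push-to-registry: true
```

GitLab CI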
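```yaml
build:
  stage: build
  variables:
    # GitLab Runner generates SLSA provenance metadata for this job's
    # artifacts. It covers files uploaded as job artifacts, not images
    # pushed to a container registry.
    RUNNER_GENERATE_ARTIFACTS_METADATA: "true"
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker save $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -o image.tar
  artifacts:
    paths:
      - image.tar
```

Manual Generation (Advanced)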
```bash
# 1. Write the SLSA provenance predicate (buildDefinition and runDetails)
#    to predicate.json with your own generator.
# 2. Sign it and attach it to the image. cosign wraps the predicate in an
#    in-toto Statement, signs it as a DSSE envelope, pushes it to the
#    registry next to the image, and by default records it in Rekor.
cosign attest --key cosign.key \
  --type https://slsa.dev/provenance/v1 \
  --predicate predicate.json \
  myapp@sha256:xyz
```

Verifying Provenance
Provenance is only useful if you verify it before deploying artifacts.
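Verification has two parts: checking the signature over the provenance, then checking its contents against policy. For the first part, tools like cosign store attestations in a DSSE (Dead Simple Signing Envelope). The statement is base64-encoded in `payload`, and each signature covers a pre-authentication encoding (PAE) of the payload type and payload bytes, not the raw JSON. The Python sketch below shows that step. The `verify_sig` callback is an assumption standing in for a real public-key check (for example, an ECDSA or Ed25519 verify call from a crypto library), and `open_envelope` is a hypothetical name.

```python
import base64
import json
from typing import Callable

INTOTO_PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def open_envelope(envelope_json: str, verify_sig: Callable[[bytes, bytes], bool]) -> dict:
    """Return the in-toto statement if at least one signature verifies.
    verify_sig(message, signature) is a caller-supplied public-key check."""
    env = json.loads(envelope_json)
    if env.get("payloadType") != INTOTO_PAYLOAD_TYPE:
        raise ValueError("not an in-toto payload")
    payload = base64.b64decode(env["payload"])
    message = pae(env["payloadType"], payload)
    signatures = env.get("signatures", [])
    if not any(verify_sig(message, base64.b64decode(s["sig"])) for s in signatures):
        raise ValueError("no valid signature")
    # Parse the statement only after a signature check has passed
    return json.loads(payload)
```

Signing the PAE rather than the raw payload binds the payload type into the signature, so a signed statement cannot be reinterpreted as a different kind of document.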
Command-Line Verification
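```bash
# Verify the attestation signature and print the verified attestation(s).
# Pin the identity to your own repository: a bare "https://github.com/"
# pattern would accept attestations signed by any GitHub workflow.
cosign verify-attestation \
  --type https://slsa.dev/provenance/v1 \
  --certificate-identity-regexp='^https://github.com/myorg/myapp/' \
  --certificate-oidc-issuer='https://token.actions.githubusercontent.com' \
  ghcr.io/myorg/myapp:latest

# On success, cosign reports the checks it performed and writes each
# verified attestation envelope to stdout as JSON; the decoded payload
# contains the source repository and commit.
```

Kubernetes Admission Control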
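However the policy is enforced, the admission decision for a single image comes down to a few checks on its verified provenance. The Python sketch below mirrors the two parameters in the constraint that follows: the predicate type must match, and the builder ID must fall under one of the allowed builder prefixes. Prefix matching on a path boundary is an assumption here (an exact-match or regex policy is equally plausible), and `admit` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def _matches(builder: str, prefix: str) -> bool:
    prefix = prefix.rstrip("/")
    # Match on a path boundary so "https://github.com/actions" does not
    # also admit "https://github.com/actions-evil/..."
    return builder == prefix or builder.startswith(prefix + "/")


def admit(statement: Optional[dict], required_type: str,
          allowed_builders: List[str]) -> Tuple[bool, str]:
    """Return (allowed, reason) for one image's already-verified provenance."""
    if statement is None:
        return False, "no verified provenance found"
    if statement.get("predicateType") != required_type:
        return False, f"predicateType {statement.get('predicateType')!r} != {required_type!r}"
    builder = (statement.get("predicate", {}).get("runDetails", {})
               .get("builder", {}).get("id", ""))
    if not any(_matches(builder, prefix) for prefix in allowed_builders):
        return False, f"builder {builder!r} is not in the allowlist"
    return True, f"built by {builder}"
```

Returning a reason alongside the decision makes denials easy to log, which supports the audit trail discussed below.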
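```yaml
# Gatekeeper (OPA) constraint: require verified provenance before deployment.
# Hypothetical: assumes a ConstraintTemplate defining the
# K8sRequiredProvenanceAttestation kind (and the logic that fetches and
# verifies attestations) is already installed; Gatekeeper does not ship one.
# Sigstore policy-controller and Kyverno verifyImages rules offer built-in
# attestation checks instead.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredProvenanceAttestation
metadata:
  name: require-provenance
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    requiredProvenanceType: "https://slsa.dev/provenance/v1"
    requiredBuilder:
      - "https://cloud.google.com/cloud-build"
      - "https://github.com/actions"
```

Provenance Storage and Distribution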
Where and how should provenance be stored?
1. In Artifact Registries
Modern registries store provenance alongside artifacts:
```bash
# Artifact Registry stores Cloud Build provenance with the image.
gcloud artifacts docker images describe \
  us-docker.pkg.dev/myproject/myrepo/myapp:latest \
  --show-provenance --format=json

# Prints the image metadata, including the provenance, as JSON.
```

2. Transparency Logs (Rekor)
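A transparency log is useful because it is append-only and cheap to audit. Like Certificate Transparency logs, Rekor stores entries as the leaves of a Merkle tree with RFC 6962 hashing. A client holding a trusted tree root can therefore confirm that an entry is in the log from a short list of sibling hashes, called an inclusion proof. A minimal Python sketch of the inclusion check as specified in RFC 9162, section 2.1.3.2, follows. `merkle_root` and `build_proof` exist only to generate test data. Fetching a real proof from Rekor and verifying the signed tree head are out of scope.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || data)."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (requires n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: List[bytes]) -> bytes:
    """Merkle Tree Hash of a list of entries (test helper)."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def build_proof(index: int, leaves: List[bytes]) -> List[bytes]:
    """Inclusion path for leaf `index`, ordered from the leaf upward (test helper)."""
    if len(leaves) <= 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return build_proof(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return build_proof(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(entry: bytes, index: int, tree_size: int,
                     proof: List[bytes], root: bytes) -> bool:
    """Check that `entry` is leaf `index` of a tree with `tree_size` leaves
    and the given root, following RFC 9162 section 2.1.3.2."""
    if index >= tree_size:
        return False
    fn, sn, r = index, tree_size - 1, leaf_hash(entry)
    for p in proof:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node has no right sibling
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

The 0x00/0x01 prefixes keep a leaf hash from ever colliding with an interior node hash, which is what prevents an attacker from passing off a subtree as a single entry.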
Rekor, Sigstore's public transparency log, records signed statements such as provenance attestations in an append-only log:
```bash
# Signing tools such as cosign upload signatures and attestations to Rekor
# by default. Each log entry can later be fetched by its index:
rekor-cli get --log-index <index>

# Anyone can audit the log to confirm that this provenance was published
# for the artifact, and when.
```

3. Artifact Supply Chain Metadata
Registries that support OCI artifacts can store provenance as a separate artifact attached to the image:
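```bash
# Attach provenance to the image as an OCI artifact (a referrer), leaving
# the myapp:latest tag pointing at the image itself.
oras attach --artifact-type 'application/vnd.example.slsa+json' \
  myregistry.azurecr.io/myapp:latest \
  provenance.json:application/json
```

CleanStart and Provenance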
CleanStart Source Intelligence Core captures provenance from all major CI/CD systems (Cloud Build, GitHub Actions, GitLab CI, etc.), stores provenance alongside SBOMs and VEX statements, verifies provenance signatures using Sigstore and standard key management, analyzes build integrity based on provenance metadata, enforces provenance requirements through admission control policies, and correlates provenance with vulnerabilities to understand artifact integrity.
This lets you verify an image automatically at deploy time:

1. Pull the image.
2. Verify its signature.
3. Retrieve its provenance.
4. Verify the provenance signature.
5. Extract the source commit from the provenance.
6. Get the SBOM for that commit.
7. Get the VEX statements.
8. Check policies (SLSA level, builder, source repo, etc.).
9. Allow or deny the deployment based on all of these signals.

All automated, all audited, all verifiable.
Provenance Best Practices
Make verified provenance a prerequisite for deployment, and always verify provenance signatures rather than trusting the content. Don't create provenance by hand; let CI/CD systems generate it automatically, and keep it in registries alongside the images it describes. Publish to a transparency log such as Rekor for public verifiability, and use admission controllers to enforce verified provenance as policy. Log every provenance retrieval and verification for audit purposes, rotate signing keys regularly, and revoke compromised keys immediately.
Related Concepts
- SLSA Framework: Provenance is a core SLSA requirement, starting at Build Level 1; Level 2 and above require signed provenance.
- In-Toto: The framework and attestation format behind provenance.
- Container Signing: Cosign/Sigstore for image and provenance signatures.
- Build Attestation: Provenance is one type of attestation.
- Supply Chain Security: Provenance is foundational to supply chain integrity.
