Purpose
When hundreds of developers across dozens of teams use CleanStart images, you need governance—who owns which images, how customizations are approved, what the golden image strategy looks like, and how you prevent sprawl. This guide helps platform and security teams scale CleanStart adoption without losing control over image quality, security posture, or compliance.
Without governance, problems compound quickly. Imagine 500 developers independently writing 300 custom Dockerfiles that resolve to 200 unique base image variants scattered across your infrastructure. This fragmentation creates a security surface you cannot track: determining which systems are affected by a given vulnerability becomes nearly impossible, and compliance gaps emerge as auditors struggle to verify the lineage and approval status of your container images.
Implementing proper governance transforms this chaotic situation. With governance, you establish centralized golden images that teams build upon, enabling controlled customization through formal approval processes. This delivers consistent security posture across all teams, creates auditable image lineage that demonstrates compliance to auditors, and establishes predictable patching cadences where all teams move forward together on coordinated schedules.
The Governance Challenge at Scale
Common Patterns That Fail
Pattern 1: Free-for-all
Developer pulls image from registry.cleanstart.com
→ Uses it directly in production
→ No version pinning
→ No customization review
→ No upgrade coordination
→ Security incident: "which version was that?"
Pattern 2: Siloed golden images
Platform team maintains golden images
→ App teams ignore them
→ Create their own variants anyway
→ Compliance audit: "approved by whom?"
→ You now maintain 2 sets of images
Pattern 3: Uncontrolled customization
Team adds system packages to CleanStart image
→ 6 months later: a dependency becomes EOL
→ No one knows which teams use it
→ Vulnerability hits: massive blast radius
→ Emergency patching across hundreds of deployments
What Enterprise Governance Looks Like
Enterprise governance follows a structured workflow. CleanStart publishes a new image version (v1.2.3), which triggers a verification process. The platform team pulls the image and validates its signature and software bill of materials (SBOM). Next, acceptance tests are run against your organization's actual application stacks. If the tests pass, the image is promoted to your internal registry with the governance tag "golden-v1.2.3". From this point forward, all application teams are required to consume only these golden images. Any customizations must be submitted as pull requests and undergo formal security review. Finally, compliance audits can show that 99% or more of production images are approved variants of golden images, which is strong evidence of governance and security control.
Golden Image Strategy
What is a Golden Image?
A golden image is a CleanStart base image that your organization has put through a comprehensive vetting process and established as an approved, supported base. This process begins by pulling the base image from registry.cleanstart.com, ensuring you're working with an authentic, signed source. Once obtained, the image is verified through rigorous validation including signature checks using cosign, detailed SBOM inspection to understand all software components, and provenance verification to confirm the image was built through trusted processes. Next, the image undergoes acceptance testing against your organization's actual application stacks to ensure compatibility and performance meet your requirements. After successful testing, the platform and security teams formally approve the image based on their validation results and confidence in its production readiness. The approved image is then mirrored to your internal container registry, ensuring availability for air-gap deployments and providing SLA guarantees independent of external registry availability. The image is tagged with your internal versioning scheme using tags like golden-v1.2.3 for specific versions and golden-stable to indicate the currently recommended version. Finally, the image is published to all application teams along with release notes documenting what's new, any breaking changes, and migration guidance.
Once a golden image is established, app teams can only build from golden images according to your governance policy. Teams cannot pull directly from registry.cleanstart.com; instead, all image pulls are restricted to your internal registry, with this restriction enforced at the Kubernetes admission controller level to prevent policy violations.
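The registry restriction described above can be illustrated with a minimal Python sketch of the decision an admission check might make. This is not CleanStart's or Kubernetes' implementation: the function names `registry_of` and `admit` are hypothetical, and the host-detection rule follows the common Docker convention that the first path component is a registry only when it contains a dot, a colon, or equals `localhost`.

```python
INTERNAL_REGISTRY = "internal-registry.company.com"  # registry host used in this guide


def registry_of(image_ref: str) -> str:
    """Return the registry host of an image reference.

    Assumption: the first path component is a registry host only if it
    contains '.' or ':' or is 'localhost'; otherwise the image is taken
    to live on docker.io (the usual Docker default).
    """
    first, sep, _rest = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def admit(image_ref: str) -> bool:
    """Hypothetical policy: allow only images served by the internal registry."""
    return registry_of(image_ref) == INTERNAL_REGISTRY


if __name__ == "__main__":
    for ref in (
        "internal-registry.company.com/golden/python:3.12-stable",
        "registry.cleanstart.com/python:3.12",
        "python:3.12",
    ):
        print(ref, "->", "allow" if admit(ref) else "deny")
```

Comparing the full host string, rather than checking a prefix, avoids admitting look-alike hosts such as `internal-registry.company.com.evil.io`.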
Benefits of Golden Images
Golden images provide organizations with a single source of truth for base image selection, eliminating confusion about which images are approved and recommended for production use. They enable upgrade coordination across your entire organization, allowing all teams to move forward together on a predictable schedule rather than fragmenting into dozens of independent upgrade cycles. Golden images also provide crucial compliance evidence—your audit trail can demonstrate that all production images derive from approved bases, which satisfies stringent regulatory requirements and passes security audits. When security incidents occur, golden images give you immediate incident response capabilities; if a CVE impacts a base image, you know exactly which deployments are affected because they all share a common lineage. From an operational perspective, golden images improve performance since teams no longer need to individually build, verify, or scan base images—the platform team does this work once for all teams. Finally, golden images reduce costs through decreased image builds and faster deploy times as teams skip redundant verification steps.
Golden Image Lifecycle
The golden image lifecycle spans several months with clearly defined phases. In Month 1, CleanStart releases a new version (v1.2.0) and the platform team begins acceptance testing, which typically takes one week. By the end of Month 1, the image is approved, mirrored to the internal registry with the tag "golden-stable", release notes are published, and teams are notified via communication channels like Slack or email. During Months 2-4, application teams gradually adopt the golden-stable version in development environments, then staging environments.
In Month 5, CleanStart releases another version (v1.3.0) and the platform team repeats acceptance testing. By the end of Month 5, v1.3.0 is promoted to "golden-stable" (becoming the new standard), v1.2.0 is re-tagged as "golden-legacy" with a 6-month support window, and teams are notified with a migration deadline of Month 11.
Finally, in Month 11, the v1.2.0 image is removed from the internal registry, and any remaining production deployments using the deprecated version trigger automated alerts to force migration.
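As a rough illustration of the lifecycle above, the following Python sketch models how the floating tags might move on each promotion. The `promote` function and its return shape are hypothetical; only the tag names and the 6-month legacy window come from this guide.

```python
def promote(tags: dict, new_version: str, month: int,
            legacy_window_months: int = 6) -> dict:
    """Point golden-stable at new_version and demote the previous stable.

    Returns the updated tag map plus the month in which the demoted
    version is due for removal (None when nothing was demoted).
    """
    updated = dict(tags)  # do not mutate the caller's mapping
    previous = updated.get("golden-stable")
    updated["golden-stable"] = new_version
    removal_month = None
    if previous is not None:
        updated["golden-legacy"] = previous
        removal_month = month + legacy_window_months
    return {"tags": updated, "legacy_removal_month": removal_month}


if __name__ == "__main__":
    first = promote({}, "v1.2.0", month=1)
    second = promote(first["tags"], "v1.3.0", month=5)
    print(second)
```

Replaying the timeline from the text (v1.2.0 promoted in Month 1, v1.3.0 in Month 5) yields a removal month of 11 for v1.2.0, matching the migration deadline stated above.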
Internal Registry Structure
Your internal container registry (internal-registry.company.com) is organized into three main sections. The golden/ directory contains all approved base images, organized by language or runtime. Within the Python section, you have stable versions like 3.12-stable and legacy versions for deprecation windows. The node directory contains Node.js runtimes (versions 20 and 18), and the java directory contains Java runtimes (versions 21 and 17). Each entry points to specific image digests for reproducibility.
The team-approved/ directory contains application-specific images that have been approved for production use. These are built from golden images but may include team-specific customizations. Examples include payment team's checkout-api, search team's indexer-worker, and data team's ml-pipeline.
The staging/ directory is reserved for images under evaluation. For instance, a Python 3.13 candidate image may be tested here before promotion to golden status. This structured approach ensures clear separation between production-ready images, approved custom variants, and experimental images.
Tag convention:
Your tagging strategy should follow three main conventions for clarity and traceability. Use golden-v{MAJOR}.{MINOR}.{PATCH} tags for creating immutable records of specific versions that never change once released. Use the golden-stable floating tag to indicate the latest recommended version, which moves forward with each new promotion. Use golden-legacy to mark the previous version during its deprecation window, helping teams understand which image is no longer the standard.
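The convention above lends itself to simple tooling. The sketch below, with hypothetical helper names, parses immutable `golden-v{MAJOR}.{MINOR}.{PATCH}` tags and picks the newest one, which is the version a floating tag such as golden-stable would typically follow. It assumes plain numeric components with no pre-release suffixes.

```python
import re

# Immutable tags only; floating tags (golden-stable, golden-legacy) do not match.
TAG_RE = re.compile(r"^golden-v(\d+)\.(\d+)\.(\d+)$")


def parse_golden_tag(tag: str):
    """Return (major, minor, patch) as integers, or None for non-version tags."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def newest_immutable_tag(tags):
    """Return the tag with the highest version, or None if there is none."""
    versioned = [(parse_golden_tag(t), t) for t in tags]
    versioned = [(version, t) for version, t in versioned if version is not None]
    if not versioned:
        return None
    return max(versioned)[1]


if __name__ == "__main__":
    tags = ["golden-v1.2.3", "golden-v1.10.0", "golden-stable", "golden-v1.9.9"]
    print(newest_immutable_tag(tags))
```

Comparing integer tuples rather than strings matters here: a lexicographic sort would rank 1.9.9 above 1.10.0.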
Digest pinning:
Every Dockerfile should reference images by their content digest rather than by tag. This ensures reproducible builds where the exact same image is used every time the build is run, regardless of floating tag changes. Dockerfile references should use the format:
FROM internal-registry.company.com/golden/python:3.12-stable@sha256:abc123def456...
Never use floating tags in Kubernetes manifests in your production environments. CI/CD builds should pin digests to guarantee that the same image that was tested in staging is deployed to production. This practice prevents the scenario where a floating tag is updated between build and deployment, potentially introducing untested changes into production.
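A CI step could enforce digest pinning with a small linter. The following is a minimal sketch, not a full Dockerfile parser: `unpinned_from_lines` is a hypothetical helper, it ignores line continuations and build arguments, and it expects a complete 64-character sha256 digest (the digests in this guide are elided placeholders and would not match).

```python
import re

# A full sha256 digest suffix: "@sha256:" followed by 64 hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_from_lines(dockerfile_text: str):
    """Return (line_number, image) for each FROM that is not digest-pinned.

    References to earlier build stages and the special 'scratch' base
    are skipped because they do not pull from a registry.
    """
    findings = []
    stage_names = set()
    for lineno, raw in enumerate(dockerfile_text.splitlines(), start=1):
        parts = raw.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_local = image == "scratch" or image in stage_names
        if not is_local and not DIGEST_RE.search(image):
            findings.append((lineno, image))
        # Record "FROM image AS name" so later stages can refer to it.
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2])
    return findings


if __name__ == "__main__":
    sample = "FROM internal-registry.company.com/golden/python:3.12-stable\n"
    print(unpinned_from_lines(sample))
```

A build pipeline would fail the job when the returned list is non-empty.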
Ownership Model
RACI Matrix
Who is Responsible, Accountable, Consulted, Informed for each activity?
| Activity | Platform Team | Security Team | App Team | CTO Office | Finance |
|---|---|---|---|---|---|
| Golden image selection | R/A | C | I | I | - |
| Acceptance testing | R | C | I | - | - |
| Customization approval | C | A | R | I | - |
| Vulnerability response | R/A | C | I | I | I |
| License compliance | C | R/A | I | I | - |
| Image deprecation | R/A | C | I | C | - |
| Cost tracking | C | - | - | I | R/A |
Legend: R (Responsible) indicates who does the work, A (Accountable) is the person who makes the final decision and owns the outcomes, C (Consulted) means the person gives input before the decision, and I (Informed) means the person gets updates after the decision is made.
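A RACI matrix is easier to keep consistent when it is stored as data and checked automatically. The sketch below transcribes the table above into a Python dictionary and applies one common RACI rule, exactly one Accountable party per activity. The rule and the helper names are assumptions for illustration, not a stated policy of this guide. As transcribed, the Acceptance testing row has no A entry, so the check reports it.

```python
TEAMS = ("Platform", "Security", "App", "CTO Office", "Finance")

# Transcribed from the table above; an empty string means no involvement.
RACI = {
    "Golden image selection": ("R/A", "C", "I", "I", ""),
    "Acceptance testing": ("R", "C", "I", "", ""),
    "Customization approval": ("C", "A", "R", "I", ""),
    "Vulnerability response": ("R/A", "C", "I", "I", "I"),
    "License compliance": ("C", "R/A", "I", "I", ""),
    "Image deprecation": ("R/A", "C", "I", "C", ""),
    "Cost tracking": ("C", "", "", "I", "R/A"),
}


def accountable_teams(row):
    """Return the teams whose cell includes the A (Accountable) role."""
    return [team for team, cell in zip(TEAMS, row) if "A" in cell.split("/")]


def activities_without_single_owner(matrix):
    """List activities that do not have exactly one Accountable team."""
    return [
        activity
        for activity, row in matrix.items()
        if len(accountable_teams(row)) != 1
    ]


if __name__ == "__main__":
    print(activities_without_single_owner(RACI))
```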
Platform Team: Detailed Responsibilities
Golden Image Pipeline
The platform team owns the evaluation and promotion of new CleanStart releases into your production golden image set. This evaluation should happen at least monthly, or immediately when CleanStart publishes security patches that address critical vulnerabilities. As part of the pipeline, the team runs comprehensive acceptance tests against your actual application stacks to validate that the new image works correctly in your specific environment. All test results, compatibility issues, and any edge cases encountered should be documented for the record. Once testing passes, the approved images are mirrored to your internal container registry with predictable, consistent tag names that follow your conventions. Finally, release notes are published for each new golden image, documenting what's new in the version, any breaking changes from the previous version, and providing clear migration guidance for teams planning to upgrade.
Release Management
The platform team maintains a 3-version support window at all times, meaning you actively support the current golden image plus the previous 2 versions. This allows teams to have time to upgrade without feeling rushed. When moving to a new golden image, the team provides a full 6-month deprecation notice before removing old images entirely, giving teams substantial time to plan and execute their migrations. Throughout the deprecation window, deadlines are communicated consistently via multiple channels including email, Slack announcements, and team meetings to ensure visibility. The team also tracks adoption metrics throughout the deprecation window, monitoring which teams are still using legacy versions so they can provide additional support to those teams as deadlines approach.
Support & Documentation
The platform team maintains an internal wiki, open to all teams, that documents the complete governance policy and makes the rules and expectations easy to find. The team actively answers questions from application teams about customization options, helping teams understand what they can and cannot do with golden images. Example Dockerfiles are provided for common use cases (adding a Python package, installing a system tool, etc.), reducing the barrier to adoption. The team runs monthly office hours or maintains an active Slack channel where teams can ask adoption questions in real time, ensuring blockers are resolved quickly.
Incident Response
The platform team monitors CleanStart security advisories and CVE announcements continuously, staying informed about vulnerabilities that might affect golden images. When a vulnerability is discovered, the team assesses the impact—which golden images are affected, which teams' deployments would be impacted, and how urgent the fix is. If a critical vulnerability is discovered, the team can fast-track a patched version through the golden image pipeline, potentially completing testing and promotion in days rather than weeks. Once a patch is promoted to golden, affected teams are notified immediately so they can prioritize the upgrade.
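The impact assessment described above reduces to a lookup when every deployment records the digest of the golden image it was built from. The following sketch assumes a hypothetical inventory format (a list of dictionaries with `team`, `name`, and `base_digest` keys); the guide does not prescribe how this inventory is stored.

```python
from collections import defaultdict


def affected_deployments(deployments, affected_digests):
    """Group deployments built on an affected golden image by owning team."""
    affected = set(affected_digests)
    by_team = defaultdict(list)
    for deployment in deployments:
        if deployment["base_digest"] in affected:
            by_team[deployment["team"]].append(deployment["name"])
    return {team: sorted(names) for team, names in by_team.items()}


if __name__ == "__main__":
    # Hypothetical inventory; digests are shortened placeholders.
    inventory = [
        {"team": "payment", "name": "checkout-api", "base_digest": "sha256:aaa"},
        {"team": "search", "name": "indexer-worker", "base_digest": "sha256:bbb"},
        {"team": "data", "name": "ml-pipeline", "base_digest": "sha256:aaa"},
    ]
    print(affected_deployments(inventory, {"sha256:aaa"}))
```

The result doubles as the notification list: each key is a team to contact and each value is the set of deployments that team needs to upgrade.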
Security Team: Detailed Responsibilities
Image Verification
The security team performs thorough validation on all candidate golden images before they can be promoted to production. This validation includes signature checks using cosign to cryptographically verify that the image came from a trusted source and hasn't been tampered with. The team inspects software bills of materials (SBOMs) in formats like SPDX 3.0 and CycloneDX 1.4, analyzing the licensed software components for any license compliance concerns and cross-referencing known vulnerability databases. The team verifies provenance information to confirm the image was built through trusted CI/CD processes meeting SLSA Level 4 standards. Finally, the team checks for unusual changes between versions, understanding what changed and whether those changes introduce new security considerations.
Customization Review
When application teams submit pull requests proposing new customizations—such as adding system packages or modifying security contexts—the security team carefully reviews these requests. The review goes beyond simply checking if a package exists; the team challenges the necessity of the request by asking "Do we really need this? Is there a better approach?" This critical mindset prevents teams from adding unnecessary dependencies that expand the attack surface. The team only approves customizations once they thoroughly understand the business justification, ensuring each addition serves a real operational need. Importantly, approval reasons are documented in PR comments, creating an audit trail that explains why each customization was approved.
Compliance & Audit
The security team generates regular reports tracking the percentage of production images using approved golden images, a key metric demonstrating governance effectiveness. When compliance auditors visit, the team can point to this data as evidence of control. The team audits customization requests across your infrastructure, documenting who approved what customization and the business justification for each approval. This creates comprehensive audit trails required by compliance frameworks like SOC 2 and ISO 27001. The team maintains a living catalog of approved third-party packages and versions, making it easy to determine whether a newly requested package has already been approved elsewhere or requires fresh evaluation.
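The headline metric in these reports is straightforward to compute once production image digests and the approved set are available. This is a minimal sketch with hypothetical function names; the 99% default mirrors the target mentioned earlier in this guide, and how digests are collected from clusters is left out.

```python
def golden_compliance_percent(production_digests, approved_digests):
    """Percentage of running production images whose digest is approved.

    Returns None when there is nothing running, since a percentage of
    zero images is undefined.
    """
    if not production_digests:
        return None
    approved = set(approved_digests)
    compliant = sum(1 for digest in production_digests if digest in approved)
    return 100.0 * compliant / len(production_digests)


def meets_target(percent, target=99.0):
    """True when the measured percentage reaches the governance target."""
    return percent is not None and percent >= target


if __name__ == "__main__":
    running = ["sha256:a", "sha256:a", "sha256:b", "sha256:c"]
    pct = golden_compliance_percent(running, {"sha256:a", "sha256:b"})
    print(pct, meets_target(pct))
```

Counting per running image rather than per unique digest weights the metric by actual exposure, which is one reasonable choice among several.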
Vulnerability Response
When a CVE (Common Vulnerabilities and Exposures) entry impacts a golden image, the security team immediately assesses the severity and urgency of the vulnerability. Based on this assessment, the team recommends a patching timeline: should this be patched today because it's actively exploited? This week because it's serious? This month because it's low-risk? These timelines are then communicated clearly to application teams so they can prioritize their own upgrades accordingly. Throughout the remediation period, the team tracks progress until all affected deployments have been updated, ensuring no team is accidentally left vulnerable.
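The today / this week / this month triage above can be written down as an explicit rule so that every advisory gets the same treatment. The sketch below is one possible encoding: the severity labels and the exact day counts (0, 7, 30) are assumptions chosen to match the wording of the paragraph, not values defined by CleanStart.

```python
from datetime import date, timedelta


def patch_deadline(disclosed: date, actively_exploited: bool, severity: str) -> date:
    """Map a vulnerability assessment to a recommended patch-by date.

    Assumed policy: actively exploited -> same day; 'critical' or 'high'
    severity -> within a week; anything else -> within 30 days.
    """
    if actively_exploited:
        days = 0
    elif severity.lower() in {"critical", "high"}:
        days = 7
    else:
        days = 30
    return disclosed + timedelta(days=days)


if __name__ == "__main__":
    print(patch_deadline(date(2024, 1, 1), actively_exploited=False, severity="high"))
```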
App Team: Detailed Responsibilities
Golden Image Consumption
Application teams must commit to using only golden images from your internal registry as the foundation for their Dockerfiles. This restriction ensures your organization maintains governance and visibility over all container image bases. When deploying to Kubernetes, images must be pinned by their content digest rather than using floating tags, ensuring that the exact image tested in staging is deployed to production. Teams should actively subscribe to platform team announcements about image updates through Slack, email, or whatever communication channels your organization uses, staying informed about new golden image releases and upcoming deprecation deadlines. Importantly, teams should plan their image upgrades to occur during regular maintenance windows rather than treating them as emergency changes, allowing time for testing and coordinated deployment.
Customization Requests
When an application needs system packages or configuration modifications beyond what the golden image provides, teams submit pull requests to the image customization repository. Each customization request must include clear justification explaining what problem the customization solves and why it's necessary. If the customization is tied to a feature or bug fix, the PR should link to the relevant Jira ticket, making it easy for reviewers to understand the business context. Once the platform and security teams approve a customization request, the application team updates their Dockerfile to reference the new approved image variant, ensuring they're consuming the customized image rather than continuing to use the base golden image.
Testing & Validation
Before adopting new golden image versions in production, application teams run their complete test suite against the new images in development and staging environments. This validation confirms the application still functions correctly with the new base image. If any incompatibilities are discovered during testing, teams immediately report these to the platform team along with clear reproduction steps, helping the platform team identify and potentially fix issues before the image is widely adopted. When golden images change significantly, teams update their CI/CD pipelines to build against the new versions. For major version upgrades—such as moving from Python 3.11 to Python 3.12—teams allocate dedicated testing time, recognizing that these changes may require code modifications or dependency updates.
Monitoring & Alerts
Application teams actively monitor their production deployments for outdated images, using tools like CleanSight when available to get visibility into which images are running. When the platform team announces a golden image refresh or deprecation deadline, teams proactively update their deployments to adopt the new versions rather than waiting until the deadline passes. Teams report security findings to the platform team when they discover vulnerabilities or issues related to golden images, helping the platform team stay aware of production impact. Finally, teams participate in compliance audits when requested, demonstrating that their deployments use approved images and follow governance requirements.
Customization Governance
Three Tiers of Customization
Tier 1: Allowed (No Approval)
Tier 1 customizations are application-level changes that don't affect the security properties of the base image and require no approval from the security or platform teams. These customizations are entirely within the scope of what application teams can do in their own Dockerfiles.
Application teams can freely add application code including new files, scripts, and configurations specific to their service without any restriction. Installing language-specific packages is permitted, whether that's Python packages via pip install, Node packages via npm install, Go modules via go get, or equivalent package managers for other runtimes. These language-level packages are installed at build time inside the container and modify the application layer without affecting the underlying OS. Teams can set environment variables for application configuration, allowing each team to customize how their app behaves in different environments. Configuring health checks such as Kubernetes liveness and readiness probes is allowed, ensuring your applications can be properly monitored and restarted if they become unhealthy. Teams can set arbitrary labels and annotations on Kubernetes resources to enable organization and tooling integration. Creating application-level directories like /app, /data, /config, or other paths specific to your application is permitted. Finally, installing language runtime plugins (such as Python wheels, npm modules, or framework-specific extensions) is allowed because these operate within the language runtime rather than modifying system packages.
Example Dockerfile:
FROM internal-registry.company.com/golden/python:3.12-stable@sha256:abc123...
# Tier 1: no approval needed
COPY requirements.txt /app/
RUN pip install -r /app/requirements.txt
COPY app.py /app/
ENV APP_ENV=production
HEALTHCHECK --interval=30s CMD curl -f http://localhost:8000/health || exit 1
Tier 2: Controlled (Approval Required)
Tier 2 customizations modify the system image in ways that affect security, compliance, or infrastructure support, requiring formal review and approval from the security and platform teams.
Installing system packages from clnpkgs.clnstrt.dev is controlled, whether that's apk add curl to add network tools or apk add postgresql-client to enable database migrations. While these packages are necessary for some applications, each addition expands the attack surface and potential maintenance burden, so new packages are reviewed for necessity and security implications. Modifying security contexts is controlled, including changes that alter the security policy, drop or add Linux capabilities, or disable the read-only filesystem—these changes have direct security implications and must be justified. Changing the user or group that runs the container, particularly moving to a different non-root user rather than the default, is controlled because it affects process permissions and security boundaries. Adding TLS certificates—whether root CA certificates for trust chains or leaf certificates for mutual authentication—is controlled because certificate management affects security and compliance. Setting resource limits for memory and CPU at the Kubernetes level is controlled since these affect infrastructure capacity planning and cost allocation. Installing FIPS modules on non-FIPS images is controlled since cryptographic configuration has compliance implications for certain regulated environments. Configuring logging sidecars or monitoring agents is controlled since it affects how operational data is collected and which external services have access to your container data. Finally, adding image metadata like build dates, version information, and maintainer details is controlled to ensure consistency and traceability across your image fleet.
Approval Process:
The approval process for Tier 2 customizations is straightforward and transparent. First, you submit your Dockerfile as a pull request to the central image management repository, where it will be reviewed by the governance team. Include detailed justification in the PR description explaining why you need the package or configuration change—what problem does it solve and why can't you solve it without this change? The security team reviews the request from a security and compliance perspective, asking whether the package is truly necessary, whether safer alternatives exist, and whether the change creates any compliance concerns for regulated workloads. The platform team reviews the request from an operational perspective, considering infrastructure implications, version compatibility with other packages, and potential interactions with existing components. Once both teams are satisfied, an approval comment is added by the @platform-team reviewer, signaling that the change is approved. After approval, the Dockerfile is merged into the repository and built by the CI/CD pipeline. Finally, the new custom image is tagged with an appropriate version number and made available to your team for deployment in your applications.
Example PR:
Title: Add postgresql-client for data migrations
Description:
Our data pipeline needs to run pg_dump and psql commands.
Postgres client library is needed in the image.
Package: postgresql-client (v15.1 from clnpkgs.clnstrt.dev)
Security impact: Adds ~80MB to image, no SUID binaries
Compliance: Client-only, no server components
Alternatives: Use external migration tool? Would add complexity.
@security-team @platform-team please review
Tier 3: Prohibited (Never Allowed)
Tier 3 customizations violate core CleanStart security properties and cannot be approved under any circumstances. Attempting these customizations indicates a fundamental misalignment with CleanStart's security model, and the platform team should be contacted to discuss architectural alternatives.
Installing a shell such as bash, sh, or dash is prohibited because it directly defeats CleanStart's shell-less design philosophy, which eliminates entire categories of container escape and privilege escalation attacks. Changing to the root user (UID 0) is prohibited because it defeats the non-root design that constrains the damage possible if the application is compromised. Disabling the read-only filesystem through commands like RUN mount -o remount,rw / is prohibited because it defeats the immutability design that prevents runtime modifications to system files. Adding packages from non-CleanStart repositories—arbitrary APK indexes, third-party pip repositories, or other external sources—is prohibited due to supply chain security risks; you must verify and approve all packages through CleanStart's repository. Removing security hardening mechanisms is prohibited, whether that's dropping the seccomp profile that limits dangerous system calls or disabling AppArmor protections; these are fundamental layers of defense. Installing kernel modules is prohibited because containers should be pure user-space applications; kernel modifications bypass container isolation. Adding services that auto-start, such as init systems, daemons, or background processes, is prohibited because containers should be stateless and transient; application servers should be the only long-running processes. Changing the network stack to enable raw socket access or custom iptables rules is prohibited because this represents an infrastructure concern that violates container boundaries. Disabling image verification by removing cosign validation checks from the build process is prohibited because this is a critical supply chain security control that you cannot bypass.
If you believe you need something from Tier 3, contact the platform team immediately rather than attempting to work around the restrictions. The platform team will help you understand the business need and identify architectural alternatives that meet your requirements while maintaining CleanStart's security properties.
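The three tiers can be approximated by a pre-review linter that labels each Dockerfile instruction before a human looks at the PR. The sketch below is deliberately simplified and covers only a few of the rules listed above (apk installs, shells, the USER instruction, and remounting the root filesystem); the function names are hypothetical and the token matching is naive, so it should be read as a triage aid rather than an enforcement mechanism.

```python
SHELL_PACKAGES = {"bash", "sh", "dash"}  # shells named in the Tier 3 rules


def classify_instruction(line: str) -> int:
    """Return 1, 2, or 3 for a single Dockerfile instruction."""
    tokens = line.split()
    if not tokens:
        return 1
    keyword, args = tokens[0].upper(), tokens[1:]
    if keyword == "USER":
        user = args[0].split(":")[0] if args else ""
        # Running as root is prohibited; any other user change needs review.
        return 3 if user in {"root", "0"} else 2
    if keyword == "RUN":
        if "remount,rw" in line:
            return 3  # attempts to disable the read-only filesystem
        if "apk" in args and "add" in args:
            # Everything after "add" that is not a flag is treated as a package.
            after_add = args[args.index("add") + 1:]
            packages = {a for a in after_add if not a.startswith("-")}
            return 3 if packages & SHELL_PACKAGES else 2
    return 1  # application-level changes: COPY, ENV, pip install, ...


def classify_dockerfile(text: str) -> int:
    """A Dockerfile takes the highest tier of any of its instructions."""
    return max((classify_instruction(l) for l in text.splitlines()), default=1)


if __name__ == "__main__":
    print(classify_dockerfile("RUN apk add --no-cache postgresql-client"))
```

Taking the maximum tier means a single controlled or prohibited instruction determines how the whole PR is routed.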
Customization Request Process: Step by Step
Step 1: Discuss in #image-customization Slack channel
Before creating a PR:
@platform-team: Hi, we need postgresql-client in our Python images.
Our data service needs to run migrations. Is this something we can add?
Platform responds with guidance, known issues, version recommendations.
Step 2: Create PR with Dockerfile + justification
Repo: internal-platform/golden-images
Branch: feature/add-postgres-client
Dockerfile:
FROM internal-registry.company.com/golden/python:3.12-stable@sha256:abc123...
RUN apk add --no-cache postgresql-client
...
PR Template (auto-filled):
- **What**: Adding postgresql-client v15.1
- **Why**: Data migrations need pg_dump/psql
- **Risk**: None, client-only package
- **Testing**: Tested locally, works with our migration scripts
- **Team**: data-platform
- **Jira**: INFRA-1234
Labels: tier-2-controlled, python, database-client
Assignees: @platform-team, @security-team
Step 3: Automated checks run
A suite of automated checks runs immediately when the PR is submitted, providing quick feedback without requiring human review. The cosign verification check confirms that the base golden image has a valid signature, preventing use of unverified images. The SBOM scan analyzes the software bill of materials of the new package, checking for any known CVEs that would make the package unsuitable for production. The size check ensures the resulting image still fits within your organization's size target of less than 500MB, preventing image bloat. The license scan verifies that the postgresql-client package's license is compatible with your organization's policies and existing dependencies. The build test actually builds the Dockerfile, ensuring there are no syntax errors or missing steps that would prevent the image from being built successfully.
If any of these automated checks fails, the PR is automatically blocked and cannot proceed. The maintainer of the image pipeline adds a comment explaining the failure, allowing the team submitting the PR to address the issue, such as finding a smaller package alternative or investigating the CVE in more detail.
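The gate described in this step is an all-must-pass rule over five checks. The Python sketch below models only the gating logic: the `Candidate` fields stand in for results that real tools (signature verification, SBOM scanning, and so on) would produce, and all names are hypothetical. The 500 MB limit is the size target stated above.

```python
from dataclasses import dataclass, field

MAX_IMAGE_SIZE_MB = 500  # organization size target from this guide


@dataclass
class Candidate:
    """Results collected for one PR; each field is filled in by a real tool."""
    signature_valid: bool
    size_mb: float
    license_allowed: bool
    build_succeeded: bool
    known_cves: list = field(default_factory=list)


# (check name, predicate that returns True when the check passes)
CHECKS = [
    ("cosign-verification", lambda c: c.signature_valid),
    ("sbom-scan", lambda c: not c.known_cves),
    ("size-check", lambda c: c.size_mb < MAX_IMAGE_SIZE_MB),
    ("license-scan", lambda c: c.license_allowed),
    ("build-test", lambda c: c.build_succeeded),
]


def failed_checks(candidate: Candidate):
    """Return the names of all failing checks, in pipeline order."""
    return [name for name, passes in CHECKS if not passes(candidate)]


def pr_blocked(candidate: Candidate) -> bool:
    """The PR is blocked as soon as any single check fails."""
    return bool(failed_checks(candidate))


if __name__ == "__main__":
    pr = Candidate(signature_valid=True, size_mb=620.0,
                   license_allowed=True, build_succeeded=True)
    print(failed_checks(pr), pr_blocked(pr))
```

Reporting every failed check, rather than stopping at the first, gives the submitting team the full list of issues to address in one pass.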
Step 4: Security team reviews
The security team carefully evaluates the customization request from multiple security and compliance angles. They ask whether the package is truly necessary or if there's a way to achieve the business goal without adding a dependency. They check the package for any known vulnerabilities in public vulnerability databases. They examine the package's binary permissions, looking for suspicious SUID bits or capability grants that could escalate privileges unexpectedly. They verify that the package's license is compatible with your organization's policy and doesn't create conflicts with existing dependencies. They assess any compliance concerns for regulated industries—whether FIPS requirements are affected, HIPAA data handling is changed, PCI DSS controls are impacted, or other compliance frameworks matter to your organization.
Once the security team completes their review, they add an approval comment to the PR: ✅ Approved (no security concerns, aligns with policy). This signals that security has no objections and the PR can proceed to the next review stage.
Step 5: Platform team reviews
The platform team reviews the customization from an operational perspective. They check whether this same request has come from other teams before, potentially indicating this should be a standard golden image feature rather than a one-off customization. They assess whether the specific package version requested is compatible with other packages in the image and whether that version has a good support timeline. They determine whether this customization is so broadly useful that it should be added to the standard golden image, benefiting all teams rather than just the requesting team. They evaluate any infrastructure impact, such as whether the package significantly increases image size, slows down deployment, or affects container startup time.
Once the platform team completes their review, they add an approval comment: ✅ Approved (standard package, version appropriate, no conflicts). This signals that from an operational perspective, the customization is sound and the PR can be merged.
Step 6: Merge and build
Once both the security and platform teams have approved, the PR is merged to the main branch, triggering the CI/CD pipeline to build and test the new custom image. The first step builds the Dockerfile to create a container image layer. The image is then tagged with a version number following your semantic versioning scheme, for example internal-registry.company.com/team-approved/data/migrations-worker:1.0.0. A comprehensive security scan is run using tools like Trivy or Grype to detect any vulnerabilities in the built image, ensuring the new image meets your security standards. The scanned image is then pushed to your internal container registry where it becomes available to the requesting team. Finally, deployment documentation is updated to reflect the availability of the new image, allowing team members to find and reference the image in their deployment manifests.
Step 7: Team notified
Slack notification:
    ✅ data-platform image approved and ready
    Image: internal-registry.company.com/team-approved/data/migrations-worker:1.0.0@sha256:xyz...
    Scan results: 2 low-severity findings (acceptable)
    You can now use this in your deployments

Image Lifecycle Management
Versioning Strategy
Use semantic versioning for golden images:
    golden-v{MAJOR}.{MINOR}.{PATCH}

    Example:
    golden-v1.0.0 → initial release
    golden-v1.1.0 → new feature (e.g., new packages added)
    golden-v1.1.1 → patch (e.g., security update in existing package)
    golden-v2.0.0 → breaking change (e.g., major language version upgrade)

Digest Pinning (Required in Production)
Never use floating tags like :latest in production Kubernetes manifests.
    # ❌ WRONG: floating tag, could pull different image tomorrow
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app
    spec:
      containers:
        - name: app
          image: internal-registry.company.com/golden/python:3.12-stable

    # ✅ CORRECT: digest pinned, same image every time
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app
    spec:
      containers:
        - name: app
          image: internal-registry.company.com/golden/python:3.12-stable@sha256:abcd1234ef5678...

Floating tags (like golden-stable) are fine for development, but production must use digest pins.
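The digest-pinning rule is easy to check mechanically. The sketch below shows one way to do so, assuming image references are available as plain strings (for example, extracted from rendered manifests). The helper names `is_digest_pinned` and `unpinned_images` are hypothetical.

```python
import re

# A sha256 digest is exactly 64 lowercase hexadecimal characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference ends with a full sha256 digest."""
    return _DIGEST_RE.search(image_ref) is not None


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return the references that rely on a tag alone."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]


if __name__ == "__main__":
    placeholder_digest = "a" * 64  # stand-in value, not a real image digest
    refs = [
        "internal-registry.company.com/golden/python:3.12-stable",
        f"internal-registry.company.com/golden/python:3.12-stable@sha256:{placeholder_digest}",
    ]
    for ref in unpinned_images(refs):
        print(f"not pinned: {ref}")
```

Requiring the full 64-character digest means a truncated or hand-edited digest is reported as unpinned rather than silently accepted.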
Deprecation Process
When a golden image version is being phased out:
Months 1-3: Active support — The image is the current recommended version, receives all bug fixes and patches, and enjoys full support from the platform team.
Months 4-6: Legacy window announced — The image continues to receive only critical patches, and the platform team announces that teams should plan their migration by Month 10. Release notes are updated to include migration guidance, and documentation reflects the timeline.
Months 7-9: Final calls — Only critical security patches are applied at this stage, and alerts repeatedly notify teams that the deprecation deadline is approaching. You will see a spike in support questions from teams completing their upgrades to the current version.
Month 10+: Removal — The image is removed from the internal registry, old deployments trigger alerts, an emergency migration process is available for teams that missed the deadline, and a post-mortem is conducted on teams that failed to migrate on time.
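The four phases above can be encoded as a simple lookup so that dashboards and alerts label images consistently. This is a minimal sketch, assuming months are counted from 1 at the start of the timeline; the function name `deprecation_phase` is hypothetical.

```python
def deprecation_phase(month: int) -> str:
    """Map a 1-based month in the timeline to the phase named in this guide."""
    if month < 1:
        raise ValueError("month must be 1 or greater")
    if month <= 3:
        return "active support"
    if month <= 6:
        return "legacy window announced"
    if month <= 9:
        return "final calls"
    return "removal"


if __name__ == "__main__":
    for month in (2, 5, 8, 10):
        print(month, deprecation_phase(month))
```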
Example deprecation timeline (real dates):
    Jan 2024: Release python:3.11-golden-v1.2.0 (new current)
              Announce: python:3.10-golden-v0.8.0 entering legacy window

    Jan 2024-June: Migration window (6 months)
              Teams test 3.11 in dev, stage in staging

    June 2024: python:3.10 image still available but tagged "legacy"
              Only critical security patches (not feature patches)

    July 2024: Reminder emails sent to teams still on 3.10

    August 2024: Internal dashboard shows:
              - 45 teams still on 3.10 (120 production pods)
              - 8 teams on 3.11 (950 production pods)
              - Flag: only 1 month until removal

    September 2024: Final push
              - Direct outreach to 8 holdout teams
              - Help with migration

    October 2024: python:3.10 image deleted
              - 45 teams migrated (deadline squeeze)
              - 2 teams running emergency migration (pain)
              - Zero teams still on 3.10

    Lesson: Announce 6 months in advance, but you'll always have stragglers
    who migrate in month 5-6. That's normal.

End-of-Life Policy
    Support window: 18 months from release
      - Months 0-12: active support (features + patches)
      - Months 12-18: legacy (security patches only)
      - Month 18+: unsupported (image archived, not available)

    Example: v1.0.0 released Jan 1, 2024
             Support ends July 1, 2025
             Image archived Oct 1, 2025

Archival and Retention
Your retention strategy should distinguish between what must be kept permanently for audit and compliance purposes versus what can be safely removed to keep your infrastructure lean.
You should keep permanently for every version: the software bill of materials (SBOM) documenting all components in the image, providing permanent evidence of what was included; scan results showing what vulnerabilities were found at build time, creating a historical record of security posture; release notes documenting what changed between versions and any migration guidance provided; and the image digest and cryptographic signature proving the image's authenticity and integrity. These records together form an audit trail demonstrating your governance and decision-making processes.
You can remove after end-of-life to save storage space: Docker image layers from the registry itself can be deleted after the image reaches EOL, since these layers are large and the image is no longer used; CI/CD build artifacts from your continuous integration pipeline can be deleted after 1 year, as they're primarily useful during active development and troubleshooting of recent builds; and test results can be deleted after 2 years, providing time for post-mortem analysis while not consuming storage indefinitely. This retention strategy keeps your registry lean and your storage costs manageable while maintaining the audit trails necessary for compliance and governance enforcement.
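One way to keep these rules from drifting is to encode them as data that a cleanup job consults. The sketch below is illustrative only: the artifact kind names, the `RETENTION` table, and the `can_delete` helper are hypothetical. It takes the conservative reading that nothing is deleted before the image reaches end-of-life, and it leaves the reference point for an artifact's age (creation date or EOL date) for your policy to define.

```python
from datetime import timedelta
from typing import Optional

# Assumed encoding of the retention rules above; None means "keep permanently".
RETENTION: dict[str, Optional[timedelta]] = {
    "sbom": None,
    "scan_results": None,
    "release_notes": None,
    "digest_and_signature": None,
    "image_layers": timedelta(0),               # removable once the image is EOL
    "ci_build_artifacts": timedelta(days=365),  # 1 year
    "test_results": timedelta(days=2 * 365),    # 2 years
}


def can_delete(kind: str, age: timedelta, image_is_eol: bool) -> bool:
    """Decide whether an artifact of the given kind may be removed."""
    if not image_is_eol:
        return False  # conservative assumption: never delete before EOL
    limit = RETENTION[kind]  # raises KeyError for unknown artifact kinds
    return limit is not None and age >= limit


if __name__ == "__main__":
    print(can_delete("sbom", timedelta(days=3000), image_is_eol=True))             # False
    print(can_delete("ci_build_artifacts", timedelta(days=400), image_is_eol=True))  # True
```

Raising on unknown artifact kinds is deliberate: a cleanup job should fail loudly rather than guess a retention period for something the policy does not cover.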
Metrics and Reporting
Key Performance Indicators (KPIs)
1. Golden Image Adoption Rate
Metric: What % of production image deployments use golden images?
    Formula: (Deployments using golden images) / (Total deployments) × 100

    Target: >95%
    Current: 87% (need to improve)

    Dashboard:
      Golden image deployments: 1,240
      Non-golden deployments: 185
        - 120 using images directly from registry.cleanstart.com (non-mirrored, risky)
        - 40 using custom builds (no platform oversight)
        - 25 using third-party base images (not CleanStart)

Remediation for low adoption:
If your golden image adoption rate falls below the 95% target, the first step is to identify which teams are not using golden images and understand the underlying blockers. Some teams may simply not be aware that golden images exist or understand the governance requirements. Other teams may have tested the golden images and found compatibility issues with their specific applications. Still others may be waiting for additional support or documentation to understand how to migrate. By directly engaging with non-compliant teams, you can identify and remove blockers, whether that's through providing technical assistance with migration, updating documentation to address common questions, or adjusting governance policies if you discover the policies are unreasonably strict for certain use cases.
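The adoption figure on the dashboard follows directly from the deployment counts. A minimal sketch of the calculation, using the numbers from the dashboard above; the function name `adoption_rate` and the dictionary labels are illustrative.

```python
def adoption_rate(golden: int, non_golden: int) -> float:
    """Percentage of deployments that use golden images."""
    total = golden + non_golden
    if total == 0:
        raise ValueError("no deployments to measure")
    return golden * 100 / total


if __name__ == "__main__":
    non_golden = {
        "direct from registry.cleanstart.com": 120,
        "custom builds": 40,
        "third-party base images": 25,
    }
    rate = adoption_rate(1240, sum(non_golden.values()))
    print(f"{rate:.1f}%")             # 87.0%
    print("meets target:", rate > 95)  # meets target: False
```

Keeping the non-golden counts broken out by category shows which group to engage first; here, moving the 120 direct-from-upstream deployments onto the internal mirror would have the largest effect.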
2. Mean Time to Promote (MTTP)
Metric: How long does it take from CleanStart release to golden image promotion?
    Formula: Average time from CleanStart release date to internal golden promotion

    Target: 7 days (1-week acceptance testing)
    Current: 12 days (slow, blockers?)

    Timeline for latest release:
      CleanStart releases v2.1.0: Jan 1
      Platform team pulls & runs tests: Jan 1-5
      Tests pass: Jan 5
      Promoted to golden: Jan 7 (2 days after tests pass, admin delay)
      Teams notified: Jan 7

Remediation for slow MTTP:
If your mean time to promote is exceeding the target, there are several approaches to accelerate the process. You can invest in additional acceptance test infrastructure, allowing multiple tests to run in parallel rather than sequentially, dramatically reducing overall testing time. You can reduce administrative overhead by automating the promotion process—once tests pass, automatically tag and mirror the image to your internal registry rather than requiring manual promotion steps. You can also pre-stage release candidates, pulling new CleanStart images and running initial smoke tests before the image is officially released, allowing you to complete much of the testing work before the official release announcement.
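MTTP is an average over several promotions, so one fast release does not bring the metric on target by itself. The sketch below assumes you record an upstream release date and a promotion date for each release. The function name is hypothetical, the year is assumed, and the second data point is invented purely to illustrate the averaging.

```python
from datetime import date
from statistics import mean


def mean_time_to_promote(releases: list[tuple[date, date]]) -> float:
    """Average days from upstream release to golden promotion.

    Each entry is a (released_on, promoted_on) pair.
    """
    if not releases:
        raise ValueError("at least one release is required")
    return float(mean((promoted - released).days for released, promoted in releases))


if __name__ == "__main__":
    history = [
        (date(2024, 1, 1), date(2024, 1, 7)),      # v2.1.0 timeline above: 6 days
        (date(2023, 11, 1), date(2023, 11, 19)),   # hypothetical earlier release: 18 days
    ]
    print(f"{mean_time_to_promote(history):.1f} days")  # 12.0 days
```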
3. Customization Request Approval Time
Metric: How long does it take to approve a Tier 2 customization request?
    Formula: Average time from PR submission to approval

    Target: 2 business days
    Current: 3.5 business days (okay, but could be faster)

    Breakdown:
      - Time to first review: 1 day (good, reviewers are responsive)
      - Time to security approval: 1 day (standard)
      - Time to platform approval: 1 day (standard)
      - Time from approval to merge: 0.5 days (CI/CD or manual build delay)

Remediation for slow approvals:
If customization request approval times are exceeding your target, several tactics can help. Recruiting more reviewers from your security and platform teams distributes the review load, reducing the time each PR sits waiting for review. Developing clearer approval criteria helps reviewers feel more confident in their decisions; when approval criteria are ambiguous, reviewers take longer to review because they're uncertain whether they should approve. Building an async review culture where reviews happen throughout the day rather than requiring all reviewers to be online simultaneously helps PRs move forward more consistently without waiting for specific team members to be available.
4. Non-Compliant Image Count
Metric: How many production deployments are using non-golden images?
    Current: 185 deployments not using golden images
    Breakdown:
      - 120 direct from registry.cleanstart.com (risky, no mirroring SLA)
      - 40 custom builds (unclear lineage, potential supply chain gaps)
      - 25 non-CleanStart base images (outside governance)

    Risk assessment:
      If a vulnerability hits registry.cleanstart.com but not internal registry,
      120 deployments won't get patched automatically

Remediation:
To address non-compliant images in production, start with technical enforcement via admission controller policies that reject any Kubernetes resource trying to use images from outside your internal registry. However, enforce this gradually through a phased rollout approach—begin by warning teams about non-compliant images without blocking deployment, giving them time to respond. Once teams are aware and have had time to migrate, enable hard enforcement that actively blocks non-compliant deployments. Throughout this process, help non-compliant teams understand why governance matters by explaining the security and compliance benefits and moving them to golden images with direct technical support when needed.
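The difference between the warning phase and hard enforcement comes down to what happens when a violation is found. The sketch below models that decision in isolation, assuming a single allowed registry prefix. The `Mode` enum and `admission_decision` function are hypothetical and do not correspond to any admission controller's API.

```python
from enum import Enum


class Mode(Enum):
    WARN = "warn"        # report violations but allow the deployment
    ENFORCE = "enforce"  # reject deployments that contain violations


# Registry host taken from the examples in this guide.
ALLOWED_PREFIX = "internal-registry.company.com/"


def admission_decision(images: list[str], mode: Mode) -> tuple[bool, list[str]]:
    """Return (allowed, violations) for the images in one workload."""
    violations = [img for img in images if not img.startswith(ALLOWED_PREFIX)]
    allowed = not violations or mode is Mode.WARN
    return allowed, violations


if __name__ == "__main__":
    images = ["registry.cleanstart.com/python:3.12"]
    print(admission_decision(images, Mode.WARN))     # (True, ['registry.cleanstart.com/python:3.12'])
    print(admission_decision(images, Mode.ENFORCE))  # (False, ['registry.cleanstart.com/python:3.12'])
```

Because the violation list is identical in both modes, the reports teams receive during the warning phase show exactly what will be blocked once enforcement is switched on.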
5. Image Sprawl Index
Metric: Estimate of image maintenance overhead.
    Formula: (Unique golden image versions in use) / (Total supported versions)

    Example:
      Supported versions: python 3.12, 3.11, 3.10 (3 versions)
      In production: python 3.12-v2.1, 3.11-v1.8, 3.10-v0.8 (3 versions)
      Sprawl index: 100% (you support exactly as many as teams use)

    Versus:
      Supported versions: 3 versions
      In production: 3.12-v2.1, 3.12-v2.0, 3.12-v1.9, 3.11-v1.8, 3.11-v1.7 (5 versions)
      Sprawl index: 167% (teams hanging onto old patches, increasing support burden)

    Target: Sprawl index <110% (slightly ahead of usage, healthy deprecation)
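The two examples above can be reproduced with a few lines of code. This is a minimal sketch, assuming you can list the distinct golden image versions running in production; the function name `sprawl_index` is illustrative.

```python
def sprawl_index(versions_in_use: set[str], supported_count: int) -> float:
    """Unique golden image versions in use as a percentage of supported versions."""
    if supported_count <= 0:
        raise ValueError("supported_count must be positive")
    return len(versions_in_use) * 100 / supported_count


if __name__ == "__main__":
    healthy = {"3.12-v2.1", "3.11-v1.8", "3.10-v0.8"}
    sprawling = {"3.12-v2.1", "3.12-v2.0", "3.12-v1.9", "3.11-v1.8", "3.11-v1.7"}
    for label, in_use in (("healthy", healthy), ("sprawling", sprawling)):
        index = sprawl_index(in_use, supported_count=3)
        print(f"{label}: {index:.0f}% (within target: {index < 110})")
```

The second case evaluates to 166.67%, which is shown as 167% when rounded to a whole number.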
Remediation for high sprawl:
When your sprawl index exceeds the target, teams are running more image versions than you officially support, and several tactics can help reduce the overhead. You can accelerate the deprecation timeline, moving teams off old versions more aggressively rather than allowing those versions to linger indefinitely. You can improve communication about end-of-life dates by making the timeline more visible in your regular updates and team communications, so that teams don't miss deprecation deadlines through miscommunication. You can provide automated migration tooling that updates teams' Dockerfiles and deployment manifests to use current golden images, removing the manual effort required for migration.
Monthly Dashboard
Template for reporting to stakeholders:
    Monthly Report: CleanStart Golden Image Governance

    Golden Image Adoption: 92% (+3 points from last month)
      1,250 compliant deployments
      110 non-compliant (action plan in place)

    Promotion Efficiency: 8 days average MTTP
      Last promotion: python 3.12-v2.1.0 (7 days, on target)
      Next candidate: java 21-v1.8.0 (in acceptance testing)

    Customization Pipeline: 6 active PRs, 3 approved this month
      Approval time: 2.1 days average (on target)
      Categories: 3 database clients, 2 monitoring tools, 1 build tool

    Security Posture: Zero critical CVEs in golden images this month
      1 high-severity patch planned for next week
      Compliance audit: 99.2% of images verifiable

    Cost Impact:
      Registry storage: 85 GB (3% of budget)
      Acceptance testing: 40 compute hours (minimal)
      Admin overhead: ~2 FTE equivalent (platform team)

    Upcoming:
      - Node.js 20 LTS promotion (on track for next week)
      - Security team office hours: Wednesday 2pm PT
      - Deprecation deadline for python 3.10: August 30

Policy Enforcement
Admission Controllers
Use Kubernetes admission controllers to enforce golden image policy at the cluster level.
Option 1: Kyverno (Recommended for Kubernetes)
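    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-golden-image
    spec:
      validationFailureAction: Enforce
      rules:
        - name: check-image-registry
          match:
            any:
              - resources:
                  kinds:
                    # Match Pod only: Kyverno auto-generates equivalent rules
                    # for Deployments, StatefulSets, and other pod controllers,
                    # whose containers live under spec.template.spec.
                    - Pod
          validate:
            message: "Image must be from internal golden registry"
            pattern:
              spec:
                containers:
                  - image: "internal-registry.company.com/golden/*"

This policy rejects any Pod that tries to use images from registry.cleanstart.com directly, Docker Hub, or any other non-golden registry. Through Kyverno's auto-generated rules, the same check applies to pod controllers such as Deployments and StatefulSets.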
Option 2: OPA/Conftest (Language-agnostic)
    # policy/golden_image.rego
    package kubernetes

    # Enables the `in`, `if`, and `contains` keywords (requires a recent OPA release)
    import rego.v1

    deny contains msg if {
        container := input.spec.containers[_]
        not startswith(container.image, "internal-registry.company.com/golden/")
        # Skip the denial when the workload qualifies for the exception below
        not allow_custom_image
        msg := sprintf("Image must use golden registry: %v", [container.image])
    }

    # Allowed exceptions (team-approved custom images)
    allow_custom_image if {
        input.metadata.namespace in ["payments", "search", "data"]
        input.metadata.annotations["approved-custom-image"] == "true"
    }

CI/CD Pipeline Gates
In your deployment pipeline (GitHub Actions, GitLab CI, etc.):
    deploy:
      stage: deploy
      script:
        # Extract the base image of the final build stage from the Dockerfile
        # (multi-stage builds have several FROM lines; the last one ships)
        - BASE_IMAGE=$(grep "^FROM" Dockerfile | tail -n 1 | awk '{print $2}')

        # Check if it's a golden image (prefix match, so the registry host must come first)
        - |
          if [[ "$BASE_IMAGE" == "internal-registry.company.com/golden/"* ]]; then
            echo "✅ Using golden image: $BASE_IMAGE"
          else
            echo "❌ Must use golden image from internal registry"
            echo "   Found: $BASE_IMAGE"
            exit 1
          fi

        # Verify digest is pinned (no floating tags in prod)
        - |
          if [[ "$BASE_IMAGE" == *"@sha256:"* ]]; then
            echo "✅ Image digest pinned"
          else
            echo "⚠️ Warning: image digest not pinned (okay for dev, not for prod)"
          fi

        # Deploy if all checks pass
        - kubectl apply -f k8s/deployment.yaml

Runtime Scanning
Even with admission control, scan running containers:
    #!/bin/bash
    # scan-running-images.sh - detect non-golden images at runtime

    GOLDEN_REGISTRY="internal-registry.company.com/golden"

    # One line per pod: namespace, pod name, then all container images
    kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' |
    while read -r ns pod images; do
      for image in $images; do
        # The trailing slash prevents lookalike paths such as .../golden-copy/ from matching
        if [[ "$image" != "$GOLDEN_REGISTRY/"* ]]; then
          echo "⚠️ Non-golden image in $ns/$pod: $image"
        fi
      done
    done

Run this script daily via a CronJob and alert if any non-golden images are found.
Compliance Reporting
Generate quarterly reports for auditors:
    #!/bin/bash
    # compliance-report.sh

    GOLDEN_REGISTRY="internal-registry.company.com/golden/"

    # List every container image in every pod, one image per line, so that
    # multi-container pods are counted per image rather than per pod
    IMAGES=$(kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}')

    echo "=== Golden Image Compliance Report ==="
    echo "Period: Q1 2024"
    echo
    echo "## Image Adoption"
    echo "Golden images in production: $(echo "$IMAGES" | grep -c "^$GOLDEN_REGISTRY")"
    echo "Non-golden images: $(echo "$IMAGES" | grep -vc "^$GOLDEN_REGISTRY")"
    echo
    echo "## Customization Approvals"
    echo "Customizations approved: $(git log --oneline --since="2024-01-01" --until="2024-03-31" --all -- images/ | wc -l)"
    echo
    echo "## Security Patching"
    # Example figures: replace with values from your patch tracking system
    echo "Critical patches applied: 2"
    echo "High-severity patches applied: 8"
    echo "Mean time to patch: 4.2 days"
    echo
    echo "## Audit Trail"
    echo "All golden image builds are signed with cosign"
    echo "All customizations reviewed by security team"
    echo "All production deployments verified by admission controller"

Migration Path: No Governance → Governance
If you're starting without governance, a phased approach helps you build governance gradually while minimizing disruption to teams.
Phase 1: Establish (Week 1-2)
Begin by publishing the governance policy document—use this document or adapt it to your organization's specific requirements. Announce the new governance approach to all teams through company-wide communication channels, making sure leadership supports the initiative. Simultaneously, set up the technical infrastructure: create a golden image repository (in Git or your preferred version control), establish CI/CD pipelines for building and testing golden images, and create the first set of golden images covering your most common runtimes like Python, Node.js, and Java. This gives teams immediately usable alternatives and demonstrates that governance is real.
Phase 2: Educate (Week 3-4)
Run team office hours where teams can ask questions about the policy and get clarification in real-time. Create example Dockerfiles showing how to properly consume golden images for common use cases—these concrete examples are far more useful than abstract policy documents. Publish a migration guide explaining step-by-step how teams move from whatever they're currently doing to consuming golden images. Share success stories from early adopters within your organization, demonstrating that golden images work and don't disrupt teams who embrace them. These early wins build credibility and encourage other teams to migrate.
Phase 3: Incentivize (Week 5-6)
Highlight golden image adoption metrics in your company updates and executive communication, making visible the progress toward governance. Recognize and praise teams that migrate early, whether through team mentions in company updates or other recognition mechanisms. Actively offer technical help to teams encountering blockers, positioning the platform team as a resource enabling migration rather than an obstacle creating more work. Clearly communicate the security and compliance benefits of governance, helping teams understand why this effort matters beyond "because the policy says so."
Phase 4: Enforce (Week 7+)
Deploy Kubernetes admission controller policies using Kyverno or OPA to enforce image governance. Start in warning mode where violations generate alerts but don't block deployments, giving teams time to discover non-compliant images and migrate. Run compliance scans regularly and report non-golden images to teams, keeping enforcement visible. Work collaboratively with teams to remediate violations rather than punishing non-compliance. After teams have had sufficient time to adapt—typically 4-8 weeks—gradually increase enforcement from warnings to hard blocks, ensuring teams understand this transition is coming and have been given ample opportunity to comply.
Timeline to full adoption: 3-6 months in practice, though the timeline varies significantly based on your organization's size, the number of teams needing to migrate, and how quickly teams can schedule their own update cycles.
What to Read Next
For testing guidance, consult the Security Testing Playbook to verify CleanStart properties in QA environments. For Kubernetes operations, refer to the Kubernetes-Helm Operations guide to deploy governed images at scale. For operational security, review the Vulnerability Response SLA to understand how to handle CVEs in golden images. For reference information, consult the Compatibility Testing Matrix to understand what's officially supported.
