Related Fundamentals: For a deeper understanding of container image structure and design, see Development Images vs Application Images.
The Principle
Development and production containers must be built from completely different specifications.
A development image should have tools for debugging, iteration, and experimentation. A production image should have nothing except what's required to run the application.
This principle is so fundamental that CleanStart builds the two images as separate, immutable artifacts with no shared configuration.
In the traditional approach, a single Dockerfile is used for both development and production. The image is built from a full Ubuntu base, installs Node.js, then adds development tools like build-essential, gdb, and debugging utilities like curl and wget. While this approach is simpler and requires less configuration, it has a critical flaw: the production image contains all the development tools, creating a larger attack surface, additional CVEs, and unnecessary bloat.
CleanStart takes a fundamentally different approach by building two completely independent images. The development image is built from the node:18 base image, includes build-essential and gdb for debugging, installs all dependencies including dev dependencies, and uses nodemon for hot reloading. The result is a 900 MB image optimized for developer experience with full tools available. The production image, by contrast, is built from distroless/nodejs18, copies only the compiled artifacts from the builder stage, and includes no tools or development dependencies. The result is a lean 187 MB image with a much smaller attack surface.
The key principle: development and production containers must be built from completely different specifications.
Development Image Specification
The development image is optimized for developer experience:
    # development.dockerfile
    FROM node:18

    WORKDIR /app

    # Development utilities
    RUN apt-get update && apt-get install -y \
        curl \
        wget \
        git \
        vim \
        nano \
        build-essential \
        gdb \
        strace

    # Install dependencies including dev
    COPY package*.json ./
    RUN npm ci --include=dev

    # Development entrypoint (auto-restart on changes)
    RUN npm install -g nodemon

    EXPOSE 3000

    # Start in development mode
    CMD ["nodemon", "--inspect=0.0.0.0:9229", "src/server.js"]

Development Image Capabilities
The development image is equipped with debugging and development tools that support rapid iteration. The Node inspector listens on port 9229 and integrates with Chrome DevTools for interactive debugging of running code. Through it, developers can capture heap snapshots to analyze memory usage and record profiles to identify bottlenecks.

System introspection tools allow deeper investigation into application behavior. strace intercepts and logs system calls, showing exactly how the application interacts with the operating system. gdb provides symbolic debugging with breakpoints, stack traces, and variable inspection. ltrace traces library calls, and perf profiles at the kernel level to identify CPU hotspots and cache misses. Of these, the Dockerfile above installs only strace and gdb; ltrace and perf must be added to its apt-get install list before they are available.

Network debugging is covered as well. curl and wget make HTTP requests and test connectivity. Netcat provides low-level TCP/UDP testing and port verification, dig performs DNS lookups for diagnosing service discovery issues, and tcpdump captures raw packets for packet-level analysis. Netcat, dig, and tcpdump are likewise not in the Dockerfile as listed and need to be added to it.

Development iteration is faster with nodemon, which restarts the application whenever source files change. Shell access via bash lets developers experiment interactively, inspect the filesystem, and run arbitrary commands for troubleshooting. Package managers (npm and apt) allow additional tools and dependencies to be installed on demand, and the build tools from build-essential (gcc, make) allow native modules to be compiled and build processes to be tested.
Development Environment
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp-dev
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
        # NOT read-only (needs to write for debugging)
        # HAS shell access (for development)
      containers:
        - name: app
          image: myapp:dev-latest
          ports:
            - containerPort: 3000  # Application port
            - containerPort: 9229  # Node inspector
          volumeMounts:
            - name: app-code
              mountPath: /app  # Code mounted from developer's machine
          env:
            - name: NODE_ENV
              value: "development"
            - name: DEBUG
              value: "*"
          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              add: ["ALL"]  # Unrestricted for development
      volumes:
        - name: app-code
          hostPath:
            path: /path/to/app  # Developer's local code

Production Image Specification
The production image is optimized for security and efficiency:
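    # production.dockerfile
    # Assumes an earlier build stage named "builder" that produced
    # /app/dist and /app/node_modules_prod
    FROM gcr.io/distroless/nodejs18-debian11

    WORKDIR /app

    # Copy only what's needed (pre-compiled, pre-tested)
    COPY --from=builder /app/dist ./dist
    COPY --from=builder /app/node_modules_prod ./node_modules
    COPY package.json ./

    EXPOSE 3000

    # Run as non-root; the image has no shell.
    # The read-only root filesystem is enforced at deploy time, not here.
    USER 1000:1000

    # The distroless Node.js image already sets node as its ENTRYPOINT,
    # so CMD supplies only the script path
    CMD ["dist/server.js"]

Production Image Properties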
The production image enforces multiple security properties that eliminate entire attack surfaces. It has no shell (neither bash nor sh is available), no package manager (apt and npm are unavailable), and no build tools (gcc and make are unavailable). The container runs as a non-root user (UID 1000) and contains only the minimal dependencies needed at runtime. It is designed to run with a read-only root filesystem, which is enforced by the Pod specification at deploy time rather than by the image itself.
Size optimization comes from the distroless base image, the complete absence of dev dependencies and build tools, and stripped binaries. The smallest distroless base is about 2 MB; the Node.js variant used here is larger because it bundles the Node.js runtime. The resulting image is 187 MB, compared with 900 MB for the development image, a reduction of about 79%.
Security hardening is enforced at multiple levels. The absence of a shell eliminates interactive access, preventing attackers from executing commands. The lack of tools means attackers cannot install exploits or create persistence mechanisms. The read-only filesystem prevents runtime modification of code. Running as non-root limits the scope of privilege escalation attacks. And the distroless base image minimizes the CVE surface area compared to full OS distributions.
Production Environment
    apiVersion: v1
    kind: Pod
    metadata:
      name: myapp-prod
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: app
          image: myapp:1.0.0@sha256:abc123...  # ← Pinned digest
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: var-run
              mountPath: /var/run
          securityContext:
            # These two fields exist only at the container level,
            # not in the Pod-level securityContext
            readOnlyRootFilesystem: true  # ← Production hardening
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]  # Drop all capabilities
              # NET_BIND_SERVICE is only needed for ports below 1024;
              # port 3000 does not require it
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
          resources:
            limits:
              cpu: "1"
              memory: "512Mi"
            requests:
              cpu: "500m"
              memory: "256Mi"
      volumes:
        - name: tmp
          emptyDir: {}
        - name: var-run
          emptyDir: {}

Build Pipeline: From Dev to Prod
The journey from development to production involves multiple coordinated stages, each designed to ensure code quality, security, and reliability. On the developer's machine, developers work interactively inside the myapp:dev container with all development tools available, test their changes locally, and commit code to the repository. This inner loop enables fast feedback and rapid iteration without waiting for CI/CD pipelines.
When code is pushed, the CI/CD pipeline takes over with a series of automated steps. First, the pipeline builds the development image tagged with the commit hash, ensuring that every commit produces a reproducible development environment. The full test suite runs against the development image, catching bugs early in the process. The dev image is then published to the container registry for pull request testing, allowing reviewers to run the application in a containerized environment that matches the developer's setup.
Next, the pipeline compiles the application source code (for example, TypeScript to JavaScript), runs production-specific tests that verify behavior in production-like conditions, and builds the final production image (myapp:1.0.0) using only the pre-compiled artifacts. This separation ensures that the production image contains only optimized, verified code with no development dependencies or debugging tools.
Security validation is rigorous and multi-layered. Both the development and production images are scanned for vulnerabilities using industry-standard tools. Software Bills of Materials (SBOMs) document every component in each image, and cryptographic attestations prove who built each image and how. Both images are digitally signed with deployment keys, ensuring that only authorized parties can produce valid images. These signed images are published to the registry alongside metadata documenting their provenance and security properties.
The staging environment provides a realistic test environment before production deployment. The production image is deployed to staging to verify that it functions correctly, integration tests run against real dependencies to ensure the application behaves as expected, security properties are confirmed to be intact, and a performance baseline is established to ensure the application meets performance requirements before entering production.
Finally, in the production environment, the verified image is deployed pinned to its exact content digest (a SHA256 hash) rather than to a mutable tag. This ensures that exactly the same bits that were tested in staging are deployed to production, with no possibility of unexpected changes. Security policies are enforced at runtime through Kubernetes Pod Security admission (the successor to the PodSecurityPolicy API, which was removed in Kubernetes 1.25) and network policies. Continuous monitoring is enabled to detect anomalies in application behavior, resource usage, or security posture, providing visibility into production operation.
Multi-Stage Build Pattern
CleanStart uses multi-stage Docker builds to implement the separation between development and production environments within a single build process. This technique allows a single Dockerfile to define multiple distinct build stages, each serving a specific purpose, and enables artifacts to be selectively promoted from earlier stages to later stages while leaving other artifacts behind.
The multi-stage approach works as follows. The development stage builds from a full-featured Node.js image, installs all dependencies including development dependencies, compiles TypeScript to JavaScript, and runs the full test suite. All the intermediate artifacts—source maps, test results, development dependencies, and build tools—remain in this stage. The production stage then starts fresh from a minimal distroless image with no build tools or development dependencies. Critically, only the compiled JavaScript artifacts (in the dist directory) and production dependencies (node_modules_prod) are copied from the development stage. The source code, package-lock.json, and any other development artifacts are explicitly not copied. The Dockerfile pattern makes this clear and intentional:
    # Stage 1: Development
    FROM node:18 AS dev
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci --include=dev
    COPY . .
    RUN npm run build
    RUN npm run test -- --coverage
    # Assumed step: strip dev dependencies so that node_modules_prod
    # exists for the COPY in the production stage
    RUN npm prune --omit=dev && mv node_modules node_modules_prod

    # Stage 2: Production (fresh, minimal base)
    FROM gcr.io/distroless/nodejs18-debian11 AS prod
    WORKDIR /app

    # Copy ONLY compiled artifacts from dev stage
    COPY --from=dev /app/dist ./dist
    COPY --from=dev /app/node_modules_prod ./node_modules
    COPY package.json ./

    EXPOSE 3000
    USER 1000:1000
    # The distroless Node.js image already sets node as its ENTRYPOINT
    CMD ["dist/server.js"]

This pattern enforces several critical guarantees. First, development tools cannot leak into the production image because they are not copied from the development stage. Every tool in the development image (curl, wget, gcc, build-essential) remains confined to that stage. Second, the production image is rebuilt from a completely fresh base image, ensuring no accumulated state or configuration from the development stage. Third, all artifacts are pre-verified before being copied, since compilation and testing occur in the development stage before the production stage even begins. Fourth, configuration is not duplicated across stages: each stage declares its own requirements independently, making it clear what each stage needs.
Configuration Management
Dev and prod often need different configurations:
    # config/development.yaml
    server:
      host: 0.0.0.0
      port: 3000
      debug: true
    logging:
      level: debug
      format: text
      output: stdout
    cache:
      ttl: 10  # Short TTL for testing
    database:
      host: localhost
      pool: 5
    security:
      verify_ssl: false
      rate_limit: disabled
      require_auth: false
    ---
    # config/production.yaml
    server:
      host: localhost
      port: 3000
      debug: false
    logging:
      level: info
      format: json
      output: stdout
    cache:
      ttl: 3600  # Long TTL for performance
    database:
      host: rds.amazonaws.com
      pool: 50
      ssl: true
    security:
      verify_ssl: true
      rate_limit: enabled
      require_auth: true
      cors_origins:
        - https://app.example.com
        - https://api.example.com

Applications load the appropriate config based on an environment variable:
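    # docker run options must come before the image name;
    # anything after it is passed to the container as arguments

    # Development
    docker run --env NODE_ENV=development myapp:dev-latest

    # Production
    docker run --env NODE_ENV=production myapp:1.0.0

Governance Benefits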
Separating development and production images as distinct, immutable artifacts enables organizations to enforce clear and distinct security and quality policies tailored to the specific needs of each environment. This separation is not merely a technical convenience—it is a governance mechanism that prevents whole classes of security incidents.
Development images are intentionally flexible and permissive, designed to maximize developer productivity and enable rapid iteration. Development images are permitted to include comprehensive debugging tools, full shell access for manual investigation, development dependencies like test runners and linters, verbose logging to aid troubleshooting, and other runtime features that aid the development process. Importantly, development images can skip some of the security hardening that production images require, allowing developers to focus on correctness and functionality rather than security constraints. Critically, development images do not require cryptographic signatures or attestations, and policies can be configured to prevent development images from being deployed to production environments. This freedom and flexibility is essential for developer productivity, as it allows developers to innovate and experiment without waiting for security reviews or compliance checks on every minor change.
Production images, by contrast, operate under strict, non-negotiable requirements designed to minimize attack surface and ensure reliability. Production images must use distroless or minimal base images with no unnecessary tools or libraries. They must not include any shell—neither bash nor sh—preventing attackers from executing arbitrary commands even if they compromise the container. They must not include build tools like gcc, make, or npm, preventing attackers from compiling and installing exploits. They must enforce read-only root filesystems, making it impossible for an attacker to modify application code or configuration at runtime. They must run as non-root users, limiting the damage a compromised application can do. And they must be digitally signed with cryptographic keys, ensuring that only authorized parties can produce valid production images.
Beyond these image-level requirements, production images are subject to verification and documentation requirements. Every production image requires a SLSA Level 4 attestation document proving exactly how the image was built, who built it, and what inputs were used. (Level 4 is defined in the SLSA v0.1 specification; the v1.0 specification defines build levels up to L3.) A Software Bill of Materials documents every software component in the image, enabling rapid vulnerability assessment. VEX (Vulnerability Exploitability eXchange) documents provide security context about known vulnerabilities, explaining which ones actually pose a risk in this specific image. Images must pass security scans checking for vulnerabilities, misconfigurations, and policy violations. And the code shipped in every production image must have test coverage above 80%, so that what reaches production is well tested.
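The requirements in this section can be read as a release gate. The following is a minimal JavaScript sketch, assuming a hypothetical report object assembled by the pipeline. The function name and the field names (`signed`, `hasSbom`, `hasProvenance`, `criticalVulnerabilities`, `coveragePercent`) are illustrative and are not part of any CleanStart API.

```javascript
// Hypothetical release gate: the field names are illustrative assumptions,
// not a CleanStart or registry API.
function evaluateReleaseGate(report) {
  const failures = [];
  if (!report.signed) failures.push("image is not signed");
  if (!report.hasSbom) failures.push("SBOM is missing");
  if (!report.hasProvenance) failures.push("provenance attestation is missing");
  if (report.criticalVulnerabilities > 0) {
    failures.push(`${report.criticalVulnerabilities} critical vulnerabilities found`);
  }
  // The text requires coverage above 80%, so exactly 80 does not pass.
  if (!(report.coveragePercent > 80)) {
    failures.push(`test coverage ${report.coveragePercent}% is not above 80%`);
  }
  return { allowed: failures.length === 0, failures };
}
```

Returning the full list of failures, rather than stopping at the first, lets a pipeline report everything that blocks a release in one pass.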
Enforcement of these policies happens at multiple technological and organizational levels, creating a robust defense-in-depth approach. At the repository level, pull request checks can be configured to reject any PR that attempts to promote a development image directly to a production registry or namespace. The container registry itself enforces immutability on production images once they are released, preventing any subsequent modification or tampering. Kubernetes-level policy engines (like Kyverno or OPA) can reject unsigned images at deployment time, preventing any unsigned image from running in production namespaces. Audit logging captures every production image deployment, creating an immutable audit trail that documents exactly what was deployed, when, by whom, and with what signature and attestation information. This audit trail enables compliance verification and incident investigation.
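The shape of such a deployment-time rule can be sketched in JavaScript. This is an illustration of the logic only, not Kyverno or OPA policy syntax. The function name is hypothetical, and the `/dev` repository convention is taken from the registry layout this document describes.

```javascript
// Sketch of a rule a policy engine could apply to production namespaces.
// Hypothetical helper; real engines express this in their own policy language.
const DIGEST_PATTERN = /@sha256:[0-9a-f]{64}$/;

function checkProductionImageRef(imageRef) {
  const reasons = [];
  // Drop the digest, then the tag, to isolate the repository path.
  const withoutDigest = imageRef.split("@")[0];
  const lastSlash = withoutDigest.lastIndexOf("/");
  const lastColon = withoutDigest.lastIndexOf(":");
  // A colon before the last slash belongs to a registry port, not a tag.
  const repository =
    lastColon > lastSlash ? withoutDigest.slice(0, lastColon) : withoutDigest;

  if (repository.endsWith("/dev")) reasons.push("development repository path");
  if (!DIGEST_PATTERN.test(imageRef)) reasons.push("not pinned to a sha256 digest");
  return { allowed: reasons.length === 0, reasons };
}
```

Signature verification is deliberately left out of the sketch, since it depends on the verification support of the policy engine in use.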
Common Gotchas and Mitigation Strategies
Configuration Drift Across Environments
A common problem that teams encounter is configuration divergence between development and production environments. Development configuration might have shorter database connection timeouts suitable for rapid iteration, more verbose logging for debugging, disabled security features to ease development, and feature flags that enable experimental functionality. Production configuration must have different timeout values appropriate for production load, minimal logging to reduce overhead and security exposure, strict security enforcement, and stable feature flags. Without explicit management, these configurations can diverge significantly, causing code to behave correctly in development but fail mysteriously in production. The classic scenario is when a developer tests under development configuration with disabled authentication, and the application breaks in production when authentication is enforced.
The solution to configuration drift is to generate both development and production configurations from a single source, for example with Kustomize overlays and ConfigMap generators, or with a Helm chart and per-environment values files. The base configuration defines all common values, and environment-specific overlays apply targeted changes only where needed. This keeps both configurations synchronized with a single source of truth, and changes to the base configuration automatically propagate to both environments. Kustomize suits this pattern well because overlays are composed declaratively rather than through imperative scripting.
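The base-plus-overlay idea can be illustrated with a small JavaScript sketch. It shows the merge concept only and is not how Kustomize implements patches. The helper names are hypothetical, and the sample values are taken from the development and production configuration files shown earlier.

```javascript
// Recursively merge an environment overlay onto a shared base configuration.
// Plain objects merge key by key; arrays and scalars in the overlay replace
// the base value. Neither input is mutated.
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

function applyOverlay(base, overlay) {
  const result = { ...base };
  for (const [key, value] of Object.entries(overlay)) {
    result[key] =
      isPlainObject(value) && isPlainObject(base[key])
        ? applyOverlay(base[key], value)
        : value;
  }
  return result;
}

// Shared values live in the base; each environment states only its differences.
const base = { server: { port: 3000 }, logging: { output: "stdout" } };
const development = applyOverlay(base, {
  server: { debug: true },
  logging: { level: "debug", format: "text" },
});
const production = applyOverlay(base, {
  server: { debug: false },
  logging: { level: "info", format: "json" },
});
```

Because the port and log output are declared once in the base, a change there reaches both environments, while each overlay stays short enough to review at a glance.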
Dependency Version Skew
Another insidious problem is when development and production use different versions of critical tools or dependencies. For example, a developer might be using npm version 8 locally, while the production build system uses npm version 9. These versions build dependencies differently, leading to different node_modules structures and potentially different behavior at runtime. A dependency that works in dev (built with npm v8) might fail in production (installed with npm v9). This is especially problematic for native modules that must be compiled—incompatibilities between the compiler versions and the target platform can lead to subtle runtime failures that only appear in production.
The solution is to use lock files like package-lock.json (for Node.js), poetry.lock (for Python), or Cargo.lock (for Rust) to freeze exact dependency versions. These lock files must be committed to version control and used consistently in both development and production environments. This ensures that every environment installs exactly the same versions of every dependency, eliminating version skew. Lock files do not pin the tooling itself, so builds should also run inside the same pinned base image in every environment to keep the Node.js and npm versions aligned. Beyond lock files, CI/CD pipelines should be configured to validate that lock files are up to date and consistent before running builds.
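For Node.js, `npm ci` already refuses to install when package.json and package-lock.json disagree. The kind of check behind that validation can be sketched in JavaScript. The function name is hypothetical, and the sketch assumes the lockfileVersion 2 or 3 layout, in which installed packages are keyed as `node_modules/<name>` under `packages`. It checks presence only, not version ranges.

```javascript
// Report dependencies declared in package.json that have no entry in the
// lock file. Assumes the lockfileVersion 2/3 layout ("packages" keyed by
// "node_modules/<name>"). Both arguments are already-parsed JSON objects.
function findUnlockedDependencies(packageJson, packageLock) {
  const declared = {
    ...(packageJson.dependencies || {}),
    ...(packageJson.devDependencies || {}),
  };
  const locked = packageLock.packages || {};
  return Object.keys(declared)
    .filter((name) => !(`node_modules/${name}` in locked))
    .sort();
}
```

A pipeline step could fail the build whenever the returned list is non-empty, before any image is built.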
Secrets in Development Images
A critical security mistake is when developers hardcode API keys, database passwords, or other secrets directly in source code, believing it's acceptable because the development image is temporary and internal. However, development images are frequently shared among team members, stored in container registries, and potentially exposed in logs or debugging sessions. Even if the development image is eventually deleted, snapshots may be cached in registries or backups. Secrets in development images create unnecessary attack surface and violate security best practices.
The solution is to use a secret management system like HashiCorp Vault, Kubernetes Sealed Secrets, or cloud-provider secret management (AWS Secrets Manager, Azure Key Vault, Google Secret Manager) even in development environments. Developers should never commit secrets to version control or include them in container images. Instead, secrets should be injected at runtime from the secret management system, which provides audit trails and fine-grained access control. This approach works consistently in both development and production, establishing a single secure pattern for handling secrets.
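On the application side, runtime injection usually means reading secrets from the environment (or from mounted files) at startup. The following is a minimal JavaScript sketch of the environment-variable case. The function name is hypothetical, and any variable names passed to it are placeholders rather than names from this document.

```javascript
// Read required secrets from an environment object and fail fast when any
// are missing. Works the same way in development and production.
function loadSecrets(env, requiredNames) {
  const missing = requiredNames.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never echo secret values into logs.
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
  return Object.fromEntries(requiredNames.map((name) => [name, env[name]]));
}
```

At startup an application would call something like `loadSecrets(process.env, ["DATABASE_PASSWORD"])`. Failing before the server starts makes a missing secret visible immediately instead of on the first request that needs it.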
"It Works in Dev" Syndrome
The most notorious problem in software development is code that works perfectly in the developer's environment but fails immediately in production. This happens because development environments differ from production in subtle but critical ways: developers use local databases or mocked services, while production connects to production data stores with different schemas and performance characteristics; development machines run Windows or macOS, while production runs Linux; developers have all development tools available, while production has only runtime dependencies; and development configuration disables features that production enables. These environmental differences can hide bugs that only manifest under production conditions.
The solution is to deploy a staging environment that is as similar as possible to production and use the actual production image and production configuration in staging. Developers should run integration tests against the production image in the staging environment rather than only testing the development image. This catches environmental incompatibilities before production deployment. The staging environment should use production-equivalent databases, run on the same operating system, enforce the same Kubernetes policies, and enforce the same security constraints. Some teams go so far as to run production images in development using Docker Compose or local Kubernetes clusters to catch these issues even earlier.
Testing Strategy
The testing approach for development and production images must be tailored to their distinct purposes and environments. For development images, testing focuses on rapid feedback and code correctness. Unit tests using frameworks like Jest provide immediate feedback on individual functions and modules, with coverage targets of 80% or higher giving reasonable confidence that the code is exercised. These unit tests run against development configuration, which may include mocked external services and simplified database configurations. Component tests verify that individual modules work correctly in isolation by mocking external services and focusing on the module's logic. The development image environment provides access to source code, symbols, and debugging information, enabling developers to quickly understand and fix failures. These tests are designed to be fast, running in seconds rather than minutes, so that developers get near-immediate feedback on their changes.
Production image testing is fundamentally different in scope and rigor. Integration tests run against real, production-equivalent dependencies rather than mocks: production-like databases with realistic schema and load characteristics, actual caches, and production message queues. These tests use production configuration rather than development configuration, ensuring that the code path exercised in testing matches the code path in production. They also include network latency and timeout scenarios, confirming that the application handles the delays and failures that occur in real production networks.

Security scans apply multiple classes of analysis: vulnerability scanning tools identify known CVEs in the image, behavioral analysis tools check for suspicious patterns that might indicate malware, and signature verification confirms that the image comes from a trusted, authenticated source.

Performance tests include load testing at production scale (often 1000+ requests per second), memory profiling to ensure the application doesn't leak memory or consume excessive heap space, and startup time verification to confirm the application meets service level agreement (SLA) requirements for initialization.

Compatibility tests verify that the image works correctly across different variations: different Kubernetes versions to ensure forward and backward compatibility, various container runtimes (containerd, CRI-O, and Docker Engine via cri-dockerd) to ensure portability, and different network policy configurations to ensure the application functions under restrictive security policies.
The Container Catalog
CleanStart maintains separate, distinctly-organized catalogs for development and production images, reflecting their different lifecycles and governance requirements. This separation is enforced at the registry level, making it impossible to confuse development and production artifacts.
Development images are published to a separate repository path: ghcr.io/myorg/myapp/dev. These images receive multiple temporary tags to support various development workflows. The ghcr.io/myorg/myapp/dev:latest tag automatically updates with every commit, allowing developers to quickly access the latest development build. The ghcr.io/myorg/myapp/dev:main tag tracks the main branch, enabling testing of the current main branch state without pulling the absolute latest build. Feature-specific tags like ghcr.io/myorg/myapp/dev:feature-x allow developers to test specific branches or pull requests. Critically, development images are subject to automatic garbage collection—they are automatically deleted after 7 days of retention. This automatic cleanup serves multiple purposes: it keeps the registry clean and prevents unnecessary storage costs, it makes the temporary nature of development images explicit and intentional, and it prevents developers from accidentally deploying stale development images to production.
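The 7-day retention rule can be sketched as a small JavaScript function. The `{ tag, pushedAt }` record shape and the function name are hypothetical; a real registry exposes this information through its own API, and cleanup is normally configured there rather than scripted by hand.

```javascript
const RETENTION_DAYS = 7; // retention window stated in the text
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return the tags whose last push is older than the retention window.
// `images` is a hypothetical list of { tag, pushedAt } records and `now`
// is a Date, passed in so the function is easy to test.
function selectExpiredImages(images, now) {
  const cutoff = now.getTime() - RETENTION_DAYS * MS_PER_DAY;
  return images
    .filter((image) => new Date(image.pushedAt).getTime() < cutoff)
    .map((image) => image.tag);
}
```

Tags that are updated on every commit are re-pushed continuously, so under this rule only tags for branches that have gone quiet age out.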
Production images are published to the main repository path: ghcr.io/myorg/myapp with a more structured and permanent tagging scheme. Each release receives a specific, immutable semantic version tag like ghcr.io/myorg/myapp:1.0.0, representing a specific point in time and specific set of code and dependencies. Convenience tags provide shortcuts to commonly-used versions: ghcr.io/myorg/myapp:1.0 points to the latest patch version in the 1.0 minor version series (for example, 1.0.7), allowing users to consume patch releases automatically without specifying an exact patch version. The tag ghcr.io/myorg/myapp:1 tracks the latest minor version in the 1 major version series, allowing users to consume minor version updates automatically. The ghcr.io/myorg/myapp:latest tag identifies the current stable release. Crucially, production images are retained permanently and treated as immutable once published. They cannot be deleted, and their content cannot be modified. If a vulnerability is discovered in a released image, a new patched version is released; the old version is not patched in place. This immutability ensures that historical versions remain available for emergency rollback if a newer release has issues. It also creates an immutable audit trail documenting every version that has ever been deployed to production, essential for compliance, incident investigation, and understanding the evolution of the system over time.
This registry structure ensures clear separation: development images are temporary and ephemeral, automatically cleaned up, and never suitable for production use. Production images are permanent, immutable, and always available for rollback. There is no confusion between development and production artifacts—they live in different registry paths with different tagging schemes and retention policies.
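The production tagging scheme can be expressed as a short JavaScript function. This is a sketch with a hypothetical function name. It assumes the release being tagged is the newest in its major and minor series; an older patch release should receive only its exact version tag.

```javascript
// Compute the tags a release carries under the scheme described above:
// the exact version, the minor series, the major series, and "latest".
// Pre-release and build-metadata versions are rejected for simplicity.
function tagsForRelease(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const [, major, minor] = match;
  return [version, `${major}.${minor}`, major, "latest"];
}
```

Only the first tag in the returned list is immutable. The other three are convenience tags that move forward with each release, which is why deployments pin to the exact version or, better, to the digest.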
Philosophy
The separation of dev and prod images reflects a fundamental principle:
The same code compiled in development is not safe for production.
Development is a creative, experimental process. Production is a locked-down, audited, hardened process.
CleanStart enforces this distinction at the image level, making it impossible to accidentally deploy a development image to production.
This governance model is essential for organizations that need to maintain security while supporting developer productivity.
