Difficulty: Beginner | Time: 45 minutes | Focus: Image optimization, layer caching, build patterns
Objectives
By the end of this lab, you will understand why multi-stage builds reduce image size, convert a single-stage Dockerfile into a multi-stage build, measure image size before and after optimization, learn the builder pattern for compiled applications, and understand how Docker layer caching works.
Prerequisites
Required: Docker 18.09 or newer for multi-stage support, a text editor such as VS Code, nano, or vim, the docker images command available, and Bash or a compatible shell. Optional: du or ls -lh for file size inspection.
Verify Docker version:

docker --version

Expected: Docker 18.09 or newer (supports multi-stage builds)
Background: Why Multi-Stage Builds?
A single-stage Dockerfile includes all build tools (compiler, package managers, etc.) in the final image, increasing size. A multi-stage build uses one stage to compile/build, then copies only the artifacts to a final stage, discarding build tools.
Example size reduction: a single-stage Go image of about 800 MB shrinks to about 15 MB as a multi-stage build, a reduction of roughly 98%.
In this lab, you'll see a real example with Python.
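As a quick check on the figures quoted above, the percentage reduction can be computed directly. This is only a small sketch: the function name `percent_reduction` is chosen here for illustration, and the sizes are the approximate values given in this section, not measurements.

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Return how much smaller `after_mb` is than `before_mb`, in percent."""
    if before_mb <= 0:
        raise ValueError("before_mb must be positive")
    return (before_mb - after_mb) / before_mb * 100


# Approximate sizes quoted for the Go example: ~800 MB single-stage, ~15 MB multi-stage.
reduction = percent_reduction(800, 15)
print(f"{reduction:.1f}% smaller")  # prints "98.1% smaller"
```

The same helper can be reused later in the lab with the sizes you record for your own images.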
Step 1: Create Project Directory

mkdir -p ~/labs/lab-02-multi-stage-builds
cd ~/labs/lab-02-multi-stage-builds

Step 2: Create a Python Application with Dependencies
Create app.py:

cat > app.py << 'EOF'
import requests
import json
from datetime import datetime

def fetch_data():
    """Fetch JSON from public API"""
    try:
        response = requests.get('https://api.github.com')
        return response.json()
    except Exception as e:
        return {"error": str(e)}

def main():
    print("=== CleanStart Lab 02 ===")
    print(f"Timestamp: {datetime.now().isoformat()}")
    data = fetch_data()
    print(json.dumps(data, indent=2))

if __name__ == '__main__':
    main()
EOF

Create requirements.txt:

cat > requirements.txt << 'EOF'
requests==2.31.0
EOF

Verify:

ls -la

Expected output:

total 24
-rw-r--r-- 1 user staff 426 Mar 22 14:45 app.py
-rw-r--r-- 1 user staff 22 Mar 22 14:45 requirements.txt

Step 3: Create a Single-Stage Dockerfile (Before)
Create Dockerfile.single:

cat > Dockerfile.single << 'EOF'
FROM registry.cleanstart.com/cleanstart/python:3.12

WORKDIR /app

# Install pip packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app.py .

# Run the application
CMD ["python", "app.py"]
EOF

Step 4: Build the Single-Stage Image
Build the image:

docker build -f Dockerfile.single -t lab-02-single-stage:latest .

Expected output (abbreviated):

[+] Building 18.3s (8/8) FINISHED
...
[4/6] COPY requirements.txt .
[5/6] RUN pip install --no-cache-dir -r requirements.txt
[6/6] COPY app.py .
=> => naming to docker.io/library/lab-02-single-stage:latest

Get the image size:

docker images | grep lab-02-single-stage

Expected output (sizes vary, approximate):

lab-02-single-stage   latest   abcd1234efgh   30 seconds ago   350MB

Note: Record this size: approximately 350MB for single-stage.
Step 5: Create a Multi-Stage Dockerfile (After)
Create Dockerfile.multi:

cat > Dockerfile.multi << 'EOF'
# Stage 1: Builder
FROM registry.cleanstart.com/cleanstart/python:3.12 AS builder

WORKDIR /tmp/build

# Install pip packages into the user site (/root/.local)
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: Final
FROM registry.cleanstart.com/cleanstart/python:3.12

WORKDIR /app

# Copy only the installed packages from builder (not build tools)
COPY --from=builder /root/.local /root/.local

# Copy application code
COPY app.py .

# Update PATH to use user-installed packages
ENV PATH=/root/.local/bin:$PATH \
    PYTHONUNBUFFERED=1

# Run the application
CMD ["python", "app.py"]
EOF

Key differences from single-stage:
- FROM ... AS builder: the first stage is named "builder".
- Second FROM (no AS name): the final stage starts fresh from the base image.
- COPY --from=builder: copies only the installed packages, not the rest of the build environment.
- ENV PATH=...: PATH is extended so scripts installed under /root/.local/bin are found.
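The COPY --from=builder /root/.local step relies on pip install --user placing packages under the user base directory, which is ~/.local on Linux (so /root/.local when the build runs as root, as this lab assumes). A minimal sketch using Python's standard site module to print those locations; the exact paths depend on the platform and user, so treat the comments as typical values rather than guaranteed output:

```python
import site

# Base directory used by `pip install --user` (typically ~/.local on Linux).
user_base = site.getuserbase()

# Directory from which user-installed packages are imported,
# e.g. ~/.local/lib/python3.12/site-packages on Linux with Python 3.12.
user_site = site.getusersitepackages()

print(f"user base: {user_base}")
print(f"user site: {user_site}")
```

Running the same two calls inside the built image (for example with python -c as the container command) is one way to confirm that the copied directory is where the interpreter looks for packages.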
Step 6: Build the Multi-Stage Image
Build the multi-stage image:

docker build -f Dockerfile.multi -t lab-02-multi-stage:latest .

Expected output (abbreviated):

[+] Building 18.5s (12/12) FINISHED
...
[2/5] WORKDIR /tmp/build
[3/5] COPY requirements.txt .
[4/5] RUN pip install --user --no-cache-dir -r requirements.txt
[5/5] COPY --from=builder /root/.local /root/.local
...
=> => naming to docker.io/library/lab-02-multi-stage:latest

Get the image size:

docker images | grep lab-02

Expected output (approximate):

lab-02-multi-stage    latest   xyz123abcd     5 seconds ago   350MB
lab-02-single-stage   latest   abcd1234efgh   2 minutes ago   350MB

Step 7: Compare Image Layers
Examine the layers in the single-stage image:

docker history lab-02-single-stage:latest

Expected output (abbreviated):

IMAGE          CREATED         CREATED BY                                       SIZE
abcd1234efgh   2 minutes ago   /bin/sh -c pip install --no-cache-dir -r re...   45MB
xyz123abcd     2 minutes ago   /bin/sh -c #(nop) COPY requirements.txt .        42B
...

Examine the layers in the multi-stage image:

docker history lab-02-multi-stage:latest

Expected output:

IMAGE          CREATED         CREATED BY                                   SIZE
pqr789stuv     5 seconds ago   /bin/sh -c #(nop) CMD ["python" "app.py"]    0B
pqr789stuv     5 seconds ago   /bin/sh -c #(nop) ENV PATH=/root/.loca...    0B
pqr789stuv     5 seconds ago   /bin/sh -c #(nop) COPY --from=builder ...    5MB
...

Key observation: The multi-stage image's final stage includes only the ~5MB of installed packages, not the entire builder environment.
Step 8: Test Both Images Work Identically
Run the single-stage image:

docker run --rm lab-02-single-stage:latest

Expected output:

=== CleanStart Lab 02 ===
Timestamp: 2024-03-22T14:45:12.345678
{
  "current_user_url": "https://api.github.com/user",
  "authorizations_url": "https://api.github.com/authorizations",
  ...
}

Run the multi-stage image:

docker run --rm lab-02-multi-stage:latest

Expected output: the same as the single-stage run apart from the timestamp (same functionality).
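Because the timestamp line changes on every run, a literal comparison of the two outputs will not match. One way to compare them is to drop that line first. The sketch below assumes the outputs have already been captured as strings; the sample strings, including the second timestamp, are hypothetical stand-ins, and `normalize` is a name chosen here for illustration.

```python
def normalize(output: str) -> list[str]:
    """Drop the timestamp line, which differs on every run."""
    return [line for line in output.splitlines() if not line.startswith("Timestamp:")]


# Hypothetical captured outputs; in practice these would come from the two
# `docker run --rm ...` commands above (for example via shell redirection).
single_output = """=== CleanStart Lab 02 ===
Timestamp: 2024-03-22T14:45:12.345678
{"current_user_url": "https://api.github.com/user"}"""

multi_output = """=== CleanStart Lab 02 ===
Timestamp: 2024-03-22T14:45:20.000001
{"current_user_url": "https://api.github.com/user"}"""

print(normalize(single_output) == normalize(multi_output))  # prints "True"
```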
Step 9: Measure and Compare Sizes Precisely
Get detailed image sizes:

docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}" | grep lab-02

Expected output:

lab-02-multi-stage:latest    329MB
lab-02-single-stage:latest   344MB

The two images are close in size because both stages use the same Python base image, which already contains the interpreter and most runtime dependencies. With compiled languages (Go, Rust, C++), where the final stage can use a much smaller base than the builder, the difference is much larger. For this Python example the benefits are a cleaner build process (separation of concerns), a dependency stage that is cached independently of the application code, and the option of swapping builder images (e.g., different Python versions).
Step 10: Demonstrate Layer Caching Benefits
Modify app.py slightly:

cat > app.py << 'EOF'
import requests
import json
from datetime import datetime

def fetch_data():
    """Fetch JSON from public API"""
    try:
        response = requests.get('https://api.github.com')
        return response.json()
    except Exception as e:
        return {"error": str(e)}

def main():
    print("=== CleanStart Lab 02: Multi-Stage Edition ===")  # Changed
    print(f"Timestamp: {datetime.now().isoformat()}")
    data = fetch_data()
    print(json.dumps(data, indent=2))

if __name__ == '__main__':
    main()
EOF

Rebuild the single-stage image (time this):
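
time docker build -f Dockerfile.single -t lab-02-single-stage:latest .

Expected output (timing line at end):

[+] Building 15.2s (8/8) FINISHED
...
real 0m15.234s
user 0m2.123s
sys  0m1.045s

Rebuild the multi-stage image (time this):

time docker build -f Dockerfile.multi -t lab-02-multi-stage:latest .

Expected output:

[+] Building 2.1s (11/11) FINISHED
...
real 0m2.145s
user 0m1.098s
sys  0m0.756s

Key observation: The multi-stage rebuild finishes in about 2 seconds because the builder stage's pip install layer is reused from cache and only the final stage's COPY app.py and the instructions after it are re-executed. Note that Dockerfile.single also copies requirements.txt and runs pip install before COPY app.py, so its pip install layer is cacheable too. A rebuild of around 15 seconds there means the cache was not reused (for example after pruning, or when building with --no-cache), so your timings may differ from the ones shown.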
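The caching behaviour behind these timings can be illustrated with a deliberately simplified model: within a stage, an instruction is reused from cache only if its input files are unchanged and every earlier instruction was also a cache hit. The sketch below is not how Docker is implemented (the real cache also considers the base image, build arguments and more), and `rebuilt_steps` and the `source_first` ordering are hypothetical names and data used only for illustration.

```python
def rebuilt_steps(steps, changed_files):
    """Return the instructions that must re-run in a single stage.

    `steps` is an ordered list of (instruction, input_files) pairs.
    Simplified rule: once one step's inputs change, that step and every
    later step in the same stage miss the cache.
    """
    rebuilt = []
    cache_valid = True
    for instruction, inputs in steps:
        if cache_valid and changed_files.isdisjoint(inputs):
            continue  # cache hit: the existing layer is reused
        cache_valid = False
        rebuilt.append(instruction)
    return rebuilt


# Instruction order used by Dockerfile.single in this lab.
deps_first = [
    ("COPY requirements.txt .", {"requirements.txt"}),
    ("RUN pip install --no-cache-dir -r requirements.txt", set()),
    ("COPY app.py .", {"app.py"}),
]

# Hypothetical alternative that copies all source files before installing.
source_first = [
    ("COPY . .", {"requirements.txt", "app.py"}),
    ("RUN pip install --no-cache-dir -r requirements.txt", set()),
]

print(rebuilt_steps(deps_first, {"app.py"}))    # ['COPY app.py .']
print(rebuilt_steps(source_first, {"app.py"}))  # both steps re-run
```

In this model, editing app.py leaves the pip install layer cached whenever requirements.txt is copied and installed before the application code, which is the ordering both Dockerfiles in this lab use.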
Step 11: Visualize Build Context
Create a comparison summary:

cat > COMPARISON.md << 'EOF'
# Single-Stage vs. Multi-Stage Builds

## Single-Stage Dockerfile
- All layers in ONE final image
- Includes build tools in final image
- Layer reuse depends on instruction order
- Simpler for simple applications

## Multi-Stage Dockerfile
- Layers organized in multiple stages
- Only final stage artifacts included
- Builder stage cached independently of application code
- Cleaner separation of concerns
- Easier to update build tools without affecting runtime

## Cache Hit Example (After modifying app.py)
- Cache miss (pip install re-runs, e.g. after requirements.txt changes) = ~15 seconds
- Cache hit (pip install layer reused, only COPY app.py re-runs) = ~2 seconds

## Real-World Impact
- Go/Rust/C++: 10-100x size reduction (800MB -> 15-50MB)
- Python: 5-20% reduction (caching benefits more significant)
- Node.js: 10-50x reduction (dependencies can be large)
EOF

cat COMPARISON.md

Verification Checklist
Confirm all of the following:
- Directory ~/labs/lab-02-multi-stage-builds created
- app.py contains Python code using the requests library
- requirements.txt contains requests==2.31.0
- Dockerfile.single uses a single FROM statement
- Dockerfile.multi uses FROM ... AS builder and COPY --from=builder
- Both images built successfully
- docker images shows both images
- docker history lab-02-single-stage:latest shows all layers
- docker history lab-02-multi-stage:latest shows that the builder stage is hidden from the final image
- Both images produce the same output when run (apart from the timestamp)
- The rebuild after editing app.py reuses the cached pip install layer and finishes quickly
- COMPARISON.md created with explanation
If all items are checked, you've successfully completed Lab 02.
What You Learned
You learned how to use multiple FROM statements in multi-stage builds to separate build from runtime concerns. The builder pattern has the first stage building or compiling code while the final stage runs the application. Layer copying with COPY --from=builder pulls artifacts from earlier stages rather than the build environment, enabling size optimization where the final image contains only artifacts and not build tools or intermediate files. Layer caching means when source code changes, cached builder layers are reused, making rebuilds significantly faster. Understanding build context helps you see what ends up in the final image, which improves both security and performance. Finally, you learned the tradeoffs: multi-stage complexity is usually worth it for compiled languages, while for interpreted languages the benefits come more from rebuild speed and layer organization.
Cleanup
Remove images:

docker rmi lab-02-single-stage:latest lab-02-multi-stage:latest

Remove lab directory (optional):

rm -rf ~/labs/lab-02-multi-stage-builds

Next Lab
Proceed to Lab 03: Image Verification to learn how to verify container image signatures, inspect SBOMs, and validate SLSA provenance.
Advanced Challenge (Optional)
Create a multi-stage Dockerfile that uses a builder stage with more dependencies, a tester stage that runs tests, and a final runtime stage with minimal dependencies, where only the runtime stage is in the final image.

# Example structure:
# Stage 1: builder (installs dependencies)
# Stage 2: tester (runs tests on artifacts)
# Stage 3: runtime (final image with only artifacts)

Estimated Time: 45 minutes | Hands-on: ~35 minutes | Reading: ~10 minutes
