Regression Testing Strategy for CleanStart Image Upgrades

Purpose

Every time you upgrade a CleanStart base image version—whether a patch, minor, or major version bump—your applications need verification that functionality remains intact. This document provides a systematic, layered regression testing strategy that ensures production readiness while avoiding unnecessary test delays.

Target audience: QA leads, DevOps engineers, platform teams responsible for upgrading container images in production.

Expected testing time: 15 minutes (patch), 45 minutes (minor), 2 hours (major), 24–72 hours (major + soak).

When Regression Testing is Required

Regression testing must run before promoting a new image version to production, with scope determined by upgrade type.

Mandatory Regression Testing Triggers

Trigger	Scope	Timeline
Monthly scheduled patch (e.g., py3.12.1 → py3.12.2)	Smoke + security verification	15 min
Minor version bump (e.g., py3.12 → py3.13, node18 → node20)	Full regression suite	45–60 min
Major version upgrade (e.g., Python 3 → 4, Node 18 → 22)	Full regression + soak test	24–72 hrs
Security hotfix release (emergency CVE patch)	Targeted tests only	20 min
Custom image build (first deployment)	Full regression	60 min
GLIBC-based distro update (Ubuntu 22.04 → 24.04)	Full regression + extended soak	48+ hrs

Skippable Scenarios

Same version, same build date: No testing required (already verified).
Dependency update without version change: Smoke tests only.
Documentation-only update: No testing required.
Third-party library patch (with version lock): Smoke tests only.

Risk Assessment Matrix by Upgrade Type

This matrix guides how much testing to run based on the change risk profile.

Upgrade Type	Risk Level	Regression Scope	Testing Hours	Go/No-Go Criteria
Security patch (py3.12.5 → py3.12.6)	🟢 Low	Smoke + security scan	0.25	Zero new vulns, startup OK
Maintenance patch (py3.12.1 → py3.12.2)	🟢 Low	Smoke + perf baseline	0.5	No perf regression >10%
Minor version bump (py3.12 → py3.13, node18 → node20)	🟡 Medium	Full functional regression	1	All functional tests pass
Major version bump (Python 3 → 4, Node 18 → Node 22)	🔴 High	Full regression + 24–72hr soak	8–16	All metrics within 15% baseline
Distro upgrade (Ubuntu 22.04 → 24.04, Alpine 3.19 → 3.20)	🔴 High	Full regression + extended soak	12–24	Memory leak checks, no stdlib incompatibilities
Emergency hotfix	🟡 Medium	Targeted regression only	1–2	Hotfix issue resolved, no new regressions

Regression Test Suite Structure: Six-Layer Approach

Regression testing proceeds through six automated and manual layers, each with specific objectives and time budgets. Most upgrades stop at Layer 3 or 4; major version changes proceed through Layer 6.

Layer 1: Image Diff Analysis (2 minutes)

Objective: Understand what changed before running any tests.

Before executing tests, analyze the differences between old and new image versions. This prevents surprises and focuses testing on what actually changed.

What to Check

Package manifest comparison: Which packages were added, removed, or updated? Are there critical package changes (a different version of OpenSSL, glibc, etc.)?
SBOM comparison (Software Bill of Materials): Old image SBOM vs. new image SBOM. Any unexpected new dependencies? Any removed dependencies your app depends on?
Vulnerability delta: Old image security scan results vs. new. Are there fewer vulnerabilities (expected)? Any new vulnerabilities (a red flag, likely a packaging error)?
Base layer changes: Did the underlying GLIBC version change? Did OpenSSL or other core library versions update?

Tools and Commands

Pull and inspect both images:

# Get image digests
OLD_DIGEST=$(docker pull registry.cleanstart.com/py:3.12.1 2>&1 | grep "Digest:" | awk '{print $2}')
NEW_DIGEST=$(docker pull registry.cleanstart.com/py:3.12.2 2>&1 | grep "Digest:" | awk '{print $2}')

echo "Old: $OLD_DIGEST"
echo "New: $NEW_DIGEST"

Extract and compare SBOMs:

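# Download the SBOM attached to each image. The jq filter below assumes CycloneDX JSON,
# which has a top-level "components" array (SPDX uses "packages" and "versionInfo" instead).
# "cosign download sbom" is deprecated in cosign 2.x; if the SBOM is published as an
# attestation, use "cosign download attestation" and unpack the payload first.
cosign download sbom registry.cleanstart.com/py:3.12.1 > old-sbom.json
cosign download sbom registry.cleanstart.com/py:3.12.2 > new-sbom.json

# Compare package lists (-r prints raw strings without JSON quotes)
jq -r '.components[] | .name + ":" + .version' old-sbom.json | sort > old-packages.txt
jq -r '.components[] | .name + ":" + .version' new-sbom.json | sort > new-packages.txt

diff old-packages.txt new-packages.txt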
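A raw diff shows changed lines but does not answer the three questions from "What to Check" directly. The following is a minimal sketch, assuming one name:version entry per line as in old-packages.txt and new-packages.txt; the function name classify_package_changes is hypothetical and not part of any CleanStart tooling.

```shell
# classify_package_changes OLD_FILE NEW_FILE
# Each file holds one "name:version" entry per line (JSON quotes are tolerated).
# Prints: ADDED name version | REMOVED name version | UPDATED name old -> new
classify_package_changes() {
  awk '
    {
      gsub(/"/, "")                      # tolerate jq output produced without -r
      sep = index($0, ":")               # split on the first colon only
      if (sep == 0) next
      name = substr($0, 1, sep - 1)
      ver  = substr($0, sep + 1)
    }
    FILENAME == ARGV[1] { oldv[name] = ver; next }   # first file: old image
    {
      newv[name] = ver
      if (!(name in oldv))          print "ADDED", name, ver
      else if (oldv[name] != ver)   print "UPDATED", name, oldv[name], "->", ver
    }
    END {
      for (n in oldv) if (!(n in newv)) print "REMOVED", n, oldv[n]
    }
  ' "$1" "$2" | LC_ALL=C sort
}
```

Calling classify_package_changes old-packages.txt new-packages.txt and filtering for REMOVED lines gives the list to review before moving to Layer 2. Versions are compared as plain strings, so the sketch reports that a version changed, not whether it went up or down.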

Scan for vulnerabilities:

# Using Grype (Trivy offers an equivalent JSON report)
grype registry.cleanstart.com/py:3.12.1 -o json > old-vulns.json
grype registry.cleanstart.com/py:3.12.2 -o json > new-vulns.json

# Count critical/high findings in each scan
jq '[.matches[] | select(.vulnerability.severity == "Critical" or .vulnerability.severity == "High")] | length' old-vulns.json
jq '[.matches[] | select(.vulnerability.severity == "Critical" or .vulnerability.severity == "High")] | length' new-vulns.json

Pass/Fail Criteria

✅ PASS: New image has fewer or equal vulnerabilities; new critical/high findings are acceptable only when expected from an intentional major upgrade.
❌ FAIL: New image introduces unexpected critical/high vulnerabilities.
⚠️ WARN: Unexpected package removals; proceed with caution to Layer 2.
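Comparing totals can hide a newly introduced finding when other findings were fixed in the same release, so the criteria above are easier to apply to the set of vulnerability IDs. This is a rough sketch, assuming two text files with one ID per line; the function names new_findings and layer1_verdict are hypothetical.

```shell
# new_findings OLD_IDS NEW_IDS
# Prints vulnerability IDs that appear in the new scan but not in the old one.
new_findings() {
  awk '
    FILENAME == ARGV[1] { seen[$0] = 1; next }   # first file: IDs from the old scan
    NF && !($0 in seen) && !printed[$0]++        # print each unseen ID once
  ' "$1" "$2" | LC_ALL=C sort
}

# layer1_verdict OLD_IDS NEW_IDS [REMOVED_PACKAGE_COUNT]
# FAIL if any new ID was introduced, WARN if packages were removed, PASS otherwise.
layer1_verdict() {
  local introduced
  introduced=$(new_findings "$1" "$2" | wc -l | tr -d ' ')
  if [ "$introduced" -gt 0 ]; then
    echo "FAIL"
  elif [ "${3:-0}" -gt 0 ]; then
    echo "WARN"
  else
    echo "PASS"
  fi
}
```

With Grype output, the ID files can be produced by appending `| .vulnerability.id` to the critical/high `select(...)` filter and running jq with `-r`. The sketch treats every new critical/high ID as unexpected, so findings that are expected from an intentional major upgrade need a manual override.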

Layer 2: Smoke Tests (5 minutes)

Objective: Verify the container starts, health checks pass, and basic connectivity works.

Smoke tests are the fastest way to catch fatal incompatibilities (missing libraries, broken entrypoints, network issues).

Smoke Test Checklist

[ ] Container starts without errors (docker run).
[ ] No segmentation faults or core dumps in startup logs.
[ ] Application binaries are present and executable (e.g., python, node, java).
[ ] Health check endpoint responds (HTTP 200 or equivalent) within 30 seconds.
[ ] Can connect to required external services (database, cache, message queue).
[ ] No "library not found" or linker errors in logs.
[ ] Application responds to at least one real request.

Smoke Test Script (Generic)

#!/bin/bash
set -e

IMAGE=$1
CONTAINER_NAME="smoke-test-$(date +%s)"

echo "🧪 Starting smoke test for $IMAGE..."

# Print the reason and the container logs, clean up, and exit non-zero
fail() {
  echo "❌ FAIL: $1"
  docker logs "$CONTAINER_NAME" 2>&1 || true
  docker rm -f "$CONTAINER_NAME" > /dev/null
  exit 1
}

# Start container (docker run -d already detaches, so no trailing & is needed).
# The health command runs inside the container, so it needs curl and a shell there;
# for shell-less images, rely on the external curl check further down instead.
docker run -d \
  --name "$CONTAINER_NAME" \
  --health-cmd='curl -f http://localhost:8080/health || exit 1' \
  --health-interval=5s \
  --health-timeout=3s \
  --health-retries=3 \
  -p 8080:8080 \
  "$IMAGE" > /dev/null
sleep 2

# Check startup logs (docker logs writes to both stdout and stderr)
if docker logs "$CONTAINER_NAME" 2>&1 | grep -i "error\|segfault\|panic"; then
  fail "Errors in startup logs"
fi

# Wait up to 30 seconds for the health check
for i in {1..30}; do
  STATUS=$(docker inspect --format '{{.State.Health.Status}}' "$CONTAINER_NAME")
  if [ "$STATUS" = "healthy" ]; then
    echo "✅ Health check passed"
    break
  fi
  if [ "$i" -eq 30 ]; then
    fail "Health check timeout"
  fi
  sleep 1
done

# Test basic request
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health || true)
if [ "$RESPONSE" != "200" ]; then
  fail "Health endpoint returned $RESPONSE"
fi

echo "✅ Smoke test PASSED"
docker rm -f "$CONTAINER_NAME" > /dev/null
exit 0

Pass/Fail Criteria

✅ PASS: Container starts, health check passes, no fatal errors.
❌ FAIL: Any startup error, health check timeout, missing core binaries.
Action on FAIL: Stop testing, report to image maintainer.
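Several smoke checks share the same pattern: retry a probe until it succeeds or a time budget runs out (for example, the 30-second health check window). One way to factor that out is a small polling helper. This is a sketch only; wait_until is a hypothetical name, and it counts attempts one second apart instead of measuring wall-clock time.

```shell
# wait_until MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND about once per second until it succeeds.
# Returns 0 on the first success, 1 after MAX_ATTEMPTS failed attempts.
wait_until() {
  local max_attempts=$1 attempt=1
  shift
  while ! "$@" > /dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 0
}
```

A health probe then becomes a single line such as `wait_until 30 curl -sf http://localhost:8080/health || echo "Health check timeout"`, and the same helper can wrap a database or cache connectivity probe.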

Layer 3: Functional Regression Suite (15–30 minutes)

Objective: Run the full application test suite against the new image.

This is your main regression test battery. Execute all automated tests that validate application behavior.

What Tests to Include

API endpoint tests (all CRUD operations)
Database integration tests (query performance, connection pooling)
Authentication and authorization tests
Data pipeline tests (ETL, streaming, batch processing)
File I/O tests (read, write, permissions)
Cache integration tests (Redis, Memcached)
Scheduled job execution tests
Error handling tests (expected exceptions, graceful degradation)
Concurrent request tests (thread safety, race conditions)
Backward compatibility tests (old data format support)

Execution Pattern

#!/bin/bash
set -e

NEW_IMAGE=$1
TEST_CONTAINER="test-$(date +%s)"

# Start container with test-friendly environment
docker run -d \
  --name "$TEST_CONTAINER" \
  -e "NODE_ENV=test" \
  -e "LOG_LEVEL=debug" \
  -p 8080:8080 \
  --network test-network \
  "$NEW_IMAGE" > /dev/null
sleep 5  # Wait for startup

# Run test suite. Jest's --json and --outputFile flags produce the fields parsed below.
# "|| true" keeps set -e from aborting before results are collected when tests fail.
# This step needs /bin/sh in the image; shell-less images need a test variant of the image.
docker exec "$TEST_CONTAINER" \
  /bin/sh -c "cd /app && npm test -- --coverage --json --outputFile=/tmp/test-results.json" || true

# Copy results
docker cp "$TEST_CONTAINER":/tmp/test-results.json ./test-results-new.json

# Parse results
PASS=$(jq '.numPassedTests' test-results-new.json)
FAIL=$(jq '.numFailedTests' test-results-new.json)
SKIPPED=$(jq '.numPendingTests' test-results-new.json)

echo "Test Results: $PASS passed, $FAIL failed, $SKIPPED skipped"

# Cleanup
docker rm -f "$TEST_CONTAINER" > /dev/null

if [ "$FAIL" -gt 0 ]; then
  echo "❌ FAIL: $FAIL test(s) failed"
  jq '.testResults[]' test-results-new.json
  exit 1
fi

echo "✅ Functional regression PASSED"
exit 0

Pass/Fail Criteria

✅ PASS: >95% of tests pass; failures are pre-known/skipped for upgrade.
⚠️ WARN: 90–95% pass rate; investigate failures before proceeding.
❌ FAIL: <90% pass rate or new failures not explained.
Action on WARN/FAIL: Review failed test logs, assess risk, consider rollback to previous version.
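The pass-rate bands above can be checked mechanically from the passed and failed counts. The sketch below is one possible encoding, assuming skipped tests are left out of the denominator and that exactly 95% falls into the WARN band; functional_verdict is a hypothetical helper name. It cannot tell whether a failure is pre-known, so that part of the criteria still needs a human review.

```shell
# functional_verdict PASSED FAILED
# Prints PASS (>95%), WARN (90-95%), or FAIL (<90%, or no tests ran).
functional_verdict() {
  local passed=$1 failed=$2
  local total=$((passed + failed))
  if [ "$total" -eq 0 ]; then
    echo "FAIL"
    return 0
  fi
  # Pass rate in hundredths of a percent, so 9550 means 95.50%.
  # Integer math avoids floating point in plain shell.
  local rate=$((passed * 10000 / total))
  if [ "$rate" -gt 9500 ]; then
    echo "PASS"
  elif [ "$rate" -ge 9000 ]; then
    echo "WARN"
  else
    echo "FAIL"
  fi
}
```

For example, `functional_verdict "$PASS" "$FAIL"` can be called with the counts parsed from the test results JSON.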

Layer 4: Performance Regression Baseline (30 minutes)

Objective: Verify performance metrics haven't degraded by >10%.

Run the same workload under both old and new images, comparing startup time, memory usage, request latency, and throughput.

Metrics to Collect

Metric	Tool	Threshold	Action if Exceeded
Startup time (cold start)	`docker run` + `time`	<10% increase	Investigate initialization
Memory footprint (RSS)	`docker stats`	<5% increase	Check for memory leaks
Request latency (p95)	`wrk` / `hey`	<10% increase	Profile hotspots
Throughput (RPS)	`wrk` / `k6`	<10% decrease	Check for CPU/syscall overhead
Garbage collection pause (if applicable)	`pprof` / `jps`	<5% increase	Heap size may need tuning
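Most of these measurements are fractional (latency in milliseconds, memory in MiB), so integer shell arithmetic loses precision when applying the thresholds. A minimal sketch using awk for the arithmetic follows; pct_change and within_threshold are hypothetical helper names, and the thresholds are passed in by the caller.

```shell
# pct_change BEFORE AFTER
# Prints the signed percentage change from BEFORE to AFTER with one decimal place.
pct_change() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before == 0) exit 1              # change relative to zero is undefined
    printf "%.1f\n", (after - before) * 100 / before
  }'
}

# within_threshold BEFORE AFTER MAX_INCREASE_PCT
# Succeeds when the increase is strictly below the limit, matching the "<N% increase" rows.
within_threshold() {
  awk -v before="$1" -v after="$2" -v max="$3" 'BEGIN {
    if (before == 0) exit 1
    exit !((after - before) * 100 / before < max)
  }'
}
```

For instance, `within_threshold 200 215 10` succeeds for a p95 latency that moved from 200 ms to 215 ms. Throughput is a higher-is-better metric, so for the RPS row the check is that `pct_change` does not fall below -10.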

Performance Benchmark Script

#!/bin/bash

OLD_IMAGE=$1
NEW_IMAGE=$2

echo "📊 Performance baseline comparison: $OLD_IMAGE → $NEW_IMAGE"

# Benchmark one image and print "<startup_ms> <memory_mib>" on stdout.
# Progress messages go to stderr so the caller can capture the numbers.
run_benchmark() {
  local image=$1 label=$2
  local container="perf-test-${label}-$(date +%s)"
  local start end startup_ms mem

  echo "Starting $label..." >&2
  start=$(date +%s%N)
  docker run -d --name "$container" -p 8080:8080 --cpus=2 --memory=1g "$image" > /dev/null

  # Cold-start time: from docker run until the health endpoint first answers (60 s limit)
  if ! timeout 60 bash -c 'until curl -sf http://localhost:8080/health > /dev/null; do sleep 0.1; done'; then
    docker rm -f "$container" > /dev/null
    return 1
  fi
  end=$(date +%s%N)
  startup_ms=$(( (end - start) / 1000000 ))

  # Load test: 30 s, 4 threads, 100 connections.
  # wrk has no fixed-rate option (-R exists only in wrk2); --latency adds percentiles.
  wrk -t4 -c100 -d30s --latency http://localhost:8080/ > "/tmp/perf-${label}.txt"

  # Memory after load, e.g. "256.3MiB / 1GiB" → 256 (assumes usage is reported in MiB)
  mem=$(docker stats --no-stream --format '{{.MemUsage}}' "$container" | awk '{print int($1)}')

  docker rm -f "$container" > /dev/null
  echo "$startup_ms $mem"
}

# Run benchmarks
read -r OLD_STARTUP OLD_MEMORY < <(run_benchmark "$OLD_IMAGE" "OLD") \
  || { echo "❌ FAIL: benchmark for $OLD_IMAGE did not complete"; exit 1; }
read -r NEW_STARTUP NEW_MEMORY < <(run_benchmark "$NEW_IMAGE" "NEW") \
  || { echo "❌ FAIL: benchmark for $NEW_IMAGE did not complete"; exit 1; }

# Compare
STARTUP_DELTA=$(( (NEW_STARTUP - OLD_STARTUP) * 100 / OLD_STARTUP ))
MEMORY_DELTA=$(( (NEW_MEMORY - OLD_MEMORY) * 100 / OLD_MEMORY ))

echo "📈 Results:"
printf '   Startup: %sms → %sms (%+d%%)\n' "$OLD_STARTUP" "$NEW_STARTUP" "$STARTUP_DELTA"
printf '   Memory:  %sMiB → %sMiB (%+d%%)\n' "$OLD_MEMORY" "$NEW_MEMORY" "$MEMORY_DELTA"

# Thresholds follow the metrics table: 10% for startup, 5% for memory
if [ "$STARTUP_DELTA" -ge 10 ] || [ "$MEMORY_DELTA" -ge 5 ]; then
  echo "⚠️  WARN: Startup or memory regressed beyond threshold"
  exit 1
fi

echo "✅ Performance regression PASSED"
exit 0

Pass/Fail Criteria

✅ PASS: All metrics within thresholds.
⚠️ WARN: 1–2 metrics exceed threshold by <20%; proceed with caution.
❌ FAIL: >2 metrics exceed thresholds or >20% degradation.
Action on WARN: Document regression, proceed if acceptable; monitor in production.
Action on FAIL: Investigate root cause; consider version downgrade.

Layer 5: Security Regression Verification (10 minutes)

Objective: Confirm hardening features (shell-less, read-only FS, non-root) are intact.

CleanStart images ship with security features enabled by default. This layer verifies they haven't been accidentally disabled or bypassed in the new version.

Security Hardening Checklist

#!/bin/bash

IMAGE=$1
CONTAINER="security-test-$(date +%s)"

echo "🔒 Security hardening verification for $IMAGE"

# Remove the test container however the script exits
trap 'docker rm -f "$CONTAINER" > /dev/null 2>&1' EXIT

# Start container
docker run -d --name "$CONTAINER" "$IMAGE" > /dev/null
sleep 3

# 1. Verify non-root user (requires the id binary inside the image)
echo -n "Checking non-root user... "
if docker exec "$CONTAINER" id | grep -q "uid=65532"; then
  echo "✅"
else
  echo "❌ FAIL: Not running as UID 65532"
  exit 1
fi

# 2. Verify no shell: starting /bin/sh must fail in a shell-less image.
# (Running "test ! -f /bin/sh" inside the container would itself need a test binary.)
echo -n "Checking shell-less mode... "
if docker exec "$CONTAINER" /bin/sh -c true > /dev/null 2>&1; then
  echo "⚠️  WARN: Shell found (not required for all images)"
else
  echo "✅"
fi

# 3. Verify read-only root filesystem
echo -n "Checking read-only root FS... "
if docker run --rm --read-only "$IMAGE" /app/health-check > /dev/null 2>&1; then
  echo "✅"
else
  echo "❌ FAIL: Application requires writable root FS (unexpected)"
  exit 1
fi

# 4. Verify signature
echo -n "Verifying Cosign signature... "
if cosign verify --certificate-identity-regexp='^https://github.com/cleanstart' \
   --certificate-oidc-issuer=https://token.actions.githubusercontent.com \
   "$IMAGE" > /dev/null 2>&1; then
  echo "✅"
else
  echo "⚠️  WARN: Signature verification failed (check certificate)"
fi

# 5. Verify SBOM present (assumes a CycloneDX SBOM attached to the image)
echo -n "Verifying SBOM present... "
if cosign download sbom "$IMAGE" 2> /dev/null | jq -e .components > /dev/null 2>&1; then
  echo "✅"
else
  echo "❌ FAIL: SBOM not found"
  exit 1
fi

echo "✅ Security hardening PASSED"
exit 0

Pass/Fail Criteria

✅ PASS: All hardening features verified.
⚠️ WARN: Signature verification failed (check certificate chain).
❌ FAIL: Non-root user missing, read-only FS fails, SBOM missing.
Action on FAIL: Do not promote to production; escalate to security team.

Layer 6: Soak Testing (24–72 hours)

Objective: Detect subtle issues (memory leaks, thread leaks, connection pool exhaustion) that only appear under sustained load.

Soak testing is mandatory for major version upgrades and distro changes. Run the application under continuous load for 24–72 hours, monitoring for degradation.

Soak Test Setup

#!/bin/bash

IMAGE=$1
DURATION_HOURS=${2:-24}
INTERVAL_SECONDS=60
CONTAINER="soak-test-$(date +%s)"

echo "🔄 Starting ${DURATION_HOURS}-hour soak test for $IMAGE"

# Start container. This script only monitors; drive the sustained load
# separately (for example with wrk or k6) for the duration of the test.
docker run -d \
  --name "$CONTAINER" \
  -e "SOAK_TEST=true" \
  --cpus=2 \
  --memory=2g \
  -p 8080:8080 \
  "$IMAGE" > /dev/null
sleep 10

# Monitor loop
ELAPSED=0
MAX_SECONDS=$(( DURATION_HOURS * 3600 ))
ITERATION=0
LEAK_SUSPECTED=0

while [ "$ELAPSED" -lt "$MAX_SECONDS" ]; do
  ITERATION=$(( ITERATION + 1 ))
  ELAPSED=$(( ITERATION * INTERVAL_SECONDS ))

  # Collect metrics; STATS looks like "256.5MiB / 2GiB 12.34%"
  STATS=$(docker stats --no-stream --format '{{.MemUsage}} {{.CPUPerc}}' "$CONTAINER")
  MEMORY=$(echo "$STATS" | awk '{print $1}')
  CPU=$(echo "$STATS" | awk '{print $4}')

  # Check application health
  HEALTH=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)

  # Check for memory growth (should stabilize after warmup)
  if [ "$ITERATION" -eq 1 ]; then
    BASELINE_MEMORY="$MEMORY"
    echo "Baseline memory: $BASELINE_MEMORY"
  else
    # Whole-number part only (e.g., "256.5MiB" → 256); assumes both readings use the same unit
    BASELINE_MB=$(echo "$BASELINE_MEMORY" | awk '{print int($1)}')
    CURRENT_MB=$(echo "$MEMORY" | awk '{print int($1)}')
    GROWTH=$(( (CURRENT_MB - BASELINE_MB) * 100 / BASELINE_MB ))

    # Alert if memory grows >30% (likely leak)
    if [ "$GROWTH" -gt 30 ]; then
      LEAK_SUSPECTED=1
      echo "⚠️  Memory leak suspected: grew ${GROWTH}% (${BASELINE_MB}MB → ${CURRENT_MB}MB)"
    fi
  fi

  # Log progress. Values are passed as arguments so the "%" in $CPU
  # is not interpreted as a printf format directive.
  printf '[%2dh:%02dm] Health: %s | Memory: %s | CPU: %s\n' \
    $(( ELAPSED / 3600 )) $(( (ELAPSED % 3600) / 60 )) "$HEALTH" "$MEMORY" "$CPU"

  # Check health endpoint
  if [ "$HEALTH" != "200" ]; then
    echo "❌ FAIL: Health check returned $HEALTH"
    docker logs "$CONTAINER" 2>&1 | tail -20
    docker rm -f "$CONTAINER" > /dev/null
    exit 1
  fi

  sleep "$INTERVAL_SECONDS"
done

# Final diagnostics
echo ""
echo "📊 Soak test complete. Final diagnostics:"
docker logs "$CONTAINER" 2>&1 | grep -i "error\|warning" | tail -20

# Cleanup
docker rm -f "$CONTAINER" > /dev/null

# Memory growth above 30% is a FAIL under the criteria for this layer
if [ "$LEAK_SUSPECTED" -eq 1 ]; then
  echo "❌ FAIL: Memory grew >30% during the soak test"
  exit 1
fi

echo "✅ Soak test PASSED"
exit 0

Soak Test Metrics to Monitor

Metric	Check Frequency	Threshold	Action
Memory (RSS)	Every 60s	<30% growth from baseline	Stop test if >50% growth
CPU utilization	Every 60s	<80% sustained	Reduce load if exceeding
File descriptor count	Every 300s	<1024 open	Stop test if approaching limit
Connection pool size	Every 300s	Stable/bounded	Investigate if growing unbounded
Health check success rate	Every 60s	>99%	Stop test on failure
Application errors	Every 60s	<1 error/minute	Escalate if error rate increasing
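Memory readings from docker stats carry a unit suffix (for example 256.5MiB or 1.2GiB), and the unit can change during a long run, so growth has to be computed after normalizing to one unit. The following is a rough sketch of that normalization; to_mib and memory_growth_pct are hypothetical helper names.

```shell
# to_mib VALUE
# Converts a reading such as "512KiB", "256.5MiB", or "1.2GiB" to MiB (one decimal place).
# Fails on an unrecognized unit.
to_mib() {
  awk -v v="$1" 'BEGIN {
    num = v + 0                          # numeric prefix, e.g. 256.5
    unit = v
    sub(/^[0-9.]+/, "", unit)            # what remains is the unit suffix
    if      (unit == "GiB") num *= 1024
    else if (unit == "KiB") num /= 1024
    else if (unit == "B")   num /= 1048576
    else if (unit != "MiB") exit 1
    printf "%.1f\n", num
  }'
}

# memory_growth_pct BASELINE CURRENT
# Prints growth relative to the baseline, truncated to a whole percent.
memory_growth_pct() {
  local base cur
  base=$(to_mib "$1") || return 1
  cur=$(to_mib "$2") || return 1
  awk -v b="$base" -v c="$cur" 'BEGIN {
    if (b == 0) exit 1
    printf "%d\n", (c - b) * 100 / b
  }'
}
```

A monitoring loop could then compare `memory_growth_pct "$BASELINE_MEMORY" "$MEMORY"` against the 30% and 50% limits in the table above.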

Pass/Fail Criteria

✅ PASS: All metrics stable, no memory leaks, 100% health check success.
⚠️ WARN: Memory growth <20%, occasional (1–2) health check failures, recoverable.
❌ FAIL: Memory growth >30%, sustained health check failures, connection leaks.
Action on FAIL: Stop test, review logs, consider rolling back to previous version.

Automation Examples: CI/CD Integration

GitHub Actions Workflow

name: Image Regression Test

on:
  workflow_dispatch:
    inputs:
      old_image:
        description: 'Previous image tag (for comparison)'
        required: true
        default: 'registry.cleanstart.com/py:3.12.1'
      new_image:
        description: 'New image tag to test'
        required: true
        default: 'registry.cleanstart.com/py:3.12.2'
      soak_hours:
        description: 'Soak test duration (0 = skip)'
        required: false
        default: '0'

jobs:
  regression-test:
    runs-on: ubuntu-latest
    timeout-minutes: 120
    steps:
      - name: Checkout tests
        uses: actions/checkout@v4

      - name: Layer 1 - Diff Analysis
        run: |
          ./scripts/layer1-diff-analysis.sh "${{ inputs.old_image }}" "${{ inputs.new_image }}"

      - name: Layer 2 - Smoke Tests
        run: |
          ./scripts/layer2-smoke-tests.sh "${{ inputs.new_image }}"

      - name: Layer 3 - Functional Regression
        run: |
          npm test -- --coverage
        env:
          TEST_IMAGE: "${{ inputs.new_image }}"

      - name: Layer 4 - Performance Baseline
        run: |
          ./scripts/layer4-perf-baseline.sh "${{ inputs.old_image }}" "${{ inputs.new_image }}"

      - name: Layer 5 - Security Hardening
        run: |
          ./scripts/layer5-security-check.sh "${{ inputs.new_image }}"

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: regression-results
          path: ./test-results/

      # context.issue.number is only set for pull request or issue events,
      # so this step posts only if a pull_request trigger is added to "on:".
      - name: Comment on PR
        if: ${{ always() && github.event_name == 'pull_request' }}
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('./test-results/summary.json', 'utf8'));
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## Image Regression Test Results\n\n${results.summary}`
            });

  # The soak test runs as its own job because it outlives the 120-minute limit above.
  # Workflow expressions cannot do arithmetic, so the timeout is a fixed ceiling
  # (72 hours + 30 minutes). GitHub-hosted runners cap jobs at 6 hours, so long
  # soaks need a self-hosted runner.
  soak-test:
    needs: regression-test
    if: ${{ inputs.soak_hours != '0' }}
    runs-on: self-hosted
    timeout-minutes: 4350
    steps:
      - name: Checkout tests
        uses: actions/checkout@v4

      - name: Layer 6 - Soak Test
        run: |
          ./scripts/layer6-soak-test.sh "${{ inputs.new_image }}" "${{ inputs.soak_hours }}"

GitLab CI Pipeline

# Every job calls docker, so the image, the dind service, and the tools the
# layer scripts need (bash, curl, jq are not in the Alpine-based docker image)
# are declared once as defaults.
default:
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - apk add --no-cache bash curl jq

stages:
  - diff-analysis
  - smoke-test
  - functional-test
  - performance-test
  - security-test
  - soak-test

# Defaults; override them when triggering the pipeline or in a schedule.
# GitLab does not expand shell-style ${VAR:-default} in the variables block.
variables:
  OLD_IMAGE: registry.cleanstart.com/py:3.12.1
  NEW_IMAGE: registry.cleanstart.com/py:3.12.2
  SOAK_HOURS: "0"

layer1-diff:
  stage: diff-analysis
  script:
    - ./scripts/layer1-diff-analysis.sh $OLD_IMAGE $NEW_IMAGE
  artifacts:
    reports:
      junit: results/diff-report.xml

layer2-smoke:
  stage: smoke-test
  script:
    - ./scripts/layer2-smoke-tests.sh $NEW_IMAGE
  artifacts:
    reports:
      junit: results/smoke-report.xml

layer3-functional:
  stage: functional-test
  script:
    # Adjust the reporter flags to your test runner. The report is written inside
    # the container (assumed working directory /app), so it is copied out afterwards.
    - docker run --name functional-test $NEW_IMAGE npm test -- --reporter=junit --outputFile=results/functional.xml || TEST_EXIT=$?
    - mkdir -p results
    - docker cp functional-test:/app/results/functional.xml results/functional.xml
    - exit ${TEST_EXIT:-0}
  artifacts:
    when: always
    reports:
      junit: results/functional.xml

layer4-perf:
  stage: performance-test
  script:
    - ./scripts/layer4-perf-baseline.sh $OLD_IMAGE $NEW_IMAGE
  artifacts:
    paths:
      - results/perf-*.json

layer5-security:
  stage: security-test
  script:
    - ./scripts/layer5-security-check.sh $NEW_IMAGE
  artifacts:
    reports:
      junit: results/security-report.xml

layer6-soak:
  stage: soak-test
  script:
    - ./scripts/layer6-soak-test.sh $NEW_IMAGE $SOAK_HOURS
  artifacts:
    paths:
      - results/soak-*.json
  only:
    - schedules
  timeout: 72h

Go/No-Go Decision Matrix

Use this matrix to make promotion decisions at each testing stage.

Single-Stage Decision Matrix

Layer	✅ GO	⚠️ PROCEED WITH CAUTION	❌ NO-GO
1: Diff	Fewer vulns	Same vulns	More critical/high vulns
2: Smoke	All pass	1 timeout	Any hard errors
3: Functional	>99% pass	95-99% pass (known failures)	<95% pass
4: Perf	<5% regression	5-10% regression	>10% regression
5: Security	All pass	Signature warn	Non-root/SBOM fail
6: Soak	Stable	<20% memory growth	>30% growth or health failures
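When each layer script reports one of the three outcomes from this matrix, the overall decision can be derived mechanically. The sketch below assumes a simple combination rule that the matrix itself does not spell out: any NO-GO blocks promotion, and any CAUTION downgrades an otherwise clean run. The function name overall_decision is hypothetical.

```shell
# overall_decision VERDICT...
# Each argument is the outcome of one layer: GO, CAUTION, or NO-GO.
# Prints the combined decision; returns 2 on an unrecognized verdict.
overall_decision() {
  local result="GO" verdict
  for verdict in "$@"; do
    case "$verdict" in
      NO-GO)   echo "NO-GO"; return 0 ;;   # one blocking layer is enough
      CAUTION) result="CAUTION" ;;
      GO)      ;;
      *)       echo "unknown verdict: $verdict" >&2; return 2 ;;
    esac
  done
  echo "$result"
}
```

For example, `overall_decision GO CAUTION GO` prints CAUTION, which signals that a human should review the flagged layer before promotion.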

Full Promotion Criteria

For patch upgrades (3.12.1 → 3.12.2): ✅ GO if: Layer 2 (smoke) + Layer 5 (security) both pass.

For minor upgrades (3.12 → 3.13): ✅ GO if: Layers 2 + 3 + 4 + 5 all pass.

For major upgrades (Python 3 → 4): ✅ GO if: Layers 2 + 3 + 4 + 5 + 6 all pass, soak ≥24 hours.

For hotfix releases: ✅ GO if: Layer 2 (smoke) + 5 (security) pass + specific hotfix issue resolved.
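These promotion rules map each upgrade type to a fixed set of layers, which a pipeline can look up instead of hard-coding. A minimal sketch follows; the function name required_layers and the type keywords (patch, minor, major, hotfix) are illustrative choices.

```shell
# required_layers UPGRADE_TYPE
# Prints the layer numbers that must pass before promotion, per the criteria above.
required_layers() {
  case "$1" in
    patch|hotfix) echo "2 5" ;;          # smoke + security
    minor)        echo "2 3 4 5" ;;
    major)        echo "2 3 4 5 6" ;;    # includes a soak of at least 24 hours
    *)            echo "unknown upgrade type: $1" >&2; return 1 ;;
  esac
}
```

A wrapper script could iterate with `for layer in $(required_layers minor)` and run the matching layer script for each number. For hotfixes, confirming that the specific hotfix issue is resolved remains a separate check.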

Rollback Procedure

If regression testing fails at any stage, use this procedure to restore the previous image version.

Immediate Rollback (Production)

# 1. Identify running containers with new image
docker ps -a | grep "new-image-tag"

# 2. Redeploy previous image version
kubectl set image deployment/myapp \
  app=registry.cleanstart.com/py:3.12.1@sha256:abc123...

# 3. Monitor rollout
kubectl rollout status deployment/myapp --timeout=5m

# 4. Verify traffic is routing correctly
curl https://api.example.com/health

# 5. Capture logs for diagnostics
kubectl logs -l app=myapp --tail=1000 > rollback-logs.txt

Root Cause Analysis (Post-Rollback)

Review test logs: What layer failed and why?
Check image metadata: Did SBOM/signature change unexpectedly?
Inspect error patterns: Is it a library incompatibility or application bug?
Contact image maintainer: If the new image is the issue, file a bug report with test results
Document findings: Update your regression test suite to catch this failure in the future

Prevention for Future Upgrades

# Add custom regression test for this specific issue
cat > tests/regression-py312-issue.test.js << 'EOF'
describe('Python 3.12 specific regression', () => {
  it('should handle X without error', async () => {
    // Test that specifically covers the failed scenario
  });
});
EOF

# Ensure test runs as part of Layer 3 functional suite
git add tests/regression-py312-issue.test.js
git commit -m "Add regression test for Python 3.12 issue"

What to Read Next

Performance Baseline Testing: Detailed guide for establishing performance metrics.
Security Hardening Reference: Deep-dive into shell-less, read-only, non-root architecture.
Image Upgrade Checklist: Step-by-step promotion from staging to production.
Troubleshooting: Regression Test Failures: Solutions for common test failures.