The Challenge: Supporting Multiple Architectures
Modern computing infrastructure is fundamentally diverse. Organizations today operate across multiple CPU architectures simultaneously: AMD64 (Intel and AMD processors found in most servers and traditional desktops), ARM64 (Apple Silicon chips in modern MacBooks, AWS Graviton processors used by enterprises, and ARM-based Kubernetes clusters), and 32-bit ARM for embedded systems and IoT devices. This architectural diversity creates a significant problem for container image building.
A Docker image built on an Intel Mac cannot run natively on an M1 Mac: Docker Desktop can only run it under emulation, which is slower and not always faithful to native behavior. A binary compiled for the x86-64 instruction set will not execute natively on an ARM-based processor. Organizations must either build separate images for each architecture or solve the architecture mismatch problem some other way.
The traditional approach is cross-compilation: compiling code on one architecture for execution on a different architecture. An Intel machine can be instructed to compile code that will run on ARM. This approach builds quickly, but it creates a persistent risk: approximately 5-10% of cross-compiled binaries contain subtle, architecture-specific bugs that do not appear until the binary executes on its target architecture.
CleanStart's approach is native compilation: compile on AMD64 machines for AMD64 execution, and on ARM64 machines for ARM64 execution. Native compilation needs dedicated build machines for each architecture, which adds infrastructure cost, but it eliminates the cross-compilation bug risk entirely. Most importantly, a single YAML configuration automatically generates both architecture variants, so there are no separate build specifications to maintain.
Why Native Compilation Matters
The following diagram illustrates the multi-architecture build strategy with parallel native compilation:
```mermaid
graph TD
    A["Single YAML<br/>Configuration"] -->|Parse| B["Build Plan"]
    B -->|Dependency<br/>Resolution<br/>Shared| C["Resolve Dependencies<br/>Python 3.12.1<br/>FastAPI 0.104.1"]
    C -->|Lock File| D["Lockfile<br/>All Versions<br/>Pinned"]
    D -->|Parallel| E["AMD64 Build<br/>on AMD64 Node"]
    D -->|Parallel| F["ARM64 Build<br/>on ARM64 Node"]
    E -->|Compile Native| E1["Python 3.12.1<br/>x86_64 Binary"]
    E -->|Install| E2["FastAPI<br/>x86_64 Libs"]
    E -->|Generate| E3["Image<br/>python-fastapi:amd64<br/>80 MB"]
    F -->|Compile Native| F1["Python 3.12.1<br/>aarch64 Binary"]
    F -->|Install| F2["FastAPI<br/>aarch64 Libs"]
    F -->|Generate| F3["Image<br/>python-fastapi:arm64<br/>80 MB"]
    E3 -->|Sign| G["Cosign<br/>Signature"]
    F3 -->|Sign| G
    G -->|SBOM| H["SBOM AMD64"]
    G -->|SBOM| I["SBOM ARM64"]
    G -->|Provenance| J["SLSA Level 4"]
    E3 -->|Push| K["Registry"]
    F3 -->|Push| K
    K -->|Multi-Manifest| L["OCI Index"]
    L -->|Pull on Intel| M["Docker on Intel<br/>Pulls amd64"]
    L -->|Pull on Apple Silicon| N["Docker on M1<br/>Pulls arm64"]
    M -->|Run| O["Works Perfectly<br/>Native Binary"]
    N -->|Run| P["Works Perfectly<br/>Native Binary"]
    style C fill:#99ccff
    style D fill:#99ccff
    style E fill:#ccffcc
    style F fill:#ccffcc
    style E3 fill:#99ff99
    style F3 fill:#99ff99
    style L fill:#ffff99
```
Cross-Compilation Problems
To understand why native compilation matters, consider the scenario of building a Go binary for ARM64 execution using cross-compilation on an Intel Mac. The developer instructs the Go compiler to target Linux on ARM64:
```bash
# Cross-compile (common approach)
GOOS=linux GOARCH=arm64 go build -o myapp
```

This approach appears straightforward but creates several points of potential failure. The build tools themselves are Intel-native but are instructed to produce ARM64 output. If the program uses cgo, the build also needs a C cross-compiler and a sysroot containing the target's headers and libraries, and the target's dynamic loader and system library locations differ from the build machine's (for example, `/lib64/ld-linux-x86-64.so.2` on x86-64 versus `/lib/ld-linux-aarch64.so.1` on ARM64). Floating-point results can also differ between architectures: ARM64 code may use fused multiply-add instructions, so numerical code can produce slightly different results than on Intel. Because a cross-compiled binary never runs on the build machine, none of these problems surface during compilation or local testing. This is how the 5-10% of affected binaries slip through: their bugs only appear when the binary actually runs on the target ARM64 architecture.
Native Compilation Benefits
Native compilation takes a fundamentally different approach. Rather than asking an Intel machine to produce ARM64 code, the build system compiles the code on an actual ARM64 machine using ARM64-native build tools. When the developer builds Go code on an ARM64 machine, the Go compiler runs natively, the C toolchain and dynamic loader are native to ARM64, all library paths are correct for the ARM64 architecture, and tests exercise floating-point behavior on the ARM64 CPU itself.
```bash
# Native compile (CleanStart approach)
# Build on an ARM64 machine
go build -o myapp  # Compiles with ARM64-native tools
```

The benefits are substantial. Build tools are native to the target architecture, eliminating tool-related mismatches. The runtime linker is correct for the target platform. All library paths and system integration are architecture-native. Because the binary is built and tested on the architecture it will run on, there are no cross-compilation surprises. The result is a fully architecture-native binary with no cross-compilation bugs.
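Routing each build to a machine of the right architecture starts with knowing what the host is. The Python sketch below is an illustration, not CleanStart's code: it normalizes the value reported by `platform.machine()` to the Docker/OCI architecture names used in this guide (`amd64`, `arm64`). The mapping table and the helper names `oci_arch` and `can_build_natively` are assumptions.

```python
import platform
from typing import Optional

# Hypothetical normalization table: platform.machine() value -> OCI architecture name.
_MACHINE_TO_OCI_ARCH = {
    "x86_64": "amd64",   # Linux and macOS on Intel/AMD
    "amd64": "amd64",    # Windows reports "AMD64" (lowercased below)
    "aarch64": "arm64",  # Linux on ARM64
    "arm64": "arm64",    # macOS on Apple Silicon
}


def oci_arch(machine: Optional[str] = None) -> str:
    """Return the OCI architecture name for `machine` (defaults to this host)."""
    raw = (machine if machine is not None else platform.machine()).lower()
    try:
        return _MACHINE_TO_OCI_ARCH[raw]
    except KeyError:
        raise ValueError(f"unsupported architecture: {raw!r}") from None


def can_build_natively(target_arch: str, machine: Optional[str] = None) -> bool:
    """True if a host with this machine type can build `target_arch` without cross-compiling."""
    return oci_arch(machine) == target_arch
```

On Linux ARM64 `platform.machine()` reports `aarch64`, while on an Apple Silicon Mac it reports `arm64`; both normalize to `arm64`, which is why a lookup table is used instead of passing the raw string through.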
CleanStart Multi-Architecture Strategy
Single Configuration, Multiple Builds
```yaml
# One config file
name: python-fastapi
base_language: python
version: 3.12.1
packages:
  fastapi: 0.104.1
  uvicorn: 0.24.0
```

Result: two images are generated automatically:
- python-fastapi:3.12.1-amd64
- python-fastapi:3.12.1-arm64
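The expansion from one configuration to two build targets can be sketched in a few lines of Python. The `plan_builds` function, the `BuildTarget` type, and the fixed `ARCHITECTURES` tuple are hypothetical illustrations of the `name:version-arch` naming scheme shown above, not the CleanStart CLI's internals; the config is an already-parsed dictionary so the example needs no YAML library.

```python
from dataclasses import dataclass
from typing import List, Sequence

ARCHITECTURES = ("amd64", "arm64")  # assumption: the two targets described in this guide


@dataclass(frozen=True)
class BuildTarget:
    arch: str
    tag: str


def plan_builds(config: dict, architectures: Sequence[str] = ARCHITECTURES) -> List[BuildTarget]:
    """Expand one image config into one native build target per architecture."""
    name, version = config["name"], config["version"]
    return [BuildTarget(arch, f"{name}:{version}-{arch}") for arch in architectures]


config = {
    "name": "python-fastapi",
    "base_language": "python",
    "version": "3.12.1",
    "packages": {"fastapi": "0.104.1", "uvicorn": "0.24.0"},
}

for target in plan_builds(config):
    print(target.arch, target.tag)
# amd64 python-fastapi:3.12.1-amd64
# arm64 python-fastapi:3.12.1-arm64
```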
Parallel Native Builds
The build process is optimized for parallelism.
Phase 1 (Minutes 0-5): Dependency Resolution occurs once and is shared across both architectures. All dependencies (Python 3.12.1, FastAPI 0.104.1, and transitive packages) are resolved and the lockfile is generated.
Phase 2 (Minutes 5-15): Two builds execute in parallel on separate hardware.
- AMD64 Build: runs on dedicated AMD64 nodes, compiling Python 3.12.1 natively for the x86-64 architecture, installing platform-specific AMD64 wheels, and running the 78-test suite.
- ARM64 Build: runs simultaneously on dedicated ARM64 nodes, compiling Python 3.12.1 natively for the ARM64 architecture, installing platform-specific ARM64 wheels, and running the same 78-test suite.

Phase 3 (Minutes 15-20): Both compiled images are pushed to the registry, and a multi-architecture manifest is created that points to both variants.
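Efficiency: Total build time is approximately 20 minutes, rather than the roughly 30 minutes it would take to run the two 10-minute architecture builds back to back (5 + 10 + 10 + 5). Parallel native compilation (not cross-compilation) ensures each architecture gets optimized binaries.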
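The 20-minute figure follows from running the two architecture builds concurrently after the shared phase. The sketch below is hypothetical: `run_native_builds` stands in for dispatching work to AMD64 and ARM64 nodes (here it simply runs a callable per architecture in a thread pool), and `pipeline_minutes` reproduces the timing arithmetic with the phase durations given above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def run_native_builds(
    build: Callable[[str], str], archs: Sequence[str] = ("amd64", "arm64")
) -> Dict[str, str]:
    """Start one build per architecture concurrently and collect arch -> result.

    In a real system `build` would submit the job to a node of that architecture;
    here it is any callable, which keeps the sketch self-contained.
    """
    with ThreadPoolExecutor(max_workers=len(archs)) as pool:
        futures = {arch: pool.submit(build, arch) for arch in archs}
        return {arch: future.result() for arch, future in futures.items()}


def pipeline_minutes(
    shared: float, per_arch: Dict[str, float], publish: float, parallel: bool = True
) -> float:
    """Wall-clock estimate: shared phase + architecture builds + publish phase."""
    builds = max(per_arch.values()) if parallel else sum(per_arch.values())
    return shared + builds + publish


durations = {"amd64": 10, "arm64": 10}  # Phase 2 minutes per architecture
print(pipeline_minutes(5, durations, 5))                  # 20 (parallel)
print(pipeline_minutes(5, durations, 5, parallel=False))  # 30 (back to back)
print(run_native_builds(lambda arch: f"python-fastapi:3.12.1-{arch}"))
```

The wall-clock time of the parallel phase is set by the slower of the two builds, so a consistently slower ARM64 node pool would stretch the whole pipeline.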
Architecture-Specific Compilation
AMD64 Build Process
The AMD64 build executes on dedicated x86-64 hardware nodes.
- Download: Python 3.12.1 source is downloaded (the same source as the ARM64 build).
- Compile: The source is configured and compiled with an AMD64-native GCC toolchain (`./configure --prefix=/usr` with `CC=gcc-amd64`, then `make -j8`).
- Package Installation: All dependencies are installed from x86-64 wheels (for example, `manylinux2014_x86_64`), which a plain `pip install` selects automatically on an x86-64 node.
- Verification: The compiled binary is checked to confirm it is a native AMD64 executable (`file -L /usr/bin/python3`, which follows the `python3` symlink, reports an "ELF 64-bit LSB" binary for "x86-64").
- Testing: The complete 78-test suite runs against the AMD64 build.
- Output: The validated image `python-fastapi:3.12.1-amd64` is generated with its own SHA256 digest.
ARM64 Build Process
The ARM64 build executes in parallel on dedicated ARM64 hardware nodes.
- Download: Python 3.12.1 source is downloaded (the same source as the AMD64 build).
- Compile: The source is configured and compiled with an ARM64-native GCC toolchain (`./configure --prefix=/usr` with `CC=gcc-arm64`, then `make -j8`).
- Package Installation: All dependencies are installed from ARM64 wheels (for example, `manylinux2014_aarch64`), which a plain `pip install` selects automatically on an ARM64 node.
- Verification: The compiled binary is checked to confirm it is a native ARM64 executable (`file -L /usr/bin/python3`, which follows the `python3` symlink, reports an "ELF 64-bit LSB" binary for "ARM aarch64").
- Testing: The identical 78-test suite runs against the ARM64 build.
- Output: The validated image `python-fastapi:3.12.1-arm64` is generated with its own SHA256 digest.
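The `file` check in both lists can also be done programmatically by reading the `e_machine` field of the ELF header. This is a minimal Python sketch assuming 64-bit ELF binaries; the values 62 (`EM_X86_64`) and 183 (`EM_AARCH64`) come from the ELF specification, while the function names and the `amd64`/`arm64` labels are this example's own.

```python
import struct

# e_machine values defined by the ELF specification
EM_X86_64 = 62
EM_AARCH64 = 183
ARCH_BY_MACHINE = {EM_X86_64: "amd64", EM_AARCH64: "arm64"}


def elf_arch(header: bytes) -> str:
    """Return 'amd64' or 'arm64' for a 64-bit ELF header; raise ValueError otherwise."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("not a 64-bit ELF file")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)  # e_machine follows e_type
    try:
        return ARCH_BY_MACHINE[machine]
    except KeyError:
        raise ValueError(f"unsupported e_machine value {machine}") from None


def verify_binary(path: str, expected_arch: str) -> None:
    """Raise RuntimeError if the binary at `path` is not built for `expected_arch`."""
    with open(path, "rb") as f:  # open() follows symlinks, like `file -L`
        actual = elf_arch(f.read(64))
    if actual != expected_arch:
        raise RuntimeError(f"{path} is {actual}, expected {expected_arch}")
```

In an AMD64 build step this would be called as `verify_binary("/usr/bin/python3", "amd64")`, and with `"arm64"` on the ARM64 node.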
Multi-Architecture Manifest
After both images are built, CleanStart creates a multi-architecture manifest that ties them together:
```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1234,
      "digest": "sha256:abc123def456...",
      "platform": { "architecture": "amd64", "os": "linux" }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 1234,
      "digest": "sha256:xyz789uvw012...",
      "platform": { "architecture": "arm64", "os": "linux" }
    }
  ]
}
```

The first entry points to the AMD64 image and the second to the ARM64 image. The digests and sizes are placeholders; each `size` is the byte length of the referenced per-platform image manifest, not the size of the image itself.

Usage: Transparent Architecture Detection
```bash
# Pull on an AMD64 machine
docker pull python-fastapi:3.12.1
# Automatically pulls the AMD64 variant (python-fastapi:3.12.1-amd64)

# Pull on an ARM64 machine (Apple Silicon Mac)
docker pull python-fastapi:3.12.1
# Automatically pulls the ARM64 variant (python-fastapi:3.12.1-arm64)

# Both commands use the same tag; Docker pulls the correct variant
```

Result: A single tag works everywhere. There is no need to specify python-fastapi:3.12.1-arm64 on ARM64 or python-fastapi:3.12.1-amd64 on AMD64.
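The selection Docker performs on pull can be approximated in a few lines: read the manifest list, find the entry whose `platform` matches the host, and fetch that digest. This simplified Python sketch matches only `os` and `architecture`; real clients also consider fields such as `variant` and can fall back to compatible platforms. The function name and the inline index (trimmed to the fields the sketch reads, with the placeholder digests used in this guide) are illustrative.

```python
import json

INDEX = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {"digest": "sha256:abc123def456...", "platform": {"architecture": "amd64", "os": "linux"}},
    {"digest": "sha256:xyz789uvw012...", "platform": {"architecture": "arm64", "os": "linux"}}
  ]
}
""")


def select_manifest(index: dict, os_name: str, arch: str) -> str:
    """Return the digest of the first manifest-list entry matching os_name/arch."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return entry["digest"]
    raise LookupError(f"no image for {os_name}/{arch} in this manifest list")


print(select_manifest(INDEX, "linux", "amd64"))  # sha256:abc123def456...
print(select_manifest(INDEX, "linux", "arm64"))  # sha256:xyz789uvw012...
```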
Identical Test Coverage
Both AMD64 and ARM64 variants run the exact same 78-test suite:
Architecture-Agnostic Tests
Test Category | Tests | Example |
|---|---|---|
Functionality | 30 | FastAPI imports, endpoint execution, database connection |
Security | 25 | CVE scanning, FIPS validation, permissions |
Performance | 15 | Boot time, memory usage, CPU efficiency |
Compliance | 8 | License check, SBOM completeness, signature validation |
Test Results Comparison
AMD64 Build Test Results:
- Boot time: 1.2 seconds
- Memory baseline: 128MB
- FastAPI /docs endpoint: PASS
- All 78 tests: PASS

ARM64 Build Test Results:
- Boot time: 1.3 seconds (about 8% slower than AMD64, expected due to hardware differences)
- Memory baseline: 129MB (minimal variance, well within tolerance)
- FastAPI /docs endpoint: PASS
- All 78 tests: PASS

Key point: Both architecture variants pass the identical test suite. Minor variance in absolute performance metrics (5-10%) is expected and acceptable due to underlying hardware differences between AMD64 and ARM64 processors.
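Comparing the two variants comes down to a relative-difference calculation against a tolerance. The Python sketch below uses the boot-time and memory figures above; the 10% tolerance reflects the 5-10% range the text calls acceptable and is an assumption, as are the helper names.

```python
from typing import Dict


def relative_variance(baseline: float, other: float) -> float:
    """Percentage difference of `other` relative to `baseline` (positive means higher)."""
    return (other - baseline) / baseline * 100


def within_tolerance(baseline: float, other: float, tolerance_pct: float = 10.0) -> bool:
    """True if the two measurements differ by no more than tolerance_pct percent."""
    return abs(relative_variance(baseline, other)) <= tolerance_pct


amd64: Dict[str, float] = {"boot_time_s": 1.2, "memory_mb": 128}
arm64: Dict[str, float] = {"boot_time_s": 1.3, "memory_mb": 129}

for metric, base in amd64.items():
    variance = relative_variance(base, arm64[metric])
    print(f"{metric}: {variance:+.1f}% within tolerance: {within_tolerance(base, arm64[metric])}")
# boot_time_s: +8.3% within tolerance: True
# memory_mb: +0.8% within tolerance: True
```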
Handling Architecture-Specific Code
Platform-Specific Behavior (Rare)
Most code is architecture-agnostic, but occasionally dependencies differ by platform. Consider this configuration for a Python image with platform-specific dependencies:

```yaml
name: ml-inference
base_language: python
version: 3.11
packages:
  # Package works on both AMD64 and ARM64
  tensorflow: 2.14.0
  # Specify platform-specific versions if needed
  numpy:
    amd64: 1.24.3      # AMD64-optimized version
    arm64: 1.24.3-arm  # ARM64-optimized version
```

The build result is that the AMD64 image gets numpy 1.24.3 (AMD64-optimized) while the ARM64 image gets numpy 1.24.3-arm (ARM64-optimized).
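One way to read the per-architecture syntax: a plain version string applies to every architecture, while a mapping selects a version by architecture name. The Python sketch below implements that reading; it is an assumption about the semantics, not CleanStart's resolver, and `resolve_packages` is a hypothetical name.

```python
from typing import Dict, Union

PackageSpec = Union[str, Dict[str, str]]


def resolve_packages(packages: Dict[str, PackageSpec], arch: str) -> Dict[str, str]:
    """Pick one version per package for `arch`.

    A plain string applies to every architecture; a dict is keyed by architecture name.
    """
    resolved = {}
    for name, spec in packages.items():
        if isinstance(spec, dict):
            if arch not in spec:
                raise ValueError(f"{name} has no version for {arch}")
            resolved[name] = spec[arch]
        else:
            resolved[name] = spec
    return resolved


packages = {
    "tensorflow": "2.14.0",
    "numpy": {"amd64": "1.24.3", "arm64": "1.24.3-arm"},
}
print(resolve_packages(packages, "amd64"))  # {'tensorflow': '2.14.0', 'numpy': '1.24.3'}
print(resolve_packages(packages, "arm64"))  # {'tensorflow': '2.14.0', 'numpy': '1.24.3-arm'}
```

Failing loudly when an architecture is missing from a mapping keeps one variant from silently shipping without a dependency.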
Runtime Compatibility
The principle is straightforward: if code is 64-bit clean and doesn't rely on architecture-specific instructions or file paths, it works on both architectures.
Code that imports json and parses data works on both AMD64 and ARM64, as does code that imports hashlib and computes hashes. Code that uses ctypes to load native libraries by path may not, because library paths vary by architecture (for example, /usr/lib/x86_64-linux-gnu versus /usr/lib/aarch64-linux-gnu on Debian-based images).
CleanStart handles these edge cases automatically via platform-specific compilation flags.
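The ctypes pitfall is easy to reproduce: hard-coding a library path bakes one architecture's directory layout into the code. The Python sketch below contrasts that with `ctypes.util.find_library`, which asks the system where a library lives. The multiarch directory names assume a Debian- or Ubuntu-based image; `multiarch_libdir` and `load_library` are illustrative helpers.

```python
import ctypes
import ctypes.util
import platform
from typing import Optional

# Debian/Ubuntu multiarch triplets (assumption: Debian-based image layout)
_MULTIARCH = {"x86_64": "x86_64-linux-gnu", "aarch64": "aarch64-linux-gnu"}


def multiarch_libdir(machine: Optional[str] = None) -> str:
    """Directory holding shared libraries for `machine` on Debian-style systems."""
    return f"/usr/lib/{_MULTIARCH[machine or platform.machine()]}"


# Fragile: only correct on x86-64; this path does not exist in ARM64 images.
HARDCODED_LIBC = "/usr/lib/x86_64-linux-gnu/libc.so.6"


def load_library(name: str) -> ctypes.CDLL:
    """Portable: let the platform resolve the library for the running architecture."""
    found = ctypes.util.find_library(name)  # e.g. "libc.so.6" on glibc systems
    if found is None:
        raise OSError(f"library {name!r} not found")
    return ctypes.CDLL(found)
```

`load_library("c")` works unchanged in both image variants because the dynamic loader searches the correct architecture's directories.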
Kubernetes Multi-Architecture Deployment
GKE Node Pools
A GKE cluster with both AMD64 and ARM64 node pools (on Google Cloud, ARM64 nodes use the Arm-based Tau T2A machine series rather than AWS Graviton):

```bash
# Assumes a default zone or region that offers T2A machines is configured

# Create the cluster; its default node pool provides the AMD64 nodes
gcloud container clusters create mixed-arch-cluster \
  --machine-type n2-standard-4 \
  --num-nodes 3

# Add an ARM64 node pool (Tau T2A)
gcloud container node-pools create arm64-pool \
  --cluster mixed-arch-cluster \
  --machine-type t2a-standard-4 \
  --num-nodes 3
```

Pod Scheduling with Multi-Arch Images
```yaml
# Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-fastapi-api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      # The container runtime on each node pulls the variant matching that node
      containers:
        - name: api
          image: python-fastapi:3.12.1  # Works on both architectures
          resources:
            requests:
              cpu: 500m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 512Mi
      # Optional: affinity rules to control placement
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64  # Prefer AMD64 if available
      # GKE taints Arm nodes with kubernetes.io/arch=arm64:NoSchedule;
      # this toleration lets replicas also run on the ARM64 pool
      tolerations:
        - key: kubernetes.io/arch
          operator: Equal
          value: arm64
          effect: NoSchedule
```

Result: the container runtime pulls the AMD64 variant on AMD64 nodes and the ARM64 variant on ARM64 nodes.
Cost Implications
Build Time
Step | Duration | Notes |
|---|---|---|
Dependency resolution | 5 min | Shared (once) |
AMD64 compilation | 10 min | Parallel |
ARM64 compilation | 10 min | Parallel |
Testing (both) | 5 min | Parallel |
Total | 20 min | vs. ~30 min if the builds ran sequentially |
Vs. cross-compilation: Same 20 minutes, but results are more reliable.
Storage Cost
Per image:
- AMD64 variant: ~100MB (compressed)
- ARM64 variant: ~105MB (compressed)
- Multi-architecture manifest: <1MB
- Total per image: ~205MB (both architectures combined)

Portfolio calculation: for a fleet of 50 production images, each tag you retain costs approximately 50 × 205MB ≈ 10GB of storage.
Assessment: The manifest itself adds negligible storage (<1MB per image). The real overhead is the second variant, which roughly doubles storage compared with shipping AMD64 alone (~205MB versus ~100MB per image). That modest cost is insignificant compared with the operational benefit of supporting both architectures transparently.
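The storage figures reduce to simple arithmetic. This Python sketch uses the approximate per-variant sizes listed above, ignores the sub-megabyte manifest, and assumes no layer sharing between tags; the helper names are illustrative.

```python
AMD64_MB = 100  # approximate compressed size of the AMD64 variant
ARM64_MB = 105  # approximate compressed size of the ARM64 variant
# The manifest list (<1MB) is ignored here as negligible.


def fleet_storage_gb(images: int, tags: int = 1) -> float:
    """Storage in decimal GB for `images` multi-arch images, each kept under `tags` tags."""
    return images * tags * (AMD64_MB + ARM64_MB) / 1000


def second_variant_overhead_pct() -> float:
    """Extra storage from adding ARM64, relative to an AMD64-only image."""
    return ARM64_MB * 100 / AMD64_MB


print(fleet_storage_gb(50))           # 10.25
print(second_variant_overhead_pct())  # 105.0
```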
Operational Cost
Benefit: Single config generates both variants automatically.
```text
# Before (without multi-arch):
# You had to manage:
# - python-fastapi-amd64:3.12.1
# - python-fastapi-arm64:3.12.1
# - Documentation for which to use where
# - Deployment scripts checking architecture
# Overhead: 30% operational complexity

# After (with multi-arch):
# Single image to manage:
# - python-fastapi:3.12.1 (works everywhere)
# Overhead: 0% (automatic)
```

Troubleshooting Multi-Architecture Builds
Variant Fails Tests on One Architecture
Example: ARM64 build fails performance test
```text
AMD64 Test: Boot time 1.2 seconds ✅ PASS
ARM64 Test: Boot time 3.5 seconds ❌ FAIL (threshold: 2 seconds)
```

Solution options:
- Adjust test thresholds by architecture:
  ```yaml
  testing:
    performance_threshold:
      boot_time:
        amd64: 2000ms
        arm64: 4000ms  # More lenient for ARM64
  ```
- Investigate the actual issue:
  ```bash
  # Profile the ARM64 build
  clnstrt-cli debug arm64-build --profile
  # Might reveal: the ARM64 node is slower, or the code has an ARM64-specific inefficiency
  ```
- Fix the source code:
  ```python
  # Example: ARM64-specific optimization
  import platform

  # platform.machine() is 'aarch64' on ARM64 Linux and 'arm64' on macOS
  if platform.machine() in ("aarch64", "arm64"):
      use_arm64_optimized_library()  # hypothetical helper
  else:
      use_generic_library()  # hypothetical helper
  ```
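Per-architecture thresholds amount to a lookup keyed by architecture before the comparison. The Python sketch below mirrors the YAML option from the first solution, with limits in milliseconds; the dictionary layout and `check_boot_time` are hypothetical and not CleanStart's test harness.

```python
from typing import Dict

# Mirrors the per-architecture YAML option; values in milliseconds (hypothetical layout).
BOOT_TIME_LIMIT_MS: Dict[str, int] = {"amd64": 2000, "arm64": 4000}


def check_boot_time(arch: str, measured_ms: int, limits: Dict[str, int] = BOOT_TIME_LIMIT_MS) -> bool:
    """Compare a measured boot time against the limit configured for that architecture."""
    limit = limits[arch]
    passed = measured_ms <= limit
    print(f"{arch}: boot time {measured_ms}ms, limit {limit}ms -> {'PASS' if passed else 'FAIL'}")
    return passed


check_boot_time("amd64", 1200)                   # PASS
check_boot_time("arm64", 3500)                   # PASS under the 4000ms ARM64 limit
check_boot_time("arm64", 3500, {"arm64": 2000})  # FAIL under a single 2000ms limit
```

Loosening a threshold hides the symptom, so it is best paired with the profiling step rather than used on its own.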
Wheels Not Available for ARM64
Problem: Some Python packages don't have pre-built wheels for ARM64.
```text
Error: No wheel found for package XYZ on ARM64
```

Solution:
- Build from source:
  ```yaml
  packages:
    xyz-package:
      install_from: source  # pip installs from source, compiling on ARM64
  ```
- Use an alternative package:
  ```yaml
  packages:
    xyz-package:
      amd64: xyz-package==1.0.0
      arm64: xyz-package-arm==1.0.0  # Alternative with ARM64 support
  ```
- Report to maintainer: Request ARM64 wheel from package maintainer.
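Before choosing one of these options, it helps to check which published wheels actually cover ARM64. The stdlib-only Python sketch below inspects the platform tag at the end of a wheel filename. It is deliberately simplified: it ignores Python/ABI tags and glibc or musl version requirements, which the `packaging` library handles properly via `packaging.utils.parse_wheel_filename` and `packaging.tags.sys_tags()`. The `xyz_package` filenames are hypothetical.

```python
# OCI architecture name -> suffix used in Linux wheel platform tags
_WHEEL_ARCH = {"amd64": "x86_64", "arm64": "aarch64"}


def wheel_supports(filename: str, arch: str) -> bool:
    """True if a wheel is pure Python or has a Linux platform tag for `arch`."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # name-version[-build]-python-abi-platform
        raise ValueError(f"malformed wheel filename: {filename}")
    suffix = "_" + _WHEEL_ARCH[arch]
    for tag in parts[-1].split("."):  # compressed tag sets are joined with "."
        if tag == "any":
            return True
        if tag.startswith(("manylinux", "musllinux", "linux")) and tag.endswith(suffix):
            return True
    return False


candidates = [
    "fastapi-0.104.1-py3-none-any.whl",
    "xyz_package-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]
for name in candidates:
    print(name, "arm64:", wheel_supports(name, "arm64"))
```

If no candidate passes for `arm64`, the package falls into the "build from source" or "alternative package" case above.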
Performance: AMD64 vs ARM64
Typical Performance Variance
Workload | AMD64 | ARM64 | Variance |
|---|---|---|---|
Web server | 1000 RPS | 950 RPS | -5% |
CPU-bound | 100 ops/s | 95 ops/s | -5% |
Memory I/O | 1GB/s | 1.1GB/s | +10% |
Cryptography | 100MB/s | 110MB/s | +10% |
Key insight: ARM64 (especially modern ARM designs such as AWS Graviton and Apple Silicon) is often competitive with, or faster than, AMD64 for most workloads.
Cost Savings: ARM64 (AWS Graviton)
- AMD64 (m5.xlarge): $0.192/hour
- ARM64 (t4g.xlarge): $0.1344/hour (30% cheaper)
- Savings per instance (24/7 operation): $0.0576/hour, or about $252 over 6 months and about $505 per year
- For a 50-instance deployment: about $25,200 in total annual savings (with ARM64)
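The savings figures follow directly from the two hourly prices. This Python sketch reproduces the arithmetic for continuous 24/7 operation using a 365-day year; the constant and function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of 24/7 operation

M5_XLARGE = 0.192    # AMD64 $/hour, as quoted above
T4G_XLARGE = 0.1344  # ARM64 $/hour, as quoted above


def savings(amd64_hourly: float, arm64_hourly: float, hours: float, instances: int = 1) -> float:
    """Dollars saved by running `instances` ARM64 instances instead of AMD64 ones."""
    return (amd64_hourly - arm64_hourly) * hours * instances


print(f"Discount: {1 - T4G_XLARGE / M5_XLARGE:.0%}")
print(f"6 months, 1 instance: ${savings(M5_XLARGE, T4G_XLARGE, HOURS_PER_YEAR / 2):,.0f}")
print(f"1 year, 50 instances: ${savings(M5_XLARGE, T4G_XLARGE, HOURS_PER_YEAR, 50):,.0f}")
# Discount: 30%
# 6 months, 1 instance: $252
# 1 year, 50 instances: $25,229
```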
Multi-Architecture Manifest Commands
Manual Multi-Arch Manifest Creation
```bash
# Build single-arch images
# (on an AMD64 host, the linux/arm64 build runs under QEMU emulation
#  or needs a remote ARM64 builder, so it is not a native build)
docker build --platform linux/amd64 -t my-app:amd64 .
docker build --platform linux/arm64 -t my-app:arm64 .

# Push both
docker push my-app:amd64
docker push my-app:arm64

# Create the manifest list
docker manifest create my-app:latest \
  my-app:amd64 \
  my-app:arm64

# Push the manifest list
docker manifest push my-app:latest
```

CleanStart Automatic Manifest
```bash
# CleanStart does all of the above automatically
clnstrt-cli build --config python-fastapi.yaml

# Output:
# ✅ Built: python-fastapi:3.12.1-amd64
# ✅ Built: python-fastapi:3.12.1-arm64
# ✅ Created manifest: python-fastapi:3.12.1
# ✅ Pushed to registry
```

Next Steps
- Configure images with YAML: YAML Image Configuration
- Understand the builder pattern: Builder Pattern Dev-Prod
- Learn about hermetic builds: Hermetic Builds and SLSA
- Start building: Quick Start
Key Insight
Multi-architecture support is not an afterthought—it's built into every CleanStart build.
Native compilation on both AMD64 and ARM64, identical testing, transparent manifest selection, and single-tag deployment mean your images work everywhere, automatically.
No architecture-specific scripts, no manual variant selection, no cross-compilation bugs. Run `docker pull python-fastapi:3.12.1` and get the right architecture, every time.
