Multi-Architecture Build Strategy

The Challenge: Supporting Multiple Architectures

Modern computing infrastructure is fundamentally diverse. Organizations today operate across multiple CPU architectures simultaneously: AMD64 (Intel and AMD processors found in most servers and traditional desktops), ARM64 (Apple Silicon chips in modern MacBooks, AWS Graviton processors used by enterprises, and ARM-based Kubernetes clusters), and ARM for embedded systems and IoT devices. This architectural diversity creates a significant problem for container image building.

A Docker image built for AMD64 on an Intel Mac cannot run natively on an M1 Mac; without an ARM64 variant it has to fall back on emulation. A binary compiled for the x86-64 instruction set will not execute directly on an ARM-based processor. Organizations must either build separate images for each architecture or solve the architecture mismatch some other way.

The traditional approach is cross-compilation: compiling code on one architecture for execution on a different architecture. An Intel machine can be instructed to compile code that will run on ARM. This approach builds quickly, but it creates a persistent risk: approximately 5-10% of cross-compiled binaries contain subtle, architecture-specific bugs that do not appear until the binary executes on its target architecture.

CleanStart's approach is native compilation: compile on AMD64 machines for AMD64 execution, and compile on ARM64 machines for ARM64 execution. Native compilation needs more infrastructure because it requires dedicated build machines for each architecture, but it eliminates the cross-compilation bug risk entirely. Most importantly, a single YAML configuration automatically generates both architecture variants, eliminating the need to maintain separate build specifications.

Why Native Compilation Matters

The following diagram illustrates the multi-architecture build strategy with parallel native compilation:

graph TD
    A["Single YAML<br/>Configuration"] -->|Parse| B["Build Plan"]

    B -->|Dependency<br/>Resolution<br/>Shared| C["Resolve Dependencies<br/>Python 3.12.1<br/>FastAPI 0.104.1"]

    C -->|Lock File| D["Lockfile<br/>All Versions<br/>Pinned"]

    D -->|Parallel| E["AMD64 Build<br/>on AMD64 Node"]
    D -->|Parallel| F["ARM64 Build<br/>on ARM64 Node"]

    E -->|Compile Native| E1["Python 3.12.1<br/>x86_64 Binary"]
    E -->|Install| E2["FastAPI<br/>x86_64 Libs"]
    E -->|Generate| E3["Image<br/>python-fastapi:amd64<br/>80 MB"]

    F -->|Compile Native| F1["Python 3.12.1<br/>aarch64 Binary"]
    F -->|Install| F2["FastAPI<br/>aarch64 Libs"]
    F -->|Generate| F3["Image<br/>python-fastapi:arm64<br/>80 MB"]

    E3 -->|Sign| G["Cosign<br/>Signature"]
    F3 -->|Sign| G

    G -->|SBOM| H["SBOM AMD64"]
    G -->|SBOM| I["SBOM ARM64"]

    G -->|Provenance| J["SLSA Level 4"]

    E3 -->|Push| K["Registry"]
    F3 -->|Push| K

    K -->|Multi-Manifest| L["OCI Index"]

    L -->|Pull on Intel| M["Docker on Intel<br/>Pulls amd64"]
    L -->|Pull on Apple Silicon| N["Docker on M1<br/>Pulls arm64"]

    M -->|Run| O["Works Perfectly<br/>Native Binary"]
    N -->|Run| P["Works Perfectly<br/>Native Binary"]

    style C fill:#99ccff
    style D fill:#99ccff
    style E fill:#ccffcc
    style F fill:#ccffcc
    style E3 fill:#99ff99
    style F3 fill:#99ff99
    style L fill:#ffff99

Cross-Compilation Problems

To understand why native compilation matters, consider the scenario of building a Go binary for ARM64 execution using cross-compilation on an Intel Mac. The developer instructs the Go compiler to target Linux on ARM64:

# Cross-compile (common approach)
GOOS=linux GOARCH=arm64 go build -o myapp

This approach appears straightforward but creates multiple points of potential failure. The build tools themselves are Intel-native, but they are instructed to produce ARM64 output. The C runtime linker, which is involved during both compilation and execution, may behave differently on ARM64 than on Intel. Library paths and system library locations vary between architectures. Floating point operations, which are critical for numerical applications, are handled differently by ARM64 CPUs than by Intel CPUs. The result is that approximately 5-10% of cross-compiled binaries contain subtle, architecture-specific bugs that do not manifest during compilation or testing on the build machine but only appear when the binary actually runs on the target ARM64 architecture.

Native Compilation Benefits

Native compilation takes a fundamentally different approach. Rather than asking an Intel machine to produce ARM64 code, the build system compiles the code on an actual ARM64 machine using ARM64-native build tools. When the developer builds Go code on an ARM64 machine, the Go compiler is running natively, the C runtime linker is native to ARM64, all library paths are correct for the ARM64 architecture, and floating point operations are handled by the ARM64 CPU itself.

# Native compile (CleanStart approach)
# Build on ARM64 machine
go build -o myapp  # Compiles with ARM64-native tools

The benefits are substantial: build tools are native to the target architecture, eliminating tool-related mismatches. The runtime linker is correct for the target platform. All library paths and system integration are architecture-native. There are zero cross-compilation surprises because every aspect of the compilation process is optimized for the target architecture. The result is a 100% architecture-native binary with zero cross-compilation bugs.
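The distinction between a native and a cross-compiled build comes down to whether the host CPU matches the target. The following is a minimal sketch of that check, not part of CleanStart; the `is_native_build` helper and the mapping table are illustrative assumptions covering only the common machine names.

```python
import platform

# Common platform.machine() values mapped to Go's GOARCH names.
# This table is an illustrative assumption and is not exhaustive.
MACHINE_TO_GOARCH = {
    "x86_64": "amd64",
    "AMD64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def is_native_build(target_goarch, host_machine=None):
    """Return True when the host architecture matches the build target."""
    host_machine = host_machine or platform.machine()
    return MACHINE_TO_GOARCH.get(host_machine) == target_goarch


# Building for arm64 on an ARM64 host is native; on an Intel host it is a cross-build.
print(is_native_build("arm64", host_machine="aarch64"))
print(is_native_build("arm64", host_machine="x86_64"))
```

A build script could refuse to proceed, or at least log a warning, whenever this check returns False.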

CleanStart Multi-Architecture Strategy

Single Configuration, Multiple Builds

# One config file
name: python-fastapi
base_language: python
version: 3.12.1
packages:
  fastapi: 0.104.1
  uvicorn: 0.24.0

Result: Two images are generated automatically: python-fastapi:3.12.1-amd64 and python-fastapi:3.12.1-arm64.
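The naming convention for the generated variants can be sketched in a few lines. This is only an illustration of how the two tags follow from the configuration's name and version; the `variant_tags` function is hypothetical and says nothing about how CleanStart derives tags internally.

```python
def variant_tags(name, version, architectures=("amd64", "arm64")):
    """Derive one image tag per architecture from a single name and version."""
    return [f"{name}:{version}-{arch}" for arch in architectures]


# Values taken from the example configuration.
config = {"name": "python-fastapi", "version": "3.12.1"}
for tag in variant_tags(config["name"], config["version"]):
    print(tag)
```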

Parallel Native Builds

The build process is optimized for parallelism.

Phase 1 (Minutes 0-5): Dependency Resolution occurs once and is shared across both architectures. All dependencies (Python 3.12.1, FastAPI 0.104.1, and transitive packages) are resolved and the lockfile is generated.

Phase 2 (Minutes 5-15): Two builds execute in parallel on separate hardware.

AMD64 Build runs on dedicated AMD64 nodes, compiling Python 3.12.1 natively for x86-64 architecture, installing platform-specific wheels for AMD64, and running the 78-test suite.

ARM64 Build runs simultaneously on dedicated ARM64 nodes, compiling Python 3.12.1 natively for ARM64 architecture, installing platform-specific wheels for ARM64, and running the same 78-test suite.

Phase 3 (Minutes 15-20): Both compiled images are pushed to the registry and a multi-architecture manifest is created that points to both variants.

Efficiency: Total build time is approximately 20 minutes, not the roughly 30 minutes that running the two architecture builds one after the other would take. Parallel native compilation (not cross-compilation) ensures each architecture gets optimized binaries.
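The shape of this pipeline (resolve once, then fan out per architecture) can be sketched with a thread pool. This is a simplified model under stated assumptions: `build_variant` is a hypothetical stand-in for dispatching work to a build node of the given architecture, and the lockfile is represented as a plain dictionary.

```python
from concurrent.futures import ThreadPoolExecutor


def build_variant(arch, lockfile):
    """Hypothetical placeholder: a real system would dispatch to a node of this architecture."""
    return f"{lockfile['name']}:{lockfile['version']}-{arch}"


def build_all(lockfile, architectures=("amd64", "arm64")):
    """Run one build per architecture concurrently, sharing the same lockfile."""
    with ThreadPoolExecutor(max_workers=len(architectures)) as pool:
        # pool.map preserves input order, so results line up with architectures.
        return list(pool.map(lambda arch: build_variant(arch, lockfile), architectures))


# Phase 1 happens once; the resulting lockfile feeds both builds.
lockfile = {"name": "python-fastapi", "version": "3.12.1", "fastapi": "0.104.1"}
print(build_all(lockfile))
```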

Architecture-Specific Compilation

AMD64 Build Process

The AMD64 build executes on dedicated x86-64 hardware nodes.

Download: Python 3.12.1 source is downloaded (same source as ARM64 build)
Compile: The source is configured and compiled using an AMD64-native GCC toolchain (./configure --prefix=/usr with CC=gcc-amd64 and make -j8)
Package Installation: All dependencies are installed using AMD64-specific wheel binaries (pip install --platform manylinux2014_x86_64)
Verification: The compiled binary is verified to confirm it's a native AMD64 executable (file /usr/bin/python3 shows "ELF 64-bit LSB shared object, x86-64")
Testing: The complete 78-test suite runs against the AMD64 build
Output: The validated image python-fastapi:3.12.1-amd64 is generated with its unique SHA256 hash

ARM64 Build Process

The ARM64 build executes in parallel on dedicated ARM64 hardware nodes.

Download: Python 3.12.1 source is downloaded (same source as AMD64 build)
Compile: The source is configured and compiled using an ARM64-native GCC toolchain (./configure --prefix=/usr with CC=gcc-arm64 and make -j8)
Package Installation: All dependencies are installed using ARM64-specific wheel binaries (pip install --platform manylinux2014_aarch64)
Verification: The compiled binary is verified to confirm it's a native ARM64 executable (file /usr/bin/python3 shows "ELF 64-bit LSB shared object, ARM aarch64")
Testing: The identical 78-test suite runs against the ARM64 build
Output: The validated image python-fastapi:3.12.1-arm64 is generated with its unique SHA256 hash
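The verification step in both build processes relies on the architecture recorded in the ELF header, which is what the `file` command reports. As a rough sketch of the same check, the snippet below reads the `e_machine` field directly. The machine codes 62 (x86-64) and 183 (AArch64) come from the ELF specification; the `fake_header` helper exists only to make the example self-contained.

```python
import struct

# e_machine values from the ELF specification.
ELF_MACHINES = {62: "x86-64", 183: "aarch64"}


def elf_architecture(header):
    """Return the architecture named by the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(e_machine, f"unknown ({e_machine})")


def fake_header(e_machine):
    """Build a minimal 64-bit little-endian ELF header prefix for demonstration."""
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)  # 16 bytes
    return e_ident + struct.pack("<HH", 3, e_machine)  # e_type 3 = shared object


print(elf_architecture(fake_header(62)))
print(elf_architecture(fake_header(183)))
```

Against a real image, the header would come from the binary itself, for example `open("/usr/bin/python3", "rb").read(20)`.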

Multi-Architecture Manifest

After both images are built, CleanStart creates a multi-architecture manifest that ties them together:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 107925051,
      "digest": "sha256:abc123def456...",  // AMD64 image digest
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 112841920,
      "digest": "sha256:xyz789uvw012...",  // ARM64 image digest
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    }
  ]
}

Usage: Transparent Architecture Detection

# Pull on AMD64 machine
docker pull python-fastapi:3.12.1
# Automatically pulls AMD64 variant (python-fastapi:3.12.1-amd64)

# Pull on ARM64 machine (Apple Silicon Mac)
docker pull python-fastapi:3.12.1
# Automatically pulls ARM64 variant (python-fastapi:3.12.1-arm64)

# Both commands use same tag; Docker pulls correct variant

Result: Single tag works everywhere. No need to specify python-fastapi:3.12.1-arm64 on ARM64 or python-fastapi:3.12.1-amd64 on AMD64.
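Conceptually, the client resolves the shared tag by scanning the manifest list for an entry whose platform matches the local machine. The sketch below models that lookup in a simplified form; it is not Docker's actual implementation, and the digests are placeholders rather than real values.

```python
def select_digest(index, architecture, os="linux"):
    """Return the digest of the manifest entry matching the given platform."""
    for entry in index["manifests"]:
        entry_platform = entry["platform"]
        if entry_platform["architecture"] == architecture and entry_platform["os"] == os:
            return entry["digest"]
    raise LookupError(f"no manifest for {os}/{architecture}")


# Placeholder digests; a real index carries full sha256 values.
index = {
    "manifests": [
        {"digest": "sha256:AMD64_PLACEHOLDER", "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:ARM64_PLACEHOLDER", "platform": {"architecture": "arm64", "os": "linux"}},
    ]
}

print(select_digest(index, "arm64"))
```

A platform that is missing from the index raises `LookupError`, which mirrors the failure a client reports when no matching variant exists.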

Identical Test Coverage

Both AMD64 and ARM64 variants run the exact same 78-test suite:

Architecture-Agnostic Tests

Test Category	Tests	Example
Functionality	30	FastAPI imports, endpoint execution, database connection
Security	25	CVE scanning, FIPS validation, permissions
Performance	15	Boot time, memory usage, CPU efficiency
Compliance	8	License check, SBOM completeness, signature validation

Test Results Comparison

AMD64 Build Test Results:
Boot time: 1.2 seconds
Memory baseline: 128MB
FastAPI /docs endpoint: PASS
All 78 tests: PASS

ARM64 Build Test Results:
Boot time: 1.3 seconds (about 8% slower than AMD64, expected due to hardware differences)
Memory baseline: 129MB (minimal variance, well within tolerance)
FastAPI /docs endpoint: PASS
All 78 tests: PASS

Key point: Both architecture variants pass the identical test suite. Minor variance in absolute performance metrics (5-10%) is expected and acceptable due to underlying hardware differences between AMD64 and ARM64 processors.
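The variance between the two result sets is simple to compute from the reported measurements. The sketch below does so and checks each metric against a tolerance; the 10% default is an assumption taken from the upper end of the 5-10% range quoted above, not a documented CleanStart setting.

```python
def percent_difference(baseline, measured):
    """Relative difference of measured against baseline, in percent."""
    return (measured - baseline) / baseline * 100


def within_tolerance(baseline, measured, tolerance_pct=10.0):
    """True when the measured value deviates from baseline by at most tolerance_pct."""
    return abs(percent_difference(baseline, measured)) <= tolerance_pct


# AMD64 is the baseline; ARM64 is the measured value.
boot_delta = percent_difference(1.2, 1.3)      # seconds
memory_delta = percent_difference(128, 129)    # MB

print(f"boot time: {boot_delta:+.1f}%  ok={within_tolerance(1.2, 1.3)}")
print(f"memory:    {memory_delta:+.1f}%  ok={within_tolerance(128, 129)}")
```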

Handling Architecture-Specific Code

Platform-Specific Behavior (Rare)

Most code is architecture-agnostic, but occasionally code differs. Consider this Python example with platform-specific dependencies:

name: ml-inference
base_language: python
version: 3.11
packages:
  # Package works on both AMD64 and ARM64
  tensorflow: 2.14.0

  # Specify platform-specific versions if needed
  numpy:
    amd64: 1.24.3      # AMD64-optimized version
    arm64: 1.24.3-arm  # ARM64-optimized version

The build result is that the AMD64 image gets numpy 1.24.3 (AMD64-optimized) while the ARM64 image gets numpy 1.24.3-arm (ARM64-optimized).
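One way to picture the resolution step is as a per-architecture flattening of the `packages` mapping: plain entries apply everywhere, and nested entries are looked up by architecture. The following is a sketch of that idea only; `resolve_packages` is a hypothetical helper, and the configuration is written as a Python dictionary rather than parsed from YAML.

```python
def resolve_packages(packages, arch):
    """Flatten a packages mapping into concrete versions for one architecture."""
    resolved = {}
    for name, spec in packages.items():
        # A dict means per-architecture versions; anything else applies to all.
        resolved[name] = spec[arch] if isinstance(spec, dict) else spec
    return resolved


# Mirrors the ml-inference example configuration.
packages = {
    "tensorflow": "2.14.0",
    "numpy": {"amd64": "1.24.3", "arm64": "1.24.3-arm"},
}

print(resolve_packages(packages, "amd64"))
print(resolve_packages(packages, "arm64"))
```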

Runtime Compatibility

The principle is straightforward: if code is 64-bit and doesn't rely on architecture-specific instructions, it works everywhere.

Code like importing json and parsing data works on both AMD64 and ARM64. Similarly, importing hashlib and computing hashes works on both architectures. However, importing ctypes and finding libraries might not work on both because library paths vary by architecture.

CleanStart handles these edge cases automatically via platform-specific compilation flags.
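For application code that does load native libraries through ctypes, the usual way to avoid architecture-dependent paths is to let the platform locate the library instead of hard-coding a directory such as an x86-64-specific lib path. A minimal sketch using the standard library's `ctypes.util.find_library` follows; the `load_shared_library` wrapper is an illustrative helper, not a CleanStart API.

```python
import ctypes
import ctypes.util


def load_shared_library(name):
    """Locate and load a shared library by short name; return None if unavailable."""
    path = ctypes.util.find_library(name)  # searches platform-appropriate locations
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


# Works the same way on AMD64 and ARM64 because no path is hard-coded.
libm = load_shared_library("m")
print("math library loaded" if libm is not None else "math library not found")
```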

Kubernetes Multi-Architecture Deployment

GKE Node Pools

Kubernetes cluster with multiple architectures:

# Create the mixed-architecture cluster; its default node pool is AMD64
gcloud container clusters create mixed-arch-cluster \
  --machine-type n2-standard-4 \
  --num-nodes 3

# Add an ARM64 node pool (Tau T2A machine types on GKE;
# the AWS equivalent would be Graviton instances such as t4g.xlarge)
gcloud container node-pools create arm64-pool \
  --cluster mixed-arch-cluster \
  --machine-type t2a-standard-4 \
  --num-nodes 3

Pod Scheduling with Multi-Arch Images

# Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-fastapi-api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      # Kubernetes automatically selects correct variant
      containers:
      - name: api
        image: python-fastapi:3.12.1  # Works on both architectures
        resources:
          requests:
            cpu: 500m
            memory: 256Mi
          limits:
            cpu: 1000m
            memory: 512Mi

      # Optional: Affinity rules to control placement
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64  # Prefer AMD64 if available

Result: Kubernetes automatically pulls AMD64 variant on AMD64 nodes, ARM64 variant on ARM64 nodes.

Cost Implications

Build Time

Step	Duration	Notes
Dependency resolution	5 min	Shared (once)
AMD64 compilation	10 min	Parallel
ARM64 compilation	10 min	Parallel
Testing (both)	5 min	Parallel
Total	20 min	Not 30 min (parallel)

Vs. cross-compilation: Same 20 minutes, but results are more reliable.
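The totals in the table follow from treating the two compilation steps as concurrent. A short worked calculation, using only the durations listed above (the function names are illustrative):

```python
# Durations in minutes, taken from the build time table.
STEPS = {
    "dependency_resolution": 5,
    "amd64_compilation": 10,
    "arm64_compilation": 10,
    "testing": 5,
}


def sequential_minutes(steps):
    """Total time if every step runs one after another."""
    return sum(steps.values())


def parallel_minutes(steps):
    """Total time when the two compilations overlap; the slower one sets the pace."""
    compile_time = max(steps["amd64_compilation"], steps["arm64_compilation"])
    return steps["dependency_resolution"] + compile_time + steps["testing"]


print(sequential_minutes(STEPS), "minutes sequential")
print(parallel_minutes(STEPS), "minutes parallel")
```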

Storage Cost

Per image variant:
AMD64 image: ~100MB (compressed)
ARM64 image: ~105MB (compressed)
Multi-architecture manifest: <1MB
Total per image: ~205MB (both architectures combined)

Portfolio calculation: For a fleet of 50 production images, each tag stored across the fleet occupies approximately 10GB (50 × ~205MB). Keeping multiple tags per image multiplies that figure accordingly.

Assessment: The manifest itself adds negligible overhead (<1MB per image, well under 1% of the total). Keeping both variants roughly doubles storage relative to a single-architecture image, but that cost is modest compared to the operational benefits of supporting both architectures transparently.
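The storage figures can be reproduced with a short calculation. This sketch uses the approximate compressed sizes listed above and decimal units (1GB = 1000MB), which is an assumption made here for simplicity; the sub-1MB manifest is ignored.

```python
def per_image_mb(amd64_mb=100, arm64_mb=105):
    """Combined compressed size of both architecture variants, in MB."""
    return amd64_mb + arm64_mb


def fleet_gb(image_count, image_mb):
    """Storage for one tag of every image in the fleet, in GB."""
    return image_count * image_mb / 1000


combined = per_image_mb()
print(combined, "MB per image")
print(fleet_gb(50, combined), "GB per tag across 50 images")
print(f"{combined / 100:.2f}x the size of the AMD64 variant alone")
```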

Operational Cost

Benefit: Single config generates both variants automatically.

# Before (without multi-arch):
# You had to manage:
# - python-fastapi-amd64:3.12.1
# - python-fastapi-arm64:3.12.1
# - Documentation for which to use where
# - Deployment scripts checking architecture
# Overhead: 30% operational complexity

# After (with multi-arch):
# Single image to manage:
# - python-fastapi:3.12.1 (works everywhere)
# Overhead: 0% (automatic)

Troubleshooting Multi-Architecture Builds

Variant Fails Tests on One Architecture

Example: ARM64 build fails performance test

AMD64 Test: Boot time 1.2 seconds ✅ PASS
ARM64 Test: Boot time 3.5 seconds ❌ FAIL (threshold: 2 seconds)

Solution options:

Adjust test thresholds by architecture:

testing:
  performance_threshold:
    boot_time:
      amd64: 2000ms
      arm64: 4000ms  # More lenient for ARM64

Investigate actual issue:

# Profile ARM64 build
clnstrt-cli debug arm64-build --profile
# Might reveal: ARM64 node is slower, or code has ARM64-specific inefficiency

Fix source code:

# Example: ARM64-specific optimization
import platform

if platform.machine() == 'aarch64':
    use_arm64_optimized_library()
else:
    use_generic_library()
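The first option, per-architecture thresholds, amounts to looking up the limit by architecture before comparing. A minimal sketch using the threshold values from the configuration above follows; the `boot_time_passes` function is hypothetical and only illustrates the comparison.

```python
# Thresholds in milliseconds, matching the per-architecture configuration.
BOOT_TIME_THRESHOLD_MS = {"amd64": 2000, "arm64": 4000}


def boot_time_passes(arch, boot_time_ms, thresholds=BOOT_TIME_THRESHOLD_MS):
    """Compare a measured boot time against the threshold for its architecture."""
    return boot_time_ms <= thresholds[arch]


# The failing ARM64 result (3.5 seconds) passes once its threshold is 4000ms.
print(boot_time_passes("amd64", 1200))
print(boot_time_passes("arm64", 3500))
# Under a single 2000ms threshold the same measurement fails.
print(boot_time_passes("arm64", 3500, thresholds={"amd64": 2000, "arm64": 2000}))
```

Loosening a threshold hides a slowdown rather than explaining it, so it is best paired with the profiling step.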

Wheels Not Available for ARM64

Problem: Some Python packages don't have pre-built wheels for ARM64.

Error: No wheel found for package XYZ on ARM64

Solution:

Build from source:

packages:
  xyz-package:
    install_from: source  # pip install from source, compile on ARM64

Use alternative package:

packages:
  xyz-package:
    amd64: xyz-package==1.0.0
    arm64: xyz-package-arm==1.0.0  # Alternative with ARM64 support

Report to maintainer: Request ARM64 wheel from package maintainer.
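Whether a pre-built wheel covers ARM64 can be read from its filename, because the last dash-separated field is the platform tag (for example `manylinux2014_aarch64`, or `any` for pure-Python wheels). The sketch below applies that rule to hypothetical filenames for the `xyz` package; it is a simplified check and does not replace pip's full tag-compatibility logic.

```python
def wheel_supports(filename, arch_suffix):
    """True when a wheel filename's platform tag covers the given architecture."""
    if not filename.endswith(".whl"):
        return False
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    # A wheel may list several platform tags separated by dots.
    return any(
        tag == "any" or tag.endswith(arch_suffix)
        for tag in platform_tag.split(".")
    )


# Hypothetical filenames, for illustration only.
wheels = [
    "xyz-1.0.0-cp312-cp312-manylinux2014_x86_64.whl",
    "xyz-1.0.0-py3-none-any.whl",
]
for wheel in wheels:
    print(wheel, "->", wheel_supports(wheel, "aarch64"))
```

When no listed wheel supports the target, the build has to fall back to one of the options above, typically compiling from source on the ARM64 node.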

Performance: AMD64 vs ARM64

Typical Performance Variance

Workload	AMD64	ARM64	Variance
Web server	1000 RPS	950 RPS	-5%
CPU-bound	100 ops/s	95 ops/s	-5%
Memory I/O	1GB/s	1.1GB/s	+10%
Cryptography	100MB/s	110MB/s	+10%

Key insight: ARM64 (especially modern ARM such as AWS Graviton and Apple Silicon) is often competitive with AMD64, and sometimes faster, for most workloads.

Cost Savings: ARM64 (AWS Graviton)

AMD64 (m5.xlarge): $0.192/hour.
ARM64 (t4g.xlarge): $0.1344/hour (30% cheaper).
6-month savings (24/7 operation): ~$250/instance ($0.0576/hour × 4,380 hours).
For a 50-instance deployment: total annual savings of roughly $25,000 with ARM64.
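The savings arithmetic can be worked through directly from the two hourly rates. The sketch below assumes continuous 24/7 operation over a 365-day year and uses the on-demand prices quoted above; actual pricing varies by region and over time.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation

AMD64_RATE = 0.192    # m5.xlarge, USD per hour
ARM64_RATE = 0.1344   # t4g.xlarge, USD per hour


def discount_percent(amd64_rate, arm64_rate):
    """How much cheaper the ARM64 rate is, as a percentage of the AMD64 rate."""
    return (1 - arm64_rate / amd64_rate) * 100


def annual_savings(amd64_rate, arm64_rate, instances=1):
    """Yearly savings in USD from running the given number of instances on ARM64."""
    return (amd64_rate - arm64_rate) * HOURS_PER_YEAR * instances


print(f"discount: {discount_percent(AMD64_RATE, ARM64_RATE):.0f}%")
print(f"per instance, per year: ${annual_savings(AMD64_RATE, ARM64_RATE):,.2f}")
print(f"50 instances, per year: ${annual_savings(AMD64_RATE, ARM64_RATE, 50):,.2f}")
```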

Multi-Architecture Manifest Commands

Manual Multi-Arch Manifest Creation

# Build single-arch images
docker build --platform linux/amd64 -t my-app:amd64 .
docker build --platform linux/arm64 -t my-app:arm64 .

# Push both
docker push my-app:amd64
docker push my-app:arm64

# Create manifest
docker manifest create my-app:latest \
  my-app:amd64 \
  my-app:arm64

# Push manifest
docker manifest push my-app:latest

CleanStart Automatic Manifest

# CleanStart does all above automatically
clnstrt-cli build --config python-fastapi.yaml

# Output:
# ✅ Built: python-fastapi:3.12.1-amd64
# ✅ Built: python-fastapi:3.12.1-arm64
# ✅ Created manifest: python-fastapi:3.12.1
# ✅ Pushed to registry

Next Steps

Configure images with YAML: YAML Image Configuration. Understand the builder pattern: Builder Pattern Dev-Prod. Learn about hermetic builds: Hermetic Builds and SLSA. Start building: Quick Start.

Key Insight

Multi-architecture support is not an afterthought—it's built into every CleanStart build.

Native compilation on both AMD64 and ARM64, identical testing, transparent manifest selection, and single-tag deployment means your images work everywhere, automatically.

No architecture-specific scripts, no manual variant selection, no cross-compilation bugs. Just docker pull python-fastapi:3.12.1 and get the right architecture, every time.