Difficulty: Intermediate | Time: 120 minutes | Focus: AI workloads, multi-stage builds, GPU support, hardening
What You'll Build
This lab walks through building production-ready AI containers using CleanStart base images. You'll build containers for three common AI patterns: inference serving, model training, and agent orchestration—each hardened against the attack vectors that affect standard AI deployments.
By the end, you'll have working examples of an Ollama inference server with shell-less execution and minimal attack surface, a vLLM production deployment with GPU support and Kubernetes integration, a PyTorch training and inference pipeline with separation between stages, and a LangChain agent system with network isolation to prevent unauthorized external communication.
Prerequisites
Required: Docker 20.10 or newer, kubectl 1.24 or later for the Kubernetes labs, Helm 3.x for Helm deployments, curl and jq for testing, and a terminal with bash or zsh.
Optional but recommended: a GPU node or access to a cloud GPU (Lab 2's vLLM deployment needs a GPU; Lab 3 falls back to CPU), Docker registry access to registry.cleanstart.com, and a Kubernetes cluster such as k3s or kind.
Verify your setup:
```bash
docker --version
kubectl version --client 2>/dev/null || echo "kubectl not required for Lab 1"
helm version 2>/dev/null || echo "helm not required for Lab 1-2"
curl --version
jq --version
```
Lab 1: Ollama Inference Server
Time: 25 minutes | Focus: Shell-less containers, multi-stage builds, verification
Scenario
You need a production inference server for running open-source LLMs locally. Standard Ollama containers include a shell, package managers, and debug tools—increasing attack surface. You'll build a hardened version that removes all of these.
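Removing the shell also affects the Dockerfile. Any RUN, CMD, ENTRYPOINT, or HEALTHCHECK CMD written in shell form is executed through /bin/sh -c, so it fails in the final stage of a shell-less image. The sketch below flags such instructions in the last build stage. It is a simplified checker written for illustration, and the function names are hypothetical. It handles backslash continuations and comment lines, but not heredocs, the SHELL instruction, or a custom escape parser directive.

```python
import json
import re

# Instructions whose shell form is executed via /bin/sh -c
SHELL_FORM_SENSITIVE = {"RUN", "CMD", "ENTRYPOINT"}


def final_stage_instructions(dockerfile_text: str) -> list[str]:
    """Join backslash continuations, drop comments, return the last stage's lines."""
    logical: list[str] = []
    buf = ""
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):
            buf += line[:-1].strip() + " "
            continue
        logical.append(buf + line)
        buf = ""
    if buf:
        logical.append(buf.strip())
    starts = [i for i, ln in enumerate(logical) if ln.upper().startswith("FROM ")]
    return logical[starts[-1]:] if starts else logical


def is_exec_form(args: str) -> bool:
    """Exec form is a JSON array of strings, e.g. ["ollama", "serve"]."""
    try:
        value = json.loads(args)
    except json.JSONDecodeError:
        return False
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def shell_form_findings(dockerfile_text: str) -> list[str]:
    """List final-stage instructions that would need /bin/sh at build or run time."""
    findings = []
    for line in final_stage_instructions(dockerfile_text):
        keyword, _, args = line.partition(" ")
        keyword, args = keyword.upper(), args.strip()
        if keyword == "HEALTHCHECK":
            # HEALTHCHECK [OPTIONS] CMD <command>; HEALTHCHECK NONE has no CMD
            match = re.search(r"\bCMD\s+(.*)$", args, flags=re.IGNORECASE)
            if match and not is_exec_form(match.group(1)):
                findings.append(f"HEALTHCHECK: {match.group(1)}")
        elif keyword in SHELL_FORM_SENSITIVE and not is_exec_form(args):
            findings.append(f"{keyword}: {args}")
    return findings
```

Build-stage RUN lines are ignored on purpose: the build stage runs on an image that still has a shell. Only the final stage ships without one.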
Step 1: Create Project Directory
```bash
mkdir -p ~/labs/ai-containers/lab-1-ollama
cd ~/labs/ai-containers/lab-1-ollama
```
Step 2: Create Ollama Installation Script
The build stage downloads the prebuilt Ollama binary. Create install-ollama.sh:
```bash
cat > install-ollama.sh << 'EOF'
#!/bin/bash
set -euo pipefail

# Download the prebuilt Ollama binary for linux/amd64.
# Releases in the 0.1.x series publish a bare binary (ollama-linux-amd64);
# newer releases ship a .tgz archive with a different layout instead.
OLLAMA_VERSION="0.1.32"
OLLAMA_DOWNLOAD_URL="https://github.com/ollama/ollama/releases/download/v${OLLAMA_VERSION}/ollama-linux-amd64"

mkdir -p /build-output/bin

echo "Downloading Ollama..."
curl -fsSL "$OLLAMA_DOWNLOAD_URL" -o /build-output/bin/ollama
chmod 0755 /build-output/bin/ollama

echo "Verifying..."
/build-output/bin/ollama --version
EOF

chmod +x install-ollama.sh
```
Step 3: Create Multi-Stage Dockerfile
Create Dockerfile:
```dockerfile
# Stage 1: Build stage - download and prepare the Ollama binary
FROM registry.cleanstart.com/cleanstart/python:3.12 AS build

WORKDIR /build

# Install curl (and tar) for downloading the binary
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    tar \
    && rm -rf /var/lib/apt/lists/*

COPY install-ollama.sh .
RUN bash install-ollama.sh

# Stage 2: Production stage - minimal runtime with no shell
FROM registry.cleanstart.com/cleanstart/base:latest

# Run as the non-root user (uid/gid 65532, matching CleanStart).
# USER does not create an account; it only sets the uid/gid for later steps.
USER 65532:65532

# Copy only the Ollama binary from the build stage
COPY --from=build --chown=65532:65532 /build-output/bin/ollama /usr/local/bin/ollama

# Set working directory (must be writable by user 65532)
WORKDIR /app

# Ollama binds to 127.0.0.1 by default; listen on all interfaces so the
# published port is reachable from outside the container
ENV OLLAMA_HOST=0.0.0.0

# Expose Ollama API port
EXPOSE 11434

# Health check in exec (JSON) form: the shell form would need /bin/sh,
# which this image does not have. `ollama list` fails if the server is down.
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD ["ollama", "list"]

# Run Ollama server
ENTRYPOINT ["ollama"]
CMD ["serve"]
```
Step 4: Build the Image
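```bash
docker build -t ollama-secure:latest .
```
Expected output (last line; the wording depends on the builder):
```text
Successfully tagged ollama-secure:latest
```
With BuildKit, the default builder since Docker 23.0, the last line reads `=> => naming to docker.io/library/ollama-secure:latest` instead.
Step 5: Verify Shell-Less Hardening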
This is the critical security check—verify the container has no shell:
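```bash
docker run --rm --entrypoint /bin/sh ollama-secure:latest 2>&1
```
Expected output (must fail; the exact wording depends on your Docker and runc versions):
```text
docker: Error response from daemon: ... exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown.
```
The --entrypoint flag matters. The image's ENTRYPOINT is ollama, so without the flag /bin/sh would be passed to ollama as an argument instead of being executed.
Explanation: CleanStart base images do not include /bin/sh, /bin/bash, or any package manager. An attacker who compromises the Ollama process therefore cannot open a shell, chain shell commands, or install additional tools. Code running inside the compromised process itself is not blocked, so a shell-less image reduces, rather than eliminates, what an attacker can do.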
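The check above probes a single path, /bin/sh. A broader check exports the image's filesystem and scans it for other shells, package managers, and download tools. The sketch below assumes you have exported the filesystem first with `docker create --name audit ollama-secure:latest`, `docker export -o rootfs.tar audit`, and `docker rm audit`. The deny-list and the audit_rootfs name are illustrative assumptions. Only exact paths are matched, and directory symlinks are not followed.

```python
import os
import sys
import tarfile

# Illustrative deny-list (an assumption, not an exhaustive policy): shells,
# package managers, and download tools that a hardened runtime should not ship.
FORBIDDEN_PATHS = {
    "bin/sh", "bin/bash", "bin/dash", "bin/ash", "bin/busybox",
    "usr/bin/sh", "usr/bin/bash", "usr/bin/dash",
    "usr/bin/apt", "usr/bin/apt-get", "usr/bin/dpkg",
    "sbin/apk", "usr/bin/yum", "usr/bin/dnf", "usr/bin/rpm",
    "usr/bin/curl", "usr/bin/wget",
}


def audit_rootfs(tar_path: str) -> list[str]:
    """Return forbidden paths present in an exported container filesystem.

    Symlinks count as present: on Debian-based images /bin/sh is a symlink to dash.
    """
    found = set()
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Normalize "./bin/sh" and "/bin/sh" to "bin/sh"
            name = os.path.normpath(member.name).lstrip("/")
            if name in FORBIDDEN_PATHS:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # Usage: python audit_rootfs.py rootfs.tar
    hits = audit_rootfs(sys.argv[1])
    print("\n".join(hits) if hits else "no shells or package managers found")
    sys.exit(1 if hits else 0)
```

The non-zero exit status on findings makes the script usable as a CI gate after the image build.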
Step 6: Compare Image Sizes
Compare your hardened image with standard alternatives:
```bash
# Check your hardened image
docker images ollama-secure:latest --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}"

# For comparison, check a standard Ubuntu-based Ollama image if available
docker pull ollama/ollama:latest 2>/dev/null || echo "Standard image not available"
docker images ollama/ollama:latest --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}"
```
Expected output (the CleanStart version is significantly smaller; sizes are approximate and vary with the Ollama version):
```text
REPOSITORY:TAG         SIZE
ollama-secure:latest   ~180 MB
ollama/ollama:latest   ~800 MB+ (with shell, apt, curl, etc.)
```
Step 7: Run and Test Inference
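```bash
# Start the server (runs in foreground, Ctrl+C to stop)
docker run --rm -p 11434:11434 ollama-secure:latest

# In another terminal, pull the model first. The download is several GB and is
# stored inside the container, so it is lost when the container is removed.
curl -X POST http://localhost:11434/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "neural-chat", "stream": false}'

# Then test the generate API
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neural-chat",
    "prompt": "Why is container security important?",
    "stream": false
  }'
```
Expected response (abridged; the full response also includes timing fields):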
```json
{
  "model": "neural-chat",
  "created_at": "2026-03-22T...",
  "response": "Container security is important because...",
  "done": true
}
```
Lab 2: vLLM GPU Deployment
Time: 40 minutes | Focus: GPU support, Kubernetes integration, resource management
Scenario
You need a high-performance LLM inference service using vLLM on GPU hardware. vLLM requires specific CUDA versions and optimizations. You'll build a production-ready deployment with proper resource management and Kubernetes integration.
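Resource management for vLLM starts with GPU memory arithmetic. vLLM reserves a fraction of GPU memory (its gpu_memory_utilization parameter, 0.9 by default) for model weights, activations, and the KV cache. Weights take parameters × bytes per parameter, and most of what remains becomes KV-cache space. The sketch below is a back-of-the-envelope estimate with hypothetical function names. The Llama-2-7b figures (about 6.74B parameters, 32 layers, hidden size 4096) come from its published configuration. The result is an upper bound because activation memory is ignored.

```python
GIB = 1024**3


def weight_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights; fp16/bf16 use 2 bytes per parameter."""
    return num_params * bytes_per_param


def kv_cache_bytes_per_token(num_layers: int, hidden_size: int, bytes_per_elem: int = 2) -> int:
    """One K and one V vector of hidden_size per layer (assumes no grouped-query attention)."""
    return 2 * num_layers * hidden_size * bytes_per_elem


def max_cached_tokens(gpu_bytes: float, weights: float, per_token: int,
                      utilization: float = 0.9) -> int:
    """Rough upper bound on tokens that fit in the KV cache.

    Ignores activation and CUDA-graph memory, which vLLM also reserves.
    """
    budget = gpu_bytes * utilization - weights
    return max(0, int(budget // per_token))


if __name__ == "__main__":
    # Llama-2-7b: about 6.74e9 parameters, 32 layers, hidden size 4096
    weights = weight_bytes(6.74e9)
    per_token = kv_cache_bytes_per_token(32, 4096)
    print(f"weights: {weights / GIB:.1f} GiB, KV cache: {per_token // 1024} KiB/token")
    for gpu_gib in (16, 24, 40):
        tokens = max_cached_tokens(gpu_gib * GIB, weights, per_token)
        print(f"{gpu_gib} GiB GPU -> about {tokens} cached tokens")
```

For Llama-2-7b the KV cache costs 2 × 32 × 4096 × 2 bytes = 512 KiB per token, so a single 2,048-token sequence needs about 1 GiB on top of roughly 13.5 GB of fp16 weights. This is why a 16 GB GPU leaves little room for concurrent requests.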
Step 1: Create vLLM Dockerfile
```bash
mkdir -p ~/labs/ai-containers/lab-2-vllm
cd ~/labs/ai-containers/lab-2-vllm

cat > Dockerfile << 'EOF'
FROM registry.cleanstart.com/python:3.11-cuda-12.6-prod

WORKDIR /app

# Install vLLM and dependencies (no OS package manager, only pip).
# FastAPI is installed as a dependency of vLLM.
RUN pip install --no-cache-dir \
    vllm==0.4.0 \
    uvicorn==0.24.0 \
    pydantic==2.5.0 \
    numpy==1.26.0

# Copy inference server code (this lab does not create router.py or config.yaml)
COPY inference_server.py .

# Non-root user (already set in base image)
USER 65532:65532

# GPU resource requirements will be set in Kubernetes manifest
EXPOSE 8000

# Exec-form health check (no shell needed). urlopen raises on connection
# errors and HTTP error statuses, which makes the check fail.
HEALTHCHECK --interval=30s --timeout=10s \
    CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]

ENTRYPOINT ["python", "-m", "uvicorn", "inference_server:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
```
Step 2: Create Inference Server Code
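```bash
cat > inference_server.py << 'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams
import uvicorn

app = FastAPI()

# Load the model once at startup (uses the GPU automatically).
# meta-llama/Llama-2-7b-hf is gated on Hugging Face: the container needs an
# access token for an account that has been granted access to the model.
llm = LLM(model="meta-llama/Llama-2-7b-hf")


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 256


@app.post("/generate")
async def generate(req: GenerateRequest):
    # llm.generate() is synchronous and blocks the event loop, so requests are
    # served one at a time; vLLM's AsyncLLMEngine is the usual choice for
    # concurrent serving.
    sampling_params = SamplingParams(temperature=0.7, max_tokens=req.max_tokens)
    outputs = llm.generate([req.prompt], sampling_params)
    return {"response": outputs[0].outputs[0].text}


@app.get("/health")
async def health():
    return {"status": "healthy"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF
```
Step 3: Create Kubernetes Deployment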
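```bash
cat > k8s-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-inference
  template:
    metadata:
      labels:
        app: vllm-inference
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        fsGroup: 65532
      containers:
      - name: vllm
        # Prebuilt image; to run the image from Step 1 instead, push it to a
        # registry your cluster can pull from and reference it here
        image: registry.cleanstart.com/vllm:0.4-prod
        ports:
        - containerPort: 8000
        env:
        # The root filesystem is read-only, so point home and model caches
        # at the writable emptyDir volumes below
        - name: HOME
          value: /tmp
        - name: HF_HOME
          value: /var/cache/huggingface
        # Hugging Face token for the gated Llama 2 model, read from a Secret.
        # "hf-token" is an example name, e.g. created with:
        #   kubectl create secret generic hf-token --from-literal=token=<your-token>
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token
              key: token
        resources:
          requests:
            memory: "8Gi"
            cpu: "4"
            nvidia.com/gpu: "1"
          limits:
            memory: "16Gi"
            cpu: "8"
            nvidia.com/gpu: "1"
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /var/cache
      volumes:
      - name: tmp
        emptyDir:
          sizeLimit: 10Gi
      - name: cache
        emptyDir:
          sizeLimit: 20Gi
      # Label the NVIDIA GPU Operator applies to GPU nodes; adjust it to
      # match how GPU nodes are labeled in your cluster
      nodeSelector:
        nvidia.com/gpu.present: "true"
EOF
```
Step 4: Deploy to Kubernetes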
```bash
kubectl apply -f k8s-deployment.yaml
kubectl get pods -l app=vllm-inference
kubectl logs deployment/vllm-inference
```
Lab 3: PyTorch Training Pipeline
Time: 35 minutes | Focus: Multi-stage builds, training isolation, output verification
Scenario
You run distributed training jobs that require isolation from other cluster workloads. You'll build a PyTorch training container that separates build dependencies from runtime, includes verification of training outputs, and logs metrics for monitoring.
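One way to verify training outputs is to record a SHA-256 digest for every file the job writes. A later stage, such as an inference image, then checks the digests before it consumes the files. The sketch below is illustrative: the manifest.json name and the function names are assumptions. A manifest stored next to the outputs catches corruption and accidental changes. However, anyone who can rewrite the outputs can also rewrite the manifest, so store it separately or sign it if tampering is a concern.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(output_dir: str, manifest_name: str = "manifest.json") -> dict:
    """Record a digest for every file under output_dir except the manifest itself."""
    root = Path(output_dir)
    manifest_path = root / manifest_name
    entries = {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p != manifest_path
    }
    manifest_path.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return entries


def verify_manifest(output_dir: str, manifest_name: str = "manifest.json") -> list[str]:
    """Return a list of problems; an empty list means every recorded file matches."""
    root = Path(output_dir)
    expected = json.loads((root / manifest_name).read_text())
    problems = []
    for rel_path, digest in sorted(expected.items()):
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_file(path) != digest:
            problems.append(f"modified: {rel_path}")
    return problems
```

In this lab's layout, the training container would call write_manifest("/output") after saving its weights, and the consuming stage would refuse to load anything while verify_manifest returns a non-empty list.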
Step 1: Create Training Dockerfile
```bash
mkdir -p ~/labs/ai-containers/lab-3-pytorch
cd ~/labs/ai-containers/lab-3-pytorch

cat > Dockerfile << 'EOF'
FROM registry.cleanstart.com/python:3.11-pytorch-2.1-prod

WORKDIR /training

# Install training dependencies (torchvision 0.16.0 is the release paired with torch 2.1.0)
RUN pip install --no-cache-dir \
    torch==2.1.0 \
    torchvision==0.16.0 \
    tensorboard==2.14.0 \
    numpy==1.26.0

# Copy the training script (this lab does not create a config.yaml)
COPY train.py .

USER 65532:65532

ENTRYPOINT ["python", "train.py"]
EOF
```
Step 2: Create Training Script
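```bash
cat > train.py << 'EOF'
import json
import os
from datetime import datetime, timezone

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Writable output volume mounted by the Kubernetes Job
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "/output")


class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 10)

    def forward(self, x):
        return self.fc(x.view(x.size(0), -1))


def train(epochs: int = 2):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = SimpleModel().to(device)
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()

    # Synthetic MNIST-shaped data as a placeholder for a real dataset under /data
    dataset = TensorDataset(torch.randn(1024, 1, 28, 28), torch.randint(0, 10, (1024,)))
    loader = DataLoader(dataset, batch_size=64, shuffle=True)

    metrics = {
        "start_time": datetime.now(timezone.utc).isoformat(),
        "device": str(device),
        "model_params": sum(p.numel() for p in model.parameters()),
    }
    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        # Loss of the last batch in each epoch, logged for monitoring
        metrics[f"epoch_{epoch}_loss"] = round(loss.item(), 4)

    os.makedirs(OUTPUT_DIR, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(OUTPUT_DIR, "model.pt"))
    print(json.dumps(metrics))
    print("Training completed successfully")


if __name__ == "__main__":
    train()
EOF
```
Step 3: Run Training Job in Kubernetes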
```bash
cat > k8s-training-job.yaml << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-training
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
      containers:
      - name: trainer
        # Prebuilt image; reference your own pushed image to run the Step 1 build
        image: registry.cleanstart.com/pytorch:2.1-prod
        volumeMounts:
        - name: data
          mountPath: /data
        - name: output
          mountPath: /output
        resources:
          requests:
            memory: "16Gi"
            cpu: "8"
          limits:
            memory: "32Gi"
            cpu: "16"
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      volumes:
      - name: data
        emptyDir: {}
      # emptyDir is deleted with the pod; use a PersistentVolumeClaim to keep outputs
      - name: output
        emptyDir: {}
      restartPolicy: Never
EOF

kubectl apply -f k8s-training-job.yaml
```
What You Learned
Throughout this lab, you explored five essential principles for secure AI containers. Multi-stage builds separate build dependencies from runtime, reducing attack surface and image size. Shell-less execution blunts command injection: injected shell commands have no interpreter to run them, and an attacker who compromises the process cannot open a shell or install tools. Read-only root filesystems protect model weights and application code from modification at runtime, with writable space limited to explicitly mounted volumes. GPU integration shows how CleanStart images support specialized hardware while keeping the same security settings. Resource requests and limits ensure training and inference jobs don't consume excessive cluster resources and can be scheduled appropriately.
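Several of these principles can be checked mechanically before a manifest is applied. The sketch below inspects a pod spec that has already been parsed into a Python dict (for example with a YAML loader, not shown). The check_pod_spec name and its rule set are illustrative and mirror the settings used in this lab's Deployment and Job. For cluster-wide enforcement, Kubernetes Pod Security Admission or a policy engine is the more robust option.

```python
def check_pod_spec(pod_spec: dict) -> list[str]:
    """Flag containers that miss the hardening settings used in this lab."""
    issues = []
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        # A container-level runAsNonRoot overrides the pod-level value
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            issues.append(f"{name}: runAsNonRoot is not true")
        if ctx.get("readOnlyRootFilesystem") is not True:
            issues.append(f"{name}: root filesystem is writable")
        if ctx.get("allowPrivilegeEscalation") is not False:
            issues.append(f"{name}: allowPrivilegeEscalation is not false")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            issues.append(f"{name}: capabilities are not dropped")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                issues.append(f"{name}: no {resource} limit")
    return issues


if __name__ == "__main__":
    # A trainer spec without a capabilities block
    trainer_spec = {
        "securityContext": {"runAsNonRoot": True, "runAsUser": 65532},
        "containers": [{
            "name": "trainer",
            "resources": {"limits": {"memory": "32Gi", "cpu": "16"}},
            "securityContext": {"readOnlyRootFilesystem": True,
                                "allowPrivilegeEscalation": False},
        }],
    }
    print(check_pod_spec(trainer_spec))  # ['trainer: capabilities are not dropped']
```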
Cleanup
Remove lab artifacts:
```bash
rm -rf ~/labs/ai-containers
docker rmi ollama-secure:latest 2>/dev/null || true
kubectl delete job pytorch-training 2>/dev/null || true
kubectl delete deployment vllm-inference 2>/dev/null || true
```
Next Steps
Deploy these containers to your production infrastructure with Helm charts for easier management and scaling. Integrate with your CI/CD pipeline to automatically build and test new model versions. Set up monitoring and alerting for GPU utilization and inference latency. Create NetworkPolicies to isolate AI workloads from other cluster traffic.
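Before wiring up a full monitoring stack, it can help to measure inference latency directly and choose an alert threshold. This sketch uses only the standard library. It times requests to the Lab 1 Ollama endpoint, computes nearest-rank p50 and p95, and flags an alert when p95 crosses a threshold. The 5000 ms threshold, the request count, and the function names are placeholders. In production these numbers usually come from metrics exported by the serving stack rather than from a client-side script.

```python
import json
import math
import time
import urllib.request


def time_request(url: str, payload: dict, timeout: float = 120.0) -> float:
    """POST a JSON payload and return wall-clock latency in milliseconds."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile for 0 < pct <= 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_report(samples_ms: list[float], p95_threshold_ms: float) -> dict:
    """Summarize latency and flag an alert when p95 exceeds the threshold."""
    p95 = percentile(samples_ms, 95)
    return {"p50_ms": percentile(samples_ms, 50), "p95_ms": p95,
            "alert": p95 > p95_threshold_ms}


if __name__ == "__main__":
    # Ten sequential requests against the Lab 1 Ollama server
    payload = {"model": "neural-chat", "prompt": "ping", "stream": False}
    samples = [time_request("http://localhost:11434/api/generate", payload) for _ in range(10)]
    print(latency_report(samples, p95_threshold_ms=5000))
```

With small sample sizes, p95 is dominated by one or two requests, so collect enough samples before treating the alert as meaningful.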
Estimated Time: 120 minutes | Hands-on: ~90 minutes | Reading: ~30 minutes
