Lab: Building Secure AI Containers with CleanStart


Difficulty: Intermediate | Time: 120 minutes | Focus: AI workloads, multi-stage builds, GPU support, hardening

What You'll Build

This lab walks through building production-ready AI containers using CleanStart base images. You'll build containers for three common AI patterns: inference serving, model training, and agent orchestration—each hardened against the attack vectors that affect standard AI deployments.

By the end, you'll have working examples of an Ollama inference server with shell-less execution and minimal attack surface, a vLLM production deployment with GPU support and Kubernetes integration, a PyTorch training and inference pipeline with separation between stages, and a LangChain agent system with network isolation to prevent unauthorized external communication.

Prerequisites

Required: Docker 20.10 or newer, kubectl 1.24 or later for the Kubernetes steps, Helm 3.x for Helm deployments, curl and jq for testing, and a terminal with bash or zsh.

Optional but recommended: A GPU node or access to cloud GPU (CPU fallback will be shown), Docker registry access to registry.cleanstart.com, and a Kubernetes cluster such as k3s or kind.

Verify your setup:

docker --version
kubectl version --client 2>/dev/null || echo "kubectl not found (not required for Lab 1)"
helm version 2>/dev/null || echo "helm not found (not required for Labs 1-3)"
curl --version
jq --version

Lab 1: Ollama Inference Server

Time: 25 minutes | Focus: Shell-less containers, multi-stage builds, verification

Scenario

You need a production inference server for running open-source LLMs locally. Standard Ollama containers include a shell, package managers, and debug tools—increasing attack surface. You'll build a hardened version that removes all of these.

Step 1: Create Project Directory

mkdir -p ~/labs/ai-containers/lab-1-ollama
cd ~/labs/ai-containers/lab-1-ollama

Step 2: Create Ollama Installation Script

The build stage downloads a prebuilt Ollama binary. Create install-ollama.sh:

cat > install-ollama.sh << 'EOF'
#!/bin/bash
set -e

# Download Ollama (confirm the asset name on the release page for your version)
OLLAMA_VERSION="0.1.32"
OLLAMA_DOWNLOAD_URL="https://github.com/ollama/ollama/releases/download/v${OLLAMA_VERSION}/ollama-linux-x86_64.tar.gz"

mkdir -p /build-output/bin
cd /tmp

echo "Downloading Ollama..."
curl -fsSL "$OLLAMA_DOWNLOAD_URL" -o ollama.tar.gz

echo "Extracting..."
tar -xzf ollama.tar.gz -C /tmp/
cp /tmp/ollama /build-output/bin/ollama

echo "Verifying..."
/build-output/bin/ollama --version

# Clean up
rm -rf /tmp/ollama* /tmp/*.tar.gz
EOF

chmod +x install-ollama.sh
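The script above trusts whatever the download URL returns. One way to tighten this is to compare the archive's SHA-256 digest against a value obtained from a trusted source before extracting it. The following is a minimal sketch under that assumption; the helper names sha256_of and verify are illustrative, and the expected digest is something you would supply yourself (for example from the project's release page, if it publishes one).

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True only if the file's digest matches the expected value."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

In the build stage, a check like verify("ollama.tar.gz", expected) could run between the download and the extraction, failing the build when it returns False.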

Step 3: Create Multi-Stage Dockerfile

Create Dockerfile:

# Stage 1: Build stage - download and prepare Ollama binary
FROM registry.cleanstart.com/cleanstart/python:3.12 AS build

WORKDIR /build

# Install only the tools needed to download and unpack the binary
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    tar \
    && rm -rf /var/lib/apt/lists/*

COPY install-ollama.sh .
RUN bash install-ollama.sh

# Stage 2: Production stage - minimal runtime
FROM registry.cleanstart.com/cleanstart/base:latest

# Run as the non-root user (matches CleanStart uid 65532)
USER 65532:65532

# Copy only the Ollama binary from build stage
COPY --from=build --chown=65532:65532 /build-output/bin/ollama /usr/local/bin/ollama

# Set working directory (must be writable by user 65532)
WORKDIR /app

# Listen on all interfaces so the published port is reachable from the host
ENV OLLAMA_HOST=0.0.0.0

# Expose Ollama API port
EXPOSE 11434

# Health check (exec form, because the image has no shell to run a shell-form CMD)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD ["ollama", "list"]

# Run Ollama server
ENTRYPOINT ["ollama"]
CMD ["serve"]

Step 4: Build the Image

docker build -t ollama-secure:latest .

Expected output (last line with the legacy builder; BuildKit instead ends with a line naming the image):

Successfully tagged ollama-secure:latest

Step 5: Verify Shell-Less Hardening

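This is the critical security check: verify that the container has no shell. The --entrypoint override is required because the image's ENTRYPOINT is ollama; without it, /bin/sh would be passed to ollama as an argument instead of being executed.

docker run --rm --entrypoint /bin/sh ollama-secure:latest 2>&1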

Expected output (must fail; the exact message varies by Docker version):

standard_init_linux.go:228: exec user process caused: no such file or directory

Explanation: CleanStart base images do not include /bin/sh, /bin/bash, or any package managers. Even if attackers compromise the container process, they have no shell to run commands in and no package manager to install tools with, which makes follow-on activity much harder.
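A single failed docker run shows that /bin/sh is missing, but not that every shell and package manager is gone. A broader check is to scan a listing of the image's files. The sketch below assumes you have produced that listing yourself, for example by exporting a created container with docker export and piping it through tar -t; the FORBIDDEN set and the find_forbidden helper are illustrative names, not part of CleanStart or Docker.

```python
import posixpath

# Binaries a shell-less image should not contain (illustrative, not exhaustive).
FORBIDDEN = {"sh", "bash", "dash", "ash", "zsh", "apt", "apt-get", "dpkg", "apk", "yum", "dnf"}


def find_forbidden(paths):
    """Return the sorted file paths whose name is a known shell or package manager."""
    hits = []
    for raw in paths:
        entry = raw.strip()
        # Skip blank lines and directory entries (tar lists directories with a trailing slash).
        if not entry or entry.endswith("/"):
            continue
        if posixpath.basename(entry) in FORBIDDEN:
            hits.append(entry)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical listing, shaped like the output of tar -t.
    listing = ["usr/local/bin/ollama", "etc/passwd", "app/"]
    print(find_forbidden(listing) or "no shells or package managers found")
```

An empty result supports the shell-less claim; any path returned is worth investigating before the image ships.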

Step 6: Compare Image Sizes

Compare your hardened image with standard alternatives:

# Check your hardened image
docker images ollama-secure:latest --format "table {{.Repository}}\t{{.Size}}"

# For comparison, check a standard Ubuntu-based Ollama image if available
docker pull ollama/ollama:latest 2>/dev/null || echo "Standard image not available"
docker images ollama/ollama:latest --format "table {{.Repository}}\t{{.Size}}"

Expected output (CleanStart version is significantly smaller):

ollama-secure:latest       ~180 MB
ollama/ollama:latest       ~800 MB+ (with shell, apt, curl, etc.)

Step 7: Run and Test Inference

# Start the server (runs in foreground, Ctrl+C to stop)
docker run --rm -p 11434:11434 ollama-secure:latest

# In another terminal, pull the model first (the image ships without any models)
curl -X POST http://localhost:11434/api/pull -d '{"name": "neural-chat"}'

# Then test the API
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neural-chat",
    "prompt": "Why is container security important?",
    "stream": false
  }'

Expected response:

{
  "model": "neural-chat",
  "created_at": "2026-03-22T...",
  "response": "Container security is important because...",
  "done": true
}
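The same request can be made from application code. The sketch below uses only the Python standard library and assumes the server from Step 7 is running on localhost:11434 with the neural-chat model available; the helper names build_request, extract_text, and generate are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # port published in Step 7


def build_request(prompt, model="neural-chat"):
    """Build a non-streaming POST request for the /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(raw_body):
    """Return the generated text, failing if the server reports an unfinished response."""
    payload = json.loads(raw_body)
    if not payload.get("done"):
        raise ValueError("generation did not complete")
    return payload["response"]


def generate(prompt, timeout=120):
    """Send the prompt to the running container and return the model's answer."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return extract_text(resp.read())
```

With the container running, generate("Why is container security important?") should return the text found in the response field of the JSON shown above.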

Lab 2: vLLM GPU Deployment

Time: 40 minutes | Focus: GPU support, Kubernetes integration, resource management

Scenario

You need a high-performance LLM inference service using vLLM on GPU hardware. vLLM requires specific CUDA versions and optimizations. You'll build a production-ready deployment with proper resource management and Kubernetes integration.

Step 1: Create vLLM Dockerfile

mkdir -p ~/labs/ai-containers/lab-2-vllm
cd ~/labs/ai-containers/lab-2-vllm

cat > Dockerfile << 'EOF'
FROM registry.cleanstart.com/python:3.11-cuda-12.6-prod

WORKDIR /app

# Install vLLM and dependencies (no package manager, only pip)
RUN pip install --no-cache-dir \
    vllm==0.4.0 \
    uvicorn==0.24.0 \
    pydantic==2.5.0 \
    numpy==1.26.0

# Copy inference server code
COPY inference_server.py .
# This lab does not create router.py or config.yaml; uncomment if your project has them
# COPY router.py .
# COPY config.yaml .

# Non-root user (already set in base image)
USER 65532:65532

# GPU resource requirements will be set in Kubernetes manifest
EXPOSE 8000

# Exec form needs no shell; urlopen raises on connection errors and HTTP error codes
HEALTHCHECK --interval=30s --timeout=10s \
  CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]

ENTRYPOINT ["python", "-m", "uvicorn", "inference_server:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

Step 2: Create Inference Server Code

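cat > inference_server.py << 'EOF'
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams
import uvicorn

app = FastAPI()

# Load model (uses GPU automatically).
# Llama 2 is a gated model on Hugging Face: access approval and a token are required.
llm = LLM(model="meta-llama/Llama-2-7b-hf")


class GenerateRequest(BaseModel):
    """JSON request body, so prompts are not passed in the query string."""
    prompt: str
    max_tokens: int = 256


@app.post("/generate")
async def generate(request: GenerateRequest):
    # Note: llm.generate() is synchronous and blocks the event loop while it runs
    sampling_params = SamplingParams(temperature=0.7, max_tokens=request.max_tokens)
    outputs = llm.generate([request.prompt], sampling_params)
    return {"response": outputs[0].outputs[0].text}


@app.get("/health")
async def health():
    return {"status": "healthy"}


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
EOF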

Step 3: Create Kubernetes Deployment

cat > k8s-deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-inference
  template:
    metadata:
      labels:
        app: vllm-inference
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        fsGroup: 65532
      containers:
      - name: vllm
        image: registry.cleanstart.com/vllm:0.4-prod
        ports:
        - containerPort: 8000
        env:
        # Keep the Hugging Face cache on the writable volume (root filesystem is read-only)
        - name: HF_HOME
          value: /var/cache/huggingface
        resources:
          requests:
            memory: "8Gi"
            cpu: "4"
            nvidia.com/gpu: "1"
          limits:
            memory: "16Gi"
            cpu: "8"
            nvidia.com/gpu: "1"
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /var/cache
      volumes:
      - name: tmp
        emptyDir:
          sizeLimit: 10Gi
      - name: cache
        emptyDir:
          sizeLimit: 20Gi
      # Node label is cluster-specific; adjust to match how your GPU nodes are labeled
      nodeSelector:
        nvidia.com/gpu: "true"
EOF

Step 4: Deploy to Kubernetes

kubectl apply -f k8s-deployment.yaml
kubectl get pods -l app=vllm-inference
kubectl logs deployment/vllm-inference
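Loading a 7B model can take several minutes, so the pod may be Running well before the server answers requests. The sketch below polls the /health endpoint defined in inference_server.py until it reports healthy. It assumes the service is reachable locally, for example through kubectl port-forward deployment/vllm-inference 8000:8000 in another terminal; the function names and the retry defaults are illustrative.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5):
    """Return the "status" field from the health endpoint, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read()).get("status")
    except (urllib.error.URLError, OSError, ValueError, AttributeError):
        return None


def wait_until_healthy(url, attempts=30, delay=10.0, fetch=fetch_status, sleep=time.sleep):
    """Poll until the service reports "healthy"; return True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == "healthy":
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next attempt, but not after the last one
    return False
```

Calling wait_until_healthy("http://localhost:8000/health") before sending the first prompt avoids mistaking a slow model load for a failed deployment. The fetch and sleep parameters exist only so the loop can be exercised without a live server.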

Lab 3: PyTorch Training Pipeline

Time: 35 minutes | Focus: Multi-stage builds, training isolation, output verification

Scenario

You run distributed training jobs that require isolation from other cluster workloads. You'll build a PyTorch training container that separates build dependencies from runtime, includes verification of training outputs, and logs metrics for monitoring.

Step 1: Create Training Dockerfile

mkdir -p ~/labs/ai-containers/lab-3-pytorch
cd ~/labs/ai-containers/lab-3-pytorch

cat > Dockerfile << 'EOF'
FROM registry.cleanstart.com/python:3.11-pytorch-2.1-prod

WORKDIR /training

# Install training dependencies
RUN pip install --no-cache-dir \
    torch==2.1.0 \
    torchvision==0.16.0 \
    tensorboard==2.14.0 \
    numpy==1.26.0

# Copy training scripts
COPY train.py .
# This lab does not create config.yaml; uncomment if your project has one
# COPY config.yaml .

USER 65532:65532

ENTRYPOINT ["python", "train.py"]
EOF

Step 2: Create Training Script

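cat > train.py << 'EOF'
import json
from datetime import datetime

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(784, 10)

    def forward(self, x):
        return self.fc(x.view(x.size(0), -1))


def train():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = SimpleModel().to(device)
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()

    # Synthetic stand-in data so the script runs without a dataset
    dataset = TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    metrics = {
        "start_time": datetime.now().isoformat(),
        "device": str(device),
        "model_params": sum(p.numel() for p in model.parameters()),
    }

    # One pass over the data
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

    # Log metrics as a single JSON line for monitoring
    metrics["final_loss"] = loss.item()
    print(json.dumps(metrics))
    print("Training completed successfully")


if __name__ == "__main__":
    train()
EOF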

Step 3: Run Training Job in Kubernetes

cat > k8s-training-job.yaml << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-training
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
      containers:
      - name: trainer
        image: registry.cleanstart.com/pytorch:2.1-prod
        volumeMounts:
        - name: data
          mountPath: /data
        - name: output
          mountPath: /output
        resources:
          requests:
            memory: "16Gi"
            cpu: "8"
          limits:
            memory: "32Gi"
            cpu: "16"
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
      volumes:
      - name: data
        emptyDir: {}
      - name: output
        emptyDir: {}
      restartPolicy: Never
EOF

kubectl apply -f k8s-training-job.yaml

What You Learned

Throughout this lab, you explored five principles for secure AI containers. Multi-stage builds separate build dependencies from runtime, which reduces both attack surface and image size. Shell-less execution removes shell interpreters from the image, so a command injection flaw has no shell to hand commands to. Read-only root filesystems protect application code, and anything else baked into the image, from modification at runtime. GPU integration shows that CleanStart images can use specialized hardware without relaxing these controls. Resource requests and limits keep training and inference jobs from consuming excessive cluster resources and let the scheduler place them appropriately.
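Several of these principles show up as concrete fields in a Kubernetes container spec, so they can be checked mechanically before a manifest is applied. The sketch below is one possible check, assuming the manifest has already been loaded into plain Python dictionaries (for example with a YAML parser); audit_container and its messages are illustrative, not a CleanStart or Kubernetes tool.

```python
def audit_container(container, pod_security=None):
    """Return a list of hardening gaps for one container spec given as a dict."""
    pod_security = pod_security or {}
    ctx = container.get("securityContext", {})
    gaps = []
    # runAsNonRoot may be set on the container or inherited from the pod.
    if not (ctx.get("runAsNonRoot") or pod_security.get("runAsNonRoot")):
        gaps.append("runAsNonRoot is not set")
    if not ctx.get("readOnlyRootFilesystem"):
        gaps.append("root filesystem is writable")
    # Must be explicitly False; an absent field does not count as disabled here.
    if ctx.get("allowPrivilegeEscalation") is not False:
        gaps.append("privilege escalation is not disabled")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        gaps.append("capabilities are not dropped")
    if not container.get("resources", {}).get("limits"):
        gaps.append("no resource limits")
    return gaps


if __name__ == "__main__":
    # Mirrors the security-relevant fields of the Lab 3 trainer container.
    trainer = {
        "securityContext": {"readOnlyRootFilesystem": True, "allowPrivilegeEscalation": False},
        "resources": {"limits": {"memory": "32Gi", "cpu": "16"}},
    }
    print(audit_container(trainer, {"runAsNonRoot": True}))
```

Run against the two manifests in this lab, a check like this would pass the vLLM container and flag the Lab 3 trainer for not dropping capabilities.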

Cleanup

Remove lab artifacts:

rm -rf ~/labs/ai-containers
docker rmi ollama-secure:latest 2>/dev/null || true
kubectl delete job pytorch-training 2>/dev/null || true
kubectl delete deployment vllm-inference 2>/dev/null || true

Next Steps

Deploy these containers to your production infrastructure with Helm charts for easier management and scaling. Integrate with your CI/CD pipeline to automatically build and test new model versions. Set up monitoring and alerting for GPU utilization and inference latency. Create NetworkPolicies to isolate AI workloads from other cluster traffic.
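As a starting point for the NetworkPolicy step, the sketch below builds a policy that restricts egress for pods carrying a given app label to DNS only, and prints it as JSON, which kubectl apply accepts alongside YAML. The function name, the policy name, and the DNS-only rule are illustrative assumptions; enforcement also depends on the cluster's network plugin supporting NetworkPolicy.

```python
import json


def restrict_egress_policy(app_label, namespace="default"):
    """Build a NetworkPolicy dict that allows only DNS egress for the selected pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{app_label}-restrict-egress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app_label}},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # An empty namespaceSelector matches pods in every namespace,
                    # so in-cluster DNS stays reachable on port 53.
                    "to": [{"namespaceSelector": {}}],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(restrict_egress_policy("vllm-inference"), indent=2))
```

Saved under a name of your choice, the script's output can be piped to kubectl apply -f -. Egress this strict also blocks model downloads, so it suits pods whose weights are already available on a volume.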

Estimated Time: 120 minutes | Hands-on: ~90 minutes | Reading: ~30 minutes