Running AI and ML Workloads with CleanStart Base Images
CleanStart provides general-purpose base images (Python, Node.js, etc.) for building AI and ML applications. While CleanStart does not offer dedicated AI/ML runtime images, you can create specialized containers for your AI workloads using CleanStart base images as the foundation. You can build AI/ML applications through several key capabilities:
- The Python base image (cleanstart/python:3.12) supports PyTorch, TensorFlow, scikit-learn, and other Python ML frameworks.
- The Node.js base image (cleanstart/node:20) enables JavaScript-based ML libraries like TensorFlow.js.
- Pre-configured package managers (pip and npm) allow for easy dependency installation.
- A security-scanned base layer gives your ML workloads a vetted foundation to build on.
- Multi-stage build support helps optimize production ML container images.
The following diagram illustrates the AI runtime dependency tree with CleanStart as the foundation:
    graph TB
        A["CleanStart Base<br/>Python 3.12"] -->|Layer 1| B["Python Runtime<br/>pip/venv"]
        B -->|Install| C["PyTorch"]
        B -->|Install| D["TensorFlow"]
        B -->|Install| E["scikit-learn"]
        C -->|Depend on| C1["CUDA Toolkit<br/>Optional"]
        C -->|Depend on| C2["cuDNN<br/>Optional"]
        C -->|Depend on| C3["NumPy"]
        D -->|Depend on| D1["NVIDIA NCCL<br/>Optional"]
        D -->|Depend on| D2["Protobuf"]
        D -->|Depend on| D3["NumPy"]
        E -->|Depend on| E1["SciPy"]
        E -->|Depend on| E2["NumPy"]
        C1 -->|GPU Support| F["GPU Runtime<br/>NVIDIA Docker"]
        C2 -->|GPU Support| F
        D1 -->|GPU Support| F
        F -->|Hardware| G["NVIDIA GPU<br/>CUDA Compute"]
        A -->|Layer 2| H["System Libraries<br/>glibc<br/>libssl"]
        C3 -->|Use| H
        D3 -->|Use| H
        E2 -->|Use| H
        H -->|Secure| I["FIPS Module<br/>Optional"]
        C -->|Application| J["Model Training"]
        D -->|Application| K["Inference Server"]
        E -->|Application| L["Data Processing"]
        J -->|Output| M["Trained Model<br/>Weights/Checkpoints"]
        K -->|Output| N["Predictions<br/>via API"]
        L -->|Output| O["Processed Data<br/>Datasets"]
        style A fill:#99ccff
        style B fill:#ccffcc
        style F fill:#ffff99
        style I fill:#99ff99

Getting Started with AI/ML Workloads
Building a PyTorch Application
Create a Dockerfile for a PyTorch-based workload:
    FROM cleanstart/python:3.12

    WORKDIR /app

    # Install PyTorch and dependencies
    RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu && \
        pip install numpy pandas scikit-learn

    # Copy your training or inference code
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    COPY . .

    # Run your ML application
    CMD ["python", "train.py"]

Build and run:
    docker build -t my-pytorch-app:latest .
    docker run --rm my-pytorch-app:latest

Building a TensorFlow Application
Create a Dockerfile for a TensorFlow-based workload:
    FROM cleanstart/python:3.12

    WORKDIR /app

    # Install TensorFlow and dependencies
    RUN pip install tensorflow numpy pandas scikit-learn

    # Copy your model training or serving code
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    COPY . .

    EXPOSE 8000

    # Run TensorFlow model server or training script
    CMD ["python", "serve.py"]

Build and run:
    docker build -t my-tensorflow-app:latest .
    docker run --rm -p 8000:8000 my-tensorflow-app:latest

GPU Support
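To use GPUs with your AI/ML workloads, the host must expose its NVIDIA drivers and GPU devices to the container, and the image must contain a CUDA-enabled build of your framework (the PyTorch Dockerfile above installs CPU-only wheels). GPU access is not built into CleanStart; it requires host-level configuration: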
    # Run with GPU support (requires NVIDIA Docker runtime)
    docker run --gpus all \
        --rm \
        my-pytorch-app:latest

For Kubernetes deployments with GPU support, ensure your cluster nodes have NVIDIA drivers installed and the NVIDIA device plugin running (it advertises the nvidia.com/gpu resource), then configure GPU requests in your pod specifications:
    apiVersion: v1
    kind: Pod
    metadata:
      name: ml-workload
    spec:
      containers:
        - name: pytorch-app
          image: my-pytorch-app:latest
          resources:
            limits:
              nvidia.com/gpu: "1"  # Request 1 GPU

Example: Simple ML Inference Server
Here's a complete example of building a lightweight inference server using CleanStart:
Step 1: Create app.py
    from flask import Flask, request, jsonify
    import numpy as np

    app = Flask(__name__)

    # Load a simple model (example with numpy)
    def predict(features):
        # Simple linear regression example
        weights = np.array([0.5, -0.3, 0.2])
        bias = 0.1
        return float(np.dot(features, weights) + bias)

    @app.route('/health', methods=['GET'])
    def health():
        return jsonify({"status": "healthy"}), 200

    @app.route('/predict', methods=['POST'])
    def ml_predict():
        data = request.json
        features = np.array(data['features'])
        prediction = predict(features)
        return jsonify({"prediction": prediction}), 200

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=8000)

Step 2: Create requirements.txt
    flask==2.3.0
    numpy==1.26.4

Step 3: Create Dockerfile
    FROM cleanstart/python:3.12

    WORKDIR /app

    # Install dependencies
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Copy application
    COPY app.py .

    EXPOSE 8000

    HEALTHCHECK --interval=10s --timeout=3s --start-period=5s --retries=3 \
        CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

    CMD ["python", "app.py"]

Step 4: Build and Test
    # Build
    docker build -t ml-inference-server:latest .

    # Run (stays in the foreground)
    docker run -p 8000:8000 ml-inference-server:latest

    # Test (from a second terminal)
    curl -X POST http://localhost:8000/predict \
        -H "Content-Type: application/json" \
        -d '{"features": [1.0, 2.0, 3.0]}'

Best Practices for AI/ML Containers
1. Multi-Stage Builds for Size Optimization
Use multi-stage builds to reduce final image size:
    # Build stage
    FROM cleanstart/python:3.12 AS builder

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --user --no-cache-dir -r requirements.txt

    # Runtime stage
    FROM cleanstart/python:3.12

    WORKDIR /app

    # Copy only Python packages from builder
    COPY --from=builder /root/.local /root/.local
    ENV PATH=/root/.local/bin:$PATH

    COPY app.py .

    CMD ["python", "app.py"]

2. Version Pinning
Always pin package versions in requirements.txt for reproducibility:
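    torch==2.2.2
    tensorflow==2.16.1
    numpy==1.26.4
    pandas==2.2.2
    scikit-learn==1.4.2

These versions are examples. Confirm that each pinned release publishes wheels for the Python version of your base image (3.12 here), since older releases of these packages predate Python 3.12.

3. Efficient Dependency Installation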
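Copy requirements.txt and install dependencies before copying your application code, so Docker can reuse the cached dependency layer when only the code changes: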
    FROM cleanstart/python:3.12

    WORKDIR /app

    # Cache dependencies layer
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Application layer (changes frequently)
    COPY . .

    CMD ["python", "train.py"]

4. Security Hardening
Run as a non-root user in the image and, where possible, start the container with a read-only root filesystem (for example, docker run --read-only):
    FROM cleanstart/python:3.12

    WORKDIR /app

    # Create non-root user
    RUN useradd -m -u 1000 mluser

    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    COPY . .

    USER mluser

    CMD ["python", "app.py"]

Troubleshooting
Issue: ModuleNotFoundError when importing a package Solution: Ensure the package is listed in requirements.txt and installed via pip install -r requirements.txt
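One way to narrow down a missing-module error is to compare the pins in requirements.txt with what is actually installed inside the container. The following is a minimal sketch, not part of CleanStart: the helper names parse_pins and check_installed are hypothetical, and the parser only understands plain name==version lines (no extras or environment markers).

```python
from importlib import metadata


def parse_pins(lines):
    """Return {name: version} for 'name==version' lines; skip everything else."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "==" not in line:
            continue  # blank, comment-only, or unpinned entry
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_installed(pins):
    """List pins that are missing or installed at a different version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: installed {found}, wanted {wanted}")
    return problems
```

Run inside the container, something like check_installed(parse_pins(open("requirements.txt"))) returns an empty list when every pinned package is present at the expected version.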
Issue: Out of memory during training Solution: Increase Docker memory limit with docker run --memory=8g or reduce batch size in your training script
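Before tuning batch size, it can help to confirm which memory limit the container actually sees. The sketch below reads the standard Linux cgroup limit files; the function name read_memory_limit is hypothetical, and the paths assume a Linux container with cgroup v2 or v1 mounted in the usual location.

```python
from pathlib import Path

# Usual cgroup locations for the memory limit (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def read_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the memory limit in bytes, or None if no limit is found."""
    for path in paths:
        try:
            text = Path(path).read_text().strip()
        except OSError:
            continue  # file absent on this cgroup version or platform
        if text == "max":
            return None  # cgroup v2 reports "max" when unlimited
        if text.isdigit():
            # Note: cgroup v1 reports a very large number when unlimited.
            return int(text)
    return None
```

With docker run --memory=8g, the value read from the limit file should be 8589934592 bytes (8 GiB).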
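Issue: GPU not detected in container Solution: Ensure the NVIDIA Docker runtime is installed on the host, pass the --gpus all flag to docker run, and confirm the image contains a CUDA-enabled build of your framework rather than a CPU-only one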
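A quick check from inside the container can show whether the problem lies with the runtime or with the framework build. This is a sketch under stated assumptions: gpu_report is a hypothetical helper, it only inspects PyTorch, and it assumes the NVIDIA runtime places nvidia-smi on the PATH when GPUs are attached.

```python
import importlib.util
import shutil


def gpu_report():
    """Collect basic facts about GPU visibility in the current environment."""
    report = {
        # nvidia-smi is normally injected by the NVIDIA container runtime.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,
        "torch_cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the script also runs without PyTorch

        # torch.version.cuda is None for CPU-only builds.
        report["torch_cuda_build"] = torch.version.cuda
        report["torch_cuda_available"] = bool(torch.cuda.is_available())
    return report
```

If nvidia_smi_on_path is False, the container was most likely started without GPU access. If it is True but torch_cuda_build is None, the image contains a CPU-only PyTorch build.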
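Issue: Slow package installation during Docker build Solution: Copy requirements.txt and run pip install before copying the rest of your code, so the dependency layer is reused from Docker's build cache when only application code changes. Note that pip install --no-cache-dir reduces image size but does not speed up installation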
Related Resources
For a detailed breakdown of how AI/ML frameworks, dependencies, and container architecture interact, see The AI/ML Container Stack Explained. For production deployment patterns including GPU scheduling, resource allocation, and Helm charts for AI workloads, see Deploying AI Containers to Production.
Next Steps
- Choose your ML framework (PyTorch, TensorFlow, scikit-learn, etc.)
- Create a Dockerfile based on cleanstart/python:3.12
- Install your dependencies via pip install
- Test locally with docker build and docker run
- Deploy to Kubernetes or cloud platforms with your trained models
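For the local testing step, a small Python client can stand in for curl when exercising the inference server example. This is a minimal sketch that uses only the standard library: the function names build_predict_request and predict are hypothetical, and the base URL assumes the container is published on port 8000 as in the example.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a POST request for the /predict endpoint of the example server."""
    body = json.dumps({"features": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=5.0):
    """Send the features and return the 'prediction' field of the JSON reply."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["prediction"]
```

With the example server running, predict("http://localhost:8000", [1.0, 2.0, 3.0]) should return approximately 0.6, given the weights and bias hard-coded in app.py.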
