Troubleshooting Guide

Quick Diagnosis

clnstrt-cli health-check \
  --verbose \
  --show-config
# Shows: API Key status, Intelligence Core connectivity, network health, disk space

Image Pull Errors

"name unknown: repository not found"

Solution:

To verify that an image exists, you can query the registry API to list available images matching your criteria. Try pulling the image with its full URI to ensure you have the correct registry path. You can also list available versions using the intelligence query command to find the package you need.

# Verify image exists (URL quoted so the shell does not interpret "?")
curl -H "Authorization: Bearer $CLEANSTART_API_KEY" \
  "https://api.cleanstart.dev/v1/images/search?name=python"
# Try pulling with full URI
docker pull gcr.io/cleanstart-images/runtimes/python:3.12-prod
# List available versions
intelligence query packages --filter name=python --limit=50
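When a pull fails with "repository not found", it can help to see exactly which registry, repository, and tag a reference resolves to. The sketch below is a simplified splitter, not Docker's own parser: the function name `parse_image_ref` is hypothetical, digests (`@sha256:...`) are not handled, and a missing registry is assumed to mean `docker.io`.

```shell
# Split an image reference into registry, repository, and tag.
# Simplified: the first path component counts as a registry only if it
# contains "." or ":" or equals "localhost"; digests are not supported.
parse_image_ref() {
  ref=$1
  tag=latest
  last=${ref##*/}                 # final path component, e.g. python:3.12-prod
  case $last in
    *:*) tag=${last##*:}; ref=${ref%:*} ;;
  esac
  first=${ref%%/*}
  case $ref in
    */*)
      case $first in
        *.*|*:*|localhost) registry=$first; repo=${ref#*/} ;;
        *) registry=docker.io; repo=$ref ;;
      esac ;;
    *) registry=docker.io; repo=$ref ;;
  esac
  printf 'registry=%s repo=%s tag=%s\n' "$registry" "$repo" "$tag"
}
```

Running `parse_image_ref gcr.io/cleanstart-images/runtimes/python:3.12-prod` prints the three parts on one line, which makes a mistyped registry host or a missing tag easy to spot.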

"denied: permission denied"

Solution:

When you encounter permission errors, authenticate with your registry to ensure you have the proper credentials. Verify that your API key has the necessary permissions for reading images. If you're still having issues, try re-authenticating by logging out and logging back in to refresh your credentials.

# Authenticate with registry ("Login Succeeded" confirms the credentials)
docker login YOUR_REGISTRY
docker ps  # Confirms the Docker daemon is reachable
# Verify API key permissions
clnstrt-cli auth list-keys
# Ensure "images:read" scope is present
# Re-authenticate
docker logout YOUR_REGISTRY
docker login YOUR_REGISTRY

"ErrImagePull" in Kubernetes

Solution:

When Kubernetes cannot pull images, create an image pull secret that contains your registry credentials. Add this secret to your deployment's imagePullSecrets section, then describe the pod to check its pull status.

# Create image pull secret
kubectl create secret docker-registry gcr-secret \
  --docker-server=gcr.io \
  --docker-username=_json_key \
  --docker-password="$(cat ~/key.json)" \
  --namespace=my-app
# Add to deployment (imagePullSecrets: - name: gcr-secret)
# Verify
kubectl describe pod POD_NAME -n my-app

Build Failures

"failed to build: no such file or directory"

Solution:

This error indicates that a referenced file or base image doesn't exist during the build process. Verify that your Dockerfile is in the expected location and check that the base image can be pulled from the registry. COPY and ADD resolve source paths relative to the build context, so confirm that each source file is inside the context directory, and use absolute destination paths so the result does not depend on WORKDIR.

# Verify Dockerfile exists
ls -la Dockerfile
# Check base image is accessible
docker pull gcr.io/cleanstart-images/runtimes/python:3.12-prod
# Sources are relative to the build context; use absolute destinations in COPY/ADD
COPY ./app /app  # Good: absolute destination
COPY app app     # Avoid: destination depends on WORKDIR

"Docker daemon not running"

Solution:

The Docker daemon needs to be started before you can build or run containers. Start the daemon using your system's service manager, then verify it's running by listing containers. If you encounter socket permission issues, add your user to the docker group.

# Start Docker daemon
sudo systemctl start docker
# For Mac: open -a Docker
# Verify running
docker ps
# Fix socket permissions (log out and back in for the group change to apply)
sudo usermod -aG docker "$USER"
# Use Podman as alternative
podman build -t my-app:latest .

"out of disk space"

Solution:

When Docker runs out of disk space, clean up unused Docker artifacts to free space. You can remove all unused images and containers, or consider increasing the Docker Desktop disk allocation. For production systems, use minimal base images to reduce overall storage requirements.

df -h  # Check disk usage
# Clean up Docker
docker system prune -a  # ⚠️ Removes all unused images
docker image prune -a
# Increase Docker Desktop disk size in Settings
# Use minimal base images (distroless: 20MB vs 500+MB)
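The disk check above can be scripted so that a build host warns before it fills up. This is a minimal sketch: the function names `disk_usage_pct` and `disk_is_full` are hypothetical, the 90% default threshold is an arbitrary assumption, and the parser expects the POSIX `df -P` layout where the fifth column of the second line is the capacity.

```shell
# Read `df -P` output on stdin and print the capacity of the first
# filesystem as a bare number (the "%" sign is stripped).
disk_usage_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 (true) when usage is at or above the threshold percentage.
disk_is_full() {
  pct=$1
  threshold=${2:-90}
  [ "$pct" -ge "$threshold" ]
}
```

A typical call is `disk_is_full "$(df -P / | disk_usage_pct)" 90 && echo "consider pruning unused Docker data"`. Pruning stays a manual decision here because `docker system prune -a` removes every unused image.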

FIPS Compliance Issues

"FIPS validation failed"

Solution:

To verify FIPS compatibility, use the verification command with the FIPS flag to check your image. Deploy FIPS-ready base images that come pre-configured with FIPS OpenSSL modules. Update the cryptography library to ensure it supports FIPS mode, and enable FIPS mode in your operating system if required.

# Check FIPS compatibility
clnstrt-cli verify \
  --image YOUR_REGISTRY/my-app:v1 \
  --verify-fips
# Use FIPS-ready base images
docker pull gcr.io/cleanstart-images/runtimes/python:3.12-fips
# Update cryptography library
pip install cryptography --upgrade
# Enable FIPS in OS (RHEL/CentOS)
sudo fips-mode-setup --enable

"TLS certificate validation failed"

Solution:

TLS certificate validation failures occur when your certificates use algorithms that aren't FIPS-approved. Check your certificate's signature algorithm to ensure it uses FIPS-compliant cryptography, such as RSA with 2048-bit or larger keys or ECDSA on the P-256 curve or larger. If needed, regenerate certificates with FIPS-approved algorithms and inject them using the cleanimg-customize specification.

# Check certificate algorithms
openssl x509 -in cert.pem -text -noout | grep -i algorithm
# Regenerate with FIPS algorithms (RSA-2048+, ECDSA P-256+)
# Inject updated certificate using cleanimg-customize spec
cat > fips-cert-spec.yaml <<EOF
base_image: "registry.cleanstart.com/cleanstart/python:3.12-prod"
arch: amd64
variant: prod
custom_files:  # copy_files is rejected as an unknown field (see the spec validation errors section)
  - source: ./fips-cert.pem
    destination: /etc/ssl/certs/fips-cert.pem
EOF
cleanimg-customize build --spec fips-cert-spec.yaml --tag my-app:v1-fips
# Verify FIPS compliance
clnstrt-cli verify --image my-app:v1-fips --verify-fips
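The signature-algorithm check can be turned into a quick triage step. The sketch below is a rough heuristic, not a compliance test: the function name `sig_alg_status` is hypothetical, it only looks at the hash named in the algorithm string that `openssl x509 -text` prints, and it does not inspect key size or curve.

```shell
# Classify a signature algorithm name such as "sha256WithRSAEncryption".
# Heuristic only: flags MD5- and SHA-1-based signatures and accepts
# SHA-256/384/512-based ones; anything else is reported as unknown.
sig_alg_status() {
  alg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $alg in
    *md5*|*sha1*) echo "not-approved" ;;
    *sha256*|*sha384*|*sha512*) echo "ok" ;;
    *) echo "unknown" ;;
  esac
}
```

An "unknown" or "ok" result still needs the full `clnstrt-cli verify --verify-fips` run, since key length and curve choice also matter.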

Helm & Kubernetes Issues

"Chart not found"

Solution:

When Helm cannot find a chart, add the CleanStart Helm repository to your local Helm configuration, then update the repository index. Search the repository to verify that the chart exists, and list the configured repositories to confirm your setup.

# Add CleanStart Helm repository
helm repo add cleanstart https://charts.cleanstart.dev
helm repo update
# Search for chart
helm search repo cleanstart
# List configured repositories
helm repo list

"CrashLoopBackOff" in pod

Solution:

A CrashLoopBackOff status indicates the container is crashing immediately after starting. Check the pod logs to see what error is occurring, including logs from the previous run if the pod has restarted. Describe the pod to view events and resource limits, and test the image locally to verify it works.

# Check pod logs
kubectl logs POD_NAME -n NAMESPACE
kubectl logs POD_NAME -n NAMESPACE --previous
# Describe pod for events
kubectl describe pod POD_NAME -n NAMESPACE
# Check resource limits
kubectl describe pod POD_NAME -n NAMESPACE | grep -A5 Limits
# Test image locally first
docker run --rm -it YOUR_REGISTRY/my-app:v1 bash

"Insufficient memory"

Solution:

When pods don't have enough memory to run, check the current capacity and usage of your cluster nodes. Increase the memory request and limit for your pod, or add more nodes to your cluster to provide additional resources.

# Check node capacity
kubectl describe node NODE_NAME | grep Allocated -A5
kubectl top nodes
kubectl top pods -n my-app
# Increase pod memory
kubectl set resources deployment my-app \
  --limits=memory=2Gi \
  --requests=memory=512Mi \
  -n my-app
# Add more nodes to cluster

CI/CD Pipeline Failures

GitHub Actions: "API key invalid"

Solution:

GitHub Actions requires API keys to be set as secrets in your repository settings. Update your workflow to use the CLEANSTART_API_KEY secret from GitHub Actions, then re-create the secret if needed by removing it from the repository settings and adding it again.

# Verify secret is set:
# Settings → Secrets and variables → Actions → CLEANSTART_API_KEY
# Update workflow to use secret
env:
  CLEANSTART_API_KEY: ${{ secrets.CLEANSTART_API_KEY }}
run: clnstrt-cli analyze-dependencies --sbom sbom.spdx
# Re-create secret if needed

GitLab CI: "Unauthorized"

Solution:

GitLab CI requires a project access token with the appropriate scopes to be configured as a CI/CD variable. Create a project access token with api and read_repository scopes, then add it to your CI/CD variables in the project settings.

# Create project access token:
# Settings → Access Tokens → Create project access token
# Grant scopes: api, read_repository
# Add to CI/CD variable:
# Settings → CI/CD → Variables → CLEANSTART_API_KEY

Pipeline timeout

Solution:

Pipeline timeouts can be resolved by increasing the timeout configuration in your pipeline definition. Enable caching to reduce scanning time, parallelize scanning across multiple jobs to distribute the workload, and use a smaller SBOM by filtering to production dependencies only.

# Increase timeout in pipeline
timeout-minutes: 30  # GitHub Actions
timeout: 30m         # GitLab CI
# Enable caching
clnstrt-cli analyze-dependencies \
  --sbom sbom.spdx \
  --cache-results \
  --cache-ttl 3600s
# Parallelize scanning
clnstrt-cli analyze-dependencies \
  --sbom sbom.spdx \
  --parallel-jobs 8
# Use smaller SBOM
clnstrt-cli analyze-dependencies \
  --sbom sbom.spdx \
  --scope production  # Skip dev dependencies

cleanimg-init Issues

"Project already initialized"

Solution:

When a project is already initialized, check the existing configuration files in the .cleanstart directory. Optionally back up your current configuration before reinitializing, or use the update-only flag to modify the existing configuration without replacing it entirely.

# Check existing files
ls -la .cleanstart/
# Re-initialize (backup first)
cp -r .cleanstart .cleanstart.backup
cleanimg-init --force
# Update existing config
cleanimg-init --update-only

"Dockerfile not found"

Solution:

If your Dockerfile doesn't exist, create it with appropriate content for your application. Alternatively, point cleanimg-init to a custom Dockerfile location, or generate a Dockerfile from a template that matches your application's technology stack.

# Create Dockerfile
cat > Dockerfile << EOF
FROM gcr.io/cleanstart-images/runtimes/python:3.12-prod
WORKDIR /app
COPY . .
CMD ["python", "app.py"]
EOF
# Point to custom location
cleanimg-init --dockerfile ./docker/Dockerfile
# Generate from template
cleanimg-init --template python3.12

"Security policy validation failed"

Solution:

Review the security policy to understand what violations occurred. Use the analyze-dependencies command to identify vulnerable packages and get recommendations for fixes. Update the security policy to match your organization's requirements, or use the verbose validation flag to see detailed violation information.

# Review policy
cat .cleanstart/security-policy.yaml
# Fix vulnerabilities
clnstrt-cli analyze-dependencies \
  --sbom sbom.spdx \
  --recommendations
# Update policy
cleanimg-init --policy baseline
# View detailed violations
cleanimg-init --validate --verbose

Multi-Stage Build Failures

"Failed to copy file from build stage"

Solution:

When copying files from a build stage fails, first verify that the file actually exists in the build stage by building and running the builder target separately. Check that your COPY instruction uses the correct path that exists in the source stage, and debug multi-stage copy issues using the verbose build progress output.

# Verify the file exists in build stage
docker build --target builder -t myapp:builder .
docker run --rm myapp:builder ls -la /app/dist
# Check path is correct in COPY instruction
# Correct:
COPY --from=builder /app/dist /app/dist
# Wrong:
COPY --from=builder /app /app  # Path mismatch
# Debug multi-stage copy issues
docker build --progress=plain --target builder .

"Dependency resolution failure in build stage"

Solution:

Test the build stage independently to identify dependency resolution issues. Check the package manager cache in the container to see what's already installed. Force a dependency refresh using the appropriate package manager commands, and use pinned versions to avoid version mismatches between build stages.

# Test build stage independently
docker build --target builder -t myapp:builder .
# Check package manager cache
docker run --rm myapp:builder apt list --installed | grep -i openssl
# Force dependency refresh (Alpine)
RUN apk update && apk add --no-cache package-name
# Force dependency refresh (Debian/Ubuntu)
RUN apt-get update && apt-get install -y package-name
# Use pinned versions to avoid version mismatch
RUN apt-get install -y openssl=1.1.1-1ubuntu2.20.04

"Architecture mismatch in multi-stage build"

Solution:

When building for multiple architectures, verify that your build platform and target architectures match. For cross-platform builds, use buildx with the platform flag. When using different architectures in multi-stage builds, specify the platform for each stage to ensure compatibility. Check the final image architecture to confirm it matches your target.

# Verify build machine and target architectures match
docker buildx ls  # Shows platforms supported
# For cross-platform builds, specify platform
docker buildx build --platform linux/amd64,linux/arm64 -t myapp .
# If using multi-stage on different architectures:
FROM --platform=$BUILDPLATFORM golang:1.21 AS builder
FROM --platform=$TARGETPLATFORM alpine:3.19
# Check final image architecture
docker inspect myapp | grep -i architecture
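A common source of mismatches is that `uname -m` and Docker use different names for the same architecture. The sketch below maps one to the other and checks whether the host is covered by a `--platform` list. The function names `docker_arch` and `platform_includes_host` are hypothetical, and only the three architectures shown are handled.

```shell
# Map `uname -m` output to the architecture part of a Docker platform string.
docker_arch() {
  case $1 in
    x86_64|amd64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    armv7l) echo arm/v7 ;;
    *) echo unknown; return 1 ;;
  esac
}

# Return 0 when linux/<host arch> appears in a comma-separated platform list.
platform_includes_host() {
  host="linux/$(docker_arch "$1")" || return 1
  case ",$2," in
    *",$host,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `platform_includes_host "$(uname -m)" "linux/amd64,linux/arm64"` succeeds on an x86_64 or aarch64 host and fails on armv7l, which signals that the image would need emulation or an extra platform entry to run there.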

UID/GID and Permission Issues

"Permission denied: running as non-root user (UID 65532)"

Solution:

CleanStart uses an unprivileged UID 65532 for security. If your application expects root or a specific user ID, you have several options: modify your application to work with UID 65532, use COPY --chown and --chmod to set correct file ownership and permissions during build, or run specific build commands as root with mounted secrets. Always verify the running user and file permissions using the id and ls commands.

# Understand the issue: CleanStart uses unprivileged UID 65532
# Your app may expect root (UID 0) or a specific user
# Option 1: Change app to work with UID 65532
# In your Dockerfile:
USER 65532:65532
# Test locally
docker run --user 65532:65532 myapp
# Option 2: Use COPY --chown/--chmod to set file ownership and permissions
COPY --chown=65532:65532 --chmod=0755 app /app/bin/app
# Option 3: Run specific commands as root (during build only)
# Note: this writes the secret into an image layer, where anyone who can pull the image can read it
RUN --mount=type=secret,id=api_key \
  cat /run/secrets/api_key > /etc/config/api.key && \
  chown 65532:65532 /etc/config/api.key && \
  chmod 0600 /etc/config/api.key
# Check file permissions
docker run myapp ls -la /app/
docker run myapp id  # Verify UID 65532

"Write permission denied on mounted volume"

Solution:

Volumes mounted with incorrect ownership prevent the application from writing files. Configure your docker-compose or Kubernetes deployment to run with the correct user and group, and ensure the host directory has matching ownership. For Kubernetes, configure fsGroup to ensure volumes are writable by the container.

# Issue: volumes mounted with wrong ownership
# In docker-compose.yml:
services:
  app:
    image: myapp
    volumes:
      - ./data:/app/data
    user: "65532:65532"
    environment:
      - CLEANSTART_NON_ROOT=true
# Fix permissions on host
sudo chown 65532:65532 ./data
sudo chmod 755 ./data
# In Kubernetes (securityContext):
spec:
  securityContext:
    fsGroup: 65532  # Pod-level field; ensures volume permissions work
  containers:
    - name: app
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
      volumeMounts:
        - name: data
          mountPath: /app/data
  volumes:
    - name: data
      emptyDir: {}
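Before starting a container with a bind mount, the host directory's ownership can be checked so the failure shows up early rather than as a write error inside the container. This is a sketch under two assumptions: the helper name `owned_by` is hypothetical, and it relies on GNU `stat -c` (on macOS/BSD the equivalent is `stat -f '%u:%g'`).

```shell
# Return 0 when the directory is owned by the expected UID:GID,
# 1 when it is not, and 2 when the directory cannot be inspected.
owned_by() {
  dir=$1
  want=$2                      # e.g. 65532:65532
  have=$(stat -c '%u:%g' "$dir" 2>/dev/null) || return 2
  [ "$have" = "$want" ]
}
```

A call such as `owned_by ./data 65532:65532 || sudo chown 65532:65532 ./data` applies the ownership fix only when it is actually needed.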

"initgroups failed for UID 65532"

Solution:

The error occurs when necessary files for group/user resolution are missing. Verify that /etc/passwd and /etc/group files exist in your image and contain entries for UID 65532. If needed, copy these system files into your image during the build process.

# Issue: nsswitch.conf or group file missing
# Verify necessary files exist in image
docker run myapp cat /etc/passwd | grep 65532
docker run myapp cat /etc/group | grep 65532
# In Dockerfile, ensure base image has these files
FROM cleanstart/python:3.12-prod
# Base image should already have passwd/group
# If manually creating filesystem (COPY sources resolve inside the build context, not the host root):
COPY --chown=root:root /etc/passwd /etc/passwd
COPY --chown=root:root /etc/group /etc/group

Health Check and Probe Failures

"Liveness probe failed: connection refused"

Solution:

A failed liveness probe indicates the container is running but the health check cannot connect. Determine what port your application is listening on by checking network statistics inside the container. Ensure your liveness probe targets the correct port that your application is using.

# Understand the issue: Container starts but health check fails
# Check what port app is listening on (inside the running container)
docker exec CONTAINER_ID netstat -tlnp | grep LISTEN
# or
docker exec CONTAINER_ID ss -tlnp
# Verify liveness probe matches actual port
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      livenessProbe:
        httpGet:
          path: /health
          port: 8080  # Must match app's port
        initialDelaySeconds: 10
        periodSeconds: 10
# Test probe manually
curl http://localhost:8080/health
# Check if app is actually listening
kubectl logs POD_NAME | grep -i listen
kubectl logs POD_NAME | grep -i port

"Readiness probe failed: application not ready"

Solution:

When readiness probes fail, the health endpoint is returning a non-200 status, indicating the application isn't ready to serve traffic. Test the health endpoint directly to see what status it returns. Common causes include unestablished database connections, missing environment variables, or failing startup scripts. For containers run directly with Docker, implement a health check in your Dockerfile using the HEALTHCHECK instruction; Kubernetes ignores HEALTHCHECK and relies on the readinessProbe defined in the pod spec.

# Issue: Health endpoint returns non-200 status
# Test health endpoint directly
docker run -d myapp
docker exec CONTAINER_ID curl -v http://localhost:8080/health
# Expected response should be 200 OK
# If getting 503 or timeout, app is not ready
# Common causes:
# 1. Database connection not established
docker logs CONTAINER_ID | grep -i database
docker logs CONTAINER_ID | grep -i connection
# 2. Environment variable not set
docker inspect CONTAINER_ID | grep -i env
# 3. Startup script failing
docker run -it myapp /bin/sh  # Interactive debug
docker run myapp cat /startup.log  # Check startup logs
# In Dockerfile (Docker only), implement proper health check:
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

"Startup probe timing out (30s+ delays)"

Solution:

When startup probes time out, increase the failure threshold to allow more time for initialization. Optimize your startup sequence by profiling startup time and moving slow initialization to background processes. Pre-compile code or cache expensive operations to speed up initial startup.

# Issue: Container takes too long to start
# Increase startupProbe timeout
spec:
  containers:
    - name: app
      startupProbe:
        httpGet:
          path: /health
          port: 8080
        failureThreshold: 30  # 30 * 10s = 300s total
        periodSeconds: 10
# Optimize startup sequence
# Profile startup time
docker run -it myapp time python app.py
# Move slow initialization to background
# Example: Load large dataset async
if __name__ == '__main__':
  start_async_initialization()  # Returns immediately
  start_server()  # Starts HTTP server so health check passes
# Precompile or cache expensive operations
# Example: Python imports
python -m compileall /app  # Pre-compile .py to .pyc
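The startup window is roughly failureThreshold multiplied by periodSeconds, plus any initialDelaySeconds. The sketch below does that arithmetic in both directions so a threshold can be chosen from a measured startup time. The function names `startup_budget` and `min_failure_threshold` are hypothetical helpers, not part of any CLI.

```shell
# Approximate time (seconds) a startup probe allows before the container
# is restarted: initialDelaySeconds + failureThreshold * periodSeconds.
startup_budget() {
  failure_threshold=$1
  period_seconds=$2
  initial_delay=${3:-0}
  echo $(( initial_delay + failure_threshold * period_seconds ))
}

# Smallest failureThreshold that covers a measured startup time,
# rounding up to a whole number of probe periods.
min_failure_threshold() {
  startup_seconds=$1
  period_seconds=$2
  echo $(( (startup_seconds + period_seconds - 1) / period_seconds ))
}
```

With the values from the probe above, `startup_budget 30 10` gives 300. If profiling shows the app needs about 95 seconds, `min_failure_threshold 95 10` gives 10, and adding some headroom on top of that is usually sensible.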

Helm Chart Deployment Failures

"Values override not applied"

Solution:

When custom values aren't applied, verify that they were passed correctly to the helm install command using the dry-run flag. Ensure your YAML is valid by checking indentation and quote syntax. Verify that the template file references your custom values correctly, then use helm template to debug the rendered output before deployment.

# Issue: Custom values in values.yaml or --set not taking effect
# Verify values were passed
helm install myapp mychart -f values.yaml --dry-run --debug
# Look for your overridden values in output
# Check Helm syntax
# values.yaml must be valid YAML
cat values.yaml  # Check indentation, quotes
# Verify value path in template
# In templates/deployment.yaml:
env:
  - name: DATABASE_URL
    value: {{ .Values.database.url | quote }}
# In values.yaml:
database:
  url: "postgres://..."
# Debug command
helm template myapp mychart -f values.yaml | grep -A 5 env
# Common mistake: Wrong indentation in values override
helm install myapp mychart -f values.yaml  # Check values.yaml structure
# Override from command line
helm install myapp mychart \
  --set database.url="postgres://..." \
  --set replicas=3

"Security context conflicts with image requirements"

Solution:

Security context issues occur when requirements like read-only file systems conflict with the application's need to write temporary files. Create emptyDir volumes for temporary storage, or ensure writable mount paths exist in the image. Only relax security contexts if absolutely necessary.

# Issue: securityContext requires capabilities image doesn't support
# The problem (readOnlyRootFilesystem is a container-level field):
spec:
  securityContext:
    runAsNonRoot: true
  containers:
    - name: app
      securityContext:
        readOnlyRootFilesystem: true  # But app needs to write /tmp
# Solution 1: Use emptyDir volumes for temporary storage
volumeMounts:
  - name: tmp
    mountPath: /tmp
volumes:
  - name: tmp
    emptyDir: {}
# Solution 2: Create writable mount paths in image
RUN mkdir -p /var/cache/app && chmod 1777 /var/cache/app
# Solution 3: Relax security context if necessary (not recommended)
# Only if justified:
spec:
  containers:
    - name: app
      securityContext:
        readOnlyRootFilesystem: false  # Allow write
# Verify what CleanStart image allows
docker run --rm gcr.io/cleanstart-images/runtimes/python:3.12-prod \
  ls -la / | grep -E "^d.*w"  # Check writable directories

"Helm dependency resolution failed"

Solution:

Helm dependency failures occur when chart dependencies cannot be found. Update chart dependencies using the dependency update command, verify the Chart.yaml syntax for the dependencies section, and ensure external repositories are added and accessible.

# Issue: Helm can't find chart dependencies
# Update chart dependencies
helm dependency update ./mychart
ls -la mychart/charts/  # Should show .tgz files
# Check Chart.yaml for dependency syntax
cat mychart/Chart.yaml | grep -A 10 dependencies
# Correct syntax:
dependencies:
  - name: postgresql
    version: "13.0.0"
    repository: "https://charts.bitnami.com/bitnami"
# Add repository if using external chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Verify repo is accessible
curl https://charts.bitnami.com/bitnami/index.yaml | head
# If using local chart:
dependencies:
  - name: my-lib
    version: "1.0.0"
    repository: "file://../my-lib-chart"

Registry Authentication Errors

"Unauthorized: authentication required"

Solution:

Registry credentials must be provided through Docker login or Kubernetes secrets. Test direct registry access to verify connectivity and credentials. For Kubernetes, create a docker-registry secret and reference it in your deployment's imagePullSecrets.

# Issue: Registry credentials not provided or expired
# Test registry access directly
curl -u username:password https://registry.example.com/v2/
# For Docker login
docker login registry.example.com
# Credentials stored in ~/.docker/config.json
# For Kubernetes, create secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=YOUR_USERNAME \
  --docker-password=YOUR_PASSWORD \
  --docker-email=YOUR_EMAIL \
  -n default
# Verify secret
kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
# In Deployment spec:
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - image: registry.example.com/myapp:latest

"Forbidden: not authorized to access image"

Solution:

When an API key exists but lacks permissions, check the current key's scopes to determine what's missing. Create a new API key with the appropriate scopes for image operations, then update your CI/CD secrets with the new key. For service accounts in Kubernetes, follow a similar process.

# Issue: API key exists but lacks required permissions
# Check API key permissions
clnstrt-cli auth list-keys
# Look for scopes: images:read, images:write, etc.
# Create new API key with correct scopes
clnstrt-cli auth create-key \
  --name "CI/CD Deploy" \
  --scopes "images:read,images:write"
# Update CI/CD secret with new key
# GitHub: Settings → Secrets → Update CLEANSTART_API_KEY
# GitLab: Settings → CI/CD Variables → Update CLEANSTART_API_KEY
# For service accounts in Kubernetes:
clnstrt-cli auth create-sa \
  --name "k8s-image-pull" \
  --scopes "images:read"
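Once the scopes of a key are known, checking for a required scope is a small string test. The sketch below assumes the scopes have already been extracted into a comma-separated string, the same form the `--scopes` flag takes; the output format of `clnstrt-cli auth list-keys` is not documented here, so that extraction step is left out. The function name `has_scope` is hypothetical.

```shell
# Return 0 when the comma-separated scope list contains the required scope
# as a whole entry (so "images:readonly" does not match "images:read").
has_scope() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `has_scope "images:read,images:write" images:read` succeeds, while `has_scope "images:write" images:read` fails and indicates that a new key with the read scope is needed.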

"Certificate verification failed (HTTPS)"

Solution:

HTTPS certificate verification failures occur with self-signed or corporate certificates. Configure your system to trust the certificate by placing it in the Docker certificates directory, or set up containerd with the proper CA certificates. Verify the certificate using OpenSSL commands.

# Issue: Self-signed or corporate certificate not trusted
# For docker:
docker login --username user https://registry.example.com
# If certificate error:
# Option 1: Use CA certificate
sudo mkdir -p /etc/docker/certs.d/registry.example.com
sudo cp ca.crt /etc/docker/certs.d/registry.example.com/ca.crt
sudo systemctl restart docker
# Option 2: Configure Docker insecure registry (dev only, not recommended)
# Warning: this overwrites any existing /etc/docker/daemon.json
echo '{"insecure-registries":["registry.example.com"]}' | \
  sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
# For containerd/Kubernetes:
sudo mkdir -p /etc/containerd/certs.d/registry.example.com
sudo tee /etc/containerd/certs.d/registry.example.com/hosts.toml > /dev/null <<EOF
[host."https://registry.example.com"]
  ca = "/etc/ssl/certs/ca.crt"
EOF
# Verify certificate
openssl s_client -connect registry.example.com:443

cleanimg-customize Spec Validation Errors

"Invalid spec: unknown field 'copy_files'"

Solution:

Ensure your cleanimg-customize specification uses the correct field names and follows the proper YAML structure. Validate your specification using the validate command with verbose output to identify syntax errors. Check the available fields using the inspect-template command.

# Issue: YAML syntax error or unknown field name
# Verify spec structure
cat > spec.yaml << EOF
base_image: "gcr.io/cleanstart-images/runtimes/python:3.12-prod"
arch: "amd64"
variant: "prod"
custom_files:  # NOT copy_files
  - source: "./app.py"
    destination: "/app/app.py"
    mode: "0755"
copy_packages:
  - name: "curl"
    version: "7.88.0"
EOF
# Validate spec
cleanimg-customize validate --spec spec.yaml --verbose
# Check available fields
cleanimg-customize inspect-template --show-fields

"Build failed: dependency not found"

Solution:

When a package cannot be found, list the available packages in the base image to find the correct name. Update the package manager cache in your specification, and use the default version if the specific version isn't available.

# Issue: Requested package not available in base image package manager
# List available packages in base image
docker run gcr.io/cleanstart-images/runtimes/python:3.12-prod \
  apt search "package-name"  # For Debian-based
# or
docker run gcr.io/cleanstart-images/runtimes/python:3.12-prod \
  apk search "package-name"  # For Alpine-based
# Update package manager cache
base_image: "gcr.io/cleanstart-images/runtimes/python:3.12-prod"
run_commands:
  - "apt-get update"
copy_packages:
  - name: "curl"
# Use package manager default version if specific version unavailable
copy_packages:
  - name: "curl"
    # Omit version to use latest available

"YAML parsing error: invalid indentation"

Solution:

YAML indentation errors occur when using tabs instead of spaces or when nesting is incorrect. Validate your YAML using the yamllint tool, and convert any tabs to spaces using sed.

# Issue: YAML indentation is wrong (spaces vs tabs, incorrect nesting)
# Valid YAML must use spaces (not tabs) and consistent indentation
cat > spec.yaml << 'EOF'
base_image: "gcr.io/cleanstart-images/runtimes/python:3.12-prod"
arch: "amd64"
variant: "prod"
copy_packages:
  - name: "curl"
    version: "7.88.0"
custom_files:
  - source: "./app.py"
    destination: "/app/app.py"
run_commands:
  - "echo 'Building...'"
EOF
# Use yamllint to validate
yamllint spec.yaml
# Convert tabs to spaces
sed -i 's/\t/  /g' spec.yaml
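When yamllint is not installed, tabs in indentation can still be located with awk before running the sed conversion. This is a minimal sketch: the function name `find_indent_tabs` is hypothetical, and it only detects tabs in leading whitespace, not incorrect nesting.

```shell
# Print "line N: tab in indentation" for every line whose leading
# whitespace contains a tab. Exit status is 1 if any were found.
find_indent_tabs() {
  awk '/^[ ]*\t/ { printf "line %d: tab in indentation\n", NR; bad = 1 }
       END { exit bad }' "$1"
}
```

Running `find_indent_tabs spec.yaml` first shows which lines the sed command would change, which is useful when a tab inside a quoted string should be left alone.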

FIPS Mode Troubleshooting

"FIPS module initialization failed"

Solution:

FIPS module initialization requires the use of FIPS-certified base images that have OpenSSL FIPS modules preinstalled. Enable FIPS at runtime by setting the OPENSSL_CONF environment variable, and verify FIPS mode is active by checking the OpenSSL version.

# Issue: FIPS mode enabled but cryptography not available
# Use FIPS-certified base image
FROM gcr.io/cleanstart-images/runtimes/python:3.12-fips
# Enable FIPS at runtime
docker run \
  -e OPENSSL_CONF=/etc/ssl/openssl-fips.cnf \
  myapp:fips
# Verify FIPS is enabled
docker run myapp:fips openssl version
# Should show "OpenSSL 3.0.x FIPS"
# Check FIPS status in container
docker run myapp:fips python -c \
  "from cryptography.hazmat.backends import openssl; print(openssl.backend.fips_enabled)"

"Cipher suite not available in FIPS mode"

Solution:

FIPS mode disables non-approved cipher suites. Identify which ciphers are unsupported and update your application to use only FIPS-approved algorithms such as AES-GCM for encryption and SHA-256 for hashing. Test compliance using the verification command.

# Issue: Application uses cipher suite disabled in FIPS mode
# Identify unsupported cipher
docker run myapp:fips openssl ciphers -v | grep -i aes
# Update application to use FIPS-approved ciphers
# Avoid: RC4, MD5, DES
# Use: AES-GCM, SHA-256, SHA-384, ECDHE
# In Python:
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
# Use: algorithms.AES(key), modes.GCM(nonce)
# Not: algorithms.ARC4(key) [RC4, deprecated]
# In Go:
import "crypto/cipher"
import "crypto/aes"
// Use: cipher.NewGCM(block)
# Test compliance
clnstrt-cli verify --image myapp:fips --verify-fips --strict

Kubernetes Deployment Status Issues

"ImagePullBackOff" status

Solution:

ImagePullBackOff occurs when Kubernetes cannot pull an image from the registry. Check that the pull secret exists and is correctly referenced, verify the image exists in the registry, and test network connectivity from the node to the registry.

# Issue: Kubernetes can't pull image from registry
# Check pull secret
kubectl get secret -A | grep docker
kubectl get secret regcred -o yaml
# Verify image exists in registry
docker pull gcr.io/cleanstart-images/runtimes/python:3.12-prod
# Check node can reach registry
kubectl debug node/NODE_NAME -it --image=alpine
# Inside debug container:
curl https://gcr.io/v2/  # Test connectivity
nslookup gcr.io
# Update imagePullSecrets in deployment
spec:
  imagePullSecrets:
    - name: regcred  # Must exist in same namespace
  containers:
    - image: gcr.io/cleanstart-images/runtimes/python:3.12-prod

"CrashLoopBackOff" with failed liveness probe

Solution:

CrashLoopBackOff with a failed liveness probe indicates the pod starts but immediately fails the health check. Check pod logs from the current and previous runs, examine pod events for clues, and increase the startup grace period to allow more time for initialization. Debug interactively when needed.

# Issue: Pod starts but crashes immediately
# Check logs
kubectl logs POD_NAME -n NAMESPACE
kubectl logs POD_NAME -n NAMESPACE --previous  # Last run
# Check events
kubectl describe pod POD_NAME -n NAMESPACE | grep -A 10 Events
# Increase startup grace period
spec:
  containers:
    - name: app
      startupProbe:
        httpGet:
          path: /health
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
# Debug in-place
kubectl debug POD_NAME -it --image=alpine
# Or:
kubectl exec -it POD_NAME -- /bin/sh

"OOMKilled" (Out of Memory)

Solution:

When a pod is OOMKilled, check its current memory usage against the limit. Increase the memory limit and request in the deployment specification. Analyze whether the application has a memory leak by monitoring memory growth over time.

# Issue: Pod killed due to exceeding memory limit
# Check current memory usage
kubectl top pods POD_NAME
# Increase memory limit
kubectl patch deployment myapp -p \
  '{"spec":{"template":{"spec":{"containers":[{"name":"app","resources":{"limits":{"memory":"2Gi"},"requests":{"memory":"512Mi"}}}]}}}}'
# Or update deployment YAML:
spec:
  containers:
    - name: app
      resources:
        requests:
          memory: "512Mi"
        limits:
          memory: "2Gi"
# Analyze memory leak
docker run --memory=512m myapp  # Reproduce locally
# Monitor growth over time
kubectl logs POD_NAME | grep -i memory
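Comparing the usage reported by `kubectl top` with the configured limit is easier once both are in the same unit. The sketch below converts Kubernetes memory quantities with binary suffixes to MiB and computes usage as a percentage of the limit. The function names `to_mib` and `usage_pct` are hypothetical, and only the Ki, Mi, and Gi suffixes are handled.

```shell
# Convert a memory quantity with a binary suffix (Ki, Mi, Gi) to whole MiB.
# Decimal suffixes (M, G) and plain byte counts are not supported.
to_mib() {
  case $1 in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *) echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Percentage of the limit in use (integer, rounded down),
# e.g. usage_pct 480Mi 512Mi
usage_pct() {
  used=$(to_mib "$1") || return 1
  limit=$(to_mib "$2") || return 1
  echo $(( used * 100 / limit ))
}
```

A pod that sits consistently above roughly 90% of its limit is a candidate for either a higher limit or a leak investigation; the exact threshold is a judgment call.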

Common Error Messages and Solutions

Error Message	Cause	Solution
`Failed to pull image: repository not found`	Image doesn't exist or wrong registry	Verify image name and registry URL
`Permission denied`	API key lacks scope or expired	Create new API key with correct scopes
`exec: no such file or directory`	ENTRYPOINT or CMD not found	Verify binary path, use full absolute path
`Bind: address already in use`	Port already in use	Change port or kill existing process
`Cannot connect to Docker daemon`	Docker not running	Start Docker: `systemctl start docker`
`dockerfile not found`	Dockerfile path wrong	Use full path: `--file ./path/to/Dockerfile`
`invalid reference format`	Image tag syntax wrong	Use format: `registry/image:tag`
`EOF while parsing`	YAML indentation or syntax error	Use `yamllint` to validate
`connection timeout`	Network/firewall blocking	Check network connectivity, firewall rules
`deployment does not match pod template spec`	Pod template changed after deployment	Restart the rollout to recreate pods: `kubectl rollout restart deployment/NAME`

General Tips

1. Enable Debug Mode

clnstrt-cli --log-level debug --verbose \
  analyze-dependencies --sbom sbom.spdx
export CLEANSTART_DEBUG=true

2. Check System Health

clnstrt-cli health-check --detailed
env | grep CLEANSTART
env | grep DOCKER
env | grep KUBERNETES

3. Clear Cache

rm -rf ~/.cleanstart/cache
docker system prune -a
docker buildx prune -a
pip cache purge

4. Review Documentation

clnstrt-cli --help
clnstrt-cli COMMAND --help
open https://docs.cleanstart.dev

5. Collect Debug Bundle for Support

clnstrt-cli debug-bundle --output debug-bundle.tar.gz
# Email to: support@cleanstart.dev

Getting Help

Self-Service: Documentation: https://docs.cleanstart.dev, Knowledge Base: https://help.cleanstart.dev, Video Tutorials: https://youtube.com/@cleanstart, and Community Slack: https://slack.cleanstart.dev.

Professional Support: Email: support@cleanstart.dev, Phone: 1-800-CLEAN-SECURITY (Enterprise), Live Chat: https://portal.cleanstart.dev (in-portal), and GitHub Issues: https://github.com/cleanstart/issues.

When Contacting Support:

Gather diagnostic information by running the health check command with verbose output. Create a debug bundle containing system logs and configuration. Describe the steps you've already taken to troubleshoot the issue. Include relevant logs and error messages. Specify your environment including OS, Docker version, and Kubernetes version if applicable.
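The environment details requested above can be collected in one pass. This is a sketch only: the function names `tool_version` and `env_summary` are hypothetical, the output format is arbitrary, and tools that are not installed are reported as such rather than causing an error.

```shell
# Print "<name>: <first line of version output>" or "<name>: not installed".
tool_version() {
  name=$1; shift
  if command -v "$name" >/dev/null 2>&1; then
    printf '%s: %s\n' "$name" "$("$@" 2>/dev/null | head -n 1)"
  else
    printf '%s: not installed\n' "$name"
  fi
}

# Summarize OS, Docker, Kubernetes client, and Helm versions.
env_summary() {
  printf 'os: %s\n' "$(uname -sr)"
  tool_version docker docker --version
  tool_version kubectl kubectl version --client
  tool_version helm helm version --short
}
```

The output of `env_summary` can be pasted into a support request alongside the debug bundle and the health check output.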

Last Updated: January 2024