The Complete Production Security Model: Read-Only + Shell-Less

Combining Read-Only + Shell-Less: The 97% Attack Surface Reduction

Read-only filesystems and shell-less containers are powerful independently, but together they form a security model that eliminates entire categories of attacks: malware injection, interactive exploitation, and persistence mechanisms. In the model developed below, the combination cuts the number of viable attack paths by 97%.

Part 1: Why Both Together

The following diagram visualizes how combining read-only filesystems and shell-less containers progressively eliminates attack paths:

1. Traditional Container: ~400 viable paths
2. Read-Only Only: ~200 viable paths
3. Shell-Less Only: ~150 viable paths
4. Combined: Read-Only + Shell-Less: ~12 viable paths (97% reduction)

The Attack Surface Reduction Equation

Attack surface is not just the number of entry points, but the number of viable exploitation paths. Let's model a realistic container attack to understand how security controls affect the adversary's options.

Traditional Container (Writable + Shell)

In a traditional container with a writable filesystem and shell access, attackers have multiple exploitation paths available. The first path involves a buffer overflow or injection vulnerability in the application code, such as SQL injection or command injection, which grants the attacker code execution with root privileges. From there, the attacker can write malware to the filesystem at a location like /tmp/malware.so, then load that malware using dlopen("/tmp/malware.so"). Once loaded, the malware persists in memory or spreads to other containers, resulting in complete compromise of the system.

The second path takes advantage of supply chain vulnerabilities. An attacker injects a shell command into the base image build process. When the container starts, this command executes automatically, installing a backdoor or exfiltrating secrets from environment variables. This results in immediate compromise before the application even begins running.

The third path exploits privilege escalation opportunities. An attacker finds a setuid binary or kernel exploit and escalates their privileges to root. With root access, they can disable audit logging by running rm /var/log/audit.log, covering their tracks completely and enabling long-term, undetectable compromise.

The total number of viable exploitation paths in this scenario is approximately 400, representing every possible way an attacker could gain persistence and control within the container.

Read-Only Only (Read-Only Filesystem + Shell Present)

When the filesystem is read-only but a shell is still available, the attack landscape changes significantly. In the first path, an attacker exploits an application vulnerability and gains code execution. However, when they attempt to write malware to /tmp, this operation fails because the filesystem is read-only. The attacker cannot persist malware, which severely limits their ability to establish long-term control. However, code execution has still occurred, which means the attack is not entirely mitigated.

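In the second path, an attacker spawns a shell using system("/bin/sh"). Because the shell binary is still present in this configuration, the call succeeds: the attacker gains interactive access and can run any tool already shipped in the image. What the read-only filesystem blocks is writing new tools or payloads to disk, so the attacker is limited to what the image already contains and to code that runs purely in memory.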

In the third path, information gathering opportunities are limited. The attacker cannot modify the filesystem but can still read it. However, containers typically don't have files like /etc/shadow or readable /var/log directories, making information gathering harder but not impossible.

With a read-only filesystem, the number of viable exploitation paths decreases to approximately 200. The filesystem write capability, which was fundamental to many persistence strategies, has been eliminated.
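The failed write described above can be observed directly from inside a running container. The following is a minimal sketch, assuming Python is available in the image; the helper name can_write is hypothetical, and the set of error codes treated as "not writable" is a choice made for this illustration.

```python
import errno
import os
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a file can be created in `directory`, False otherwise."""
    try:
        # mkstemp creates a uniquely named file; on a read-only mount it raises OSError.
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError as exc:
        # EROFS: read-only filesystem. EACCES/EPERM: permission denied.
        # ENOENT: the directory does not exist.
        if exc.errno in (errno.EROFS, errno.EACCES, errno.EPERM, errno.ENOENT):
            return False
        raise
    # Clean up the probe file so the check leaves nothing behind.
    os.close(fd)
    os.remove(path)
    return True
```

Calling can_write("/etc") or can_write("/usr/bin") in a container with a read-only root filesystem should return False, while a directory backed by a writable volume should return True.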

Shell-Less Only (Writable Filesystem + No Shell)

When the filesystem is writable but no shell is available, a different set of constraints appears. The first path allows an attacker to write to /tmp successfully, since the filesystem remains writable. The absence of a shell means they cannot run shell commands or scripts against what they wrote, but they can still write a compiled binary and execute it directly with execve, or write a shared object and load it with dlopen. Either workaround bypasses much of the protection.

The second path attempts to establish interactive access by opening a reverse shell. This fails because no shell binary exists. However, an attacker could still execute a compiled reverse shell binary directly, using direct system calls rather than shell commands.

The third path for privilege escalation remains largely intact. The attacker can still find setuid binaries, exploit kernel vulnerabilities, and write a rootkit to the writable filesystem, establishing root-level persistence.

Shell-less containers reduce viable paths to approximately 150. The removal of shell commands blocks one dimension of attacks, but the ability to write and execute binaries directly still enables most classical exploitation techniques.
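Whether an image is actually shell-less can be checked rather than assumed. The sketch below scans a set of directories for executables with common shell names. The name list and the function find_shells are assumptions for illustration, not an exhaustive detector; a renamed or statically embedded shell would not be found this way.

```python
import os

# Common shell names; an assumption for this sketch, not an exhaustive list.
SHELL_NAMES = ("sh", "bash", "dash", "ash", "zsh", "busybox")


def find_shells(search_dirs):
    """Return sorted paths of shell-like executables found in the given directories."""
    found = []
    for directory in search_dirs:
        for name in SHELL_NAMES:
            candidate = os.path.join(directory, name)
            # Only count regular files that carry an execute bit.
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                found.append(candidate)
    return sorted(found)
```

Run against an unpacked image root, for example find_shells(["rootfs/bin", "rootfs/usr/bin"]), an empty list is consistent with a shell-less image.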

BOTH: Read-Only + Shell-Less (Immutable + No Shell)

When both read-only filesystems and shell-less containers are combined, the attack landscape becomes fundamentally constrained. In the first path, an attacker exploits an application vulnerability and gains code execution as a non-root user. However, when they attempt to write to /tmp, /var, or any system directory, these operations fail because the root filesystem is read-only. The only exceptions are volumes the pod explicitly mounts as writable, such as the size-limited emptyDir volumes in Part 3. Without a writable, persistent location the attacker cannot persist malware or cache executables across restarts. Even if they attempt to load a shared object via dlopen, they first have to find somewhere to write it.

In the second path, the block is complete. An attacker cannot spawn a shell because /bin/sh doesn't exist in the image. They cannot open a reverse shell, invoke system commands, or use any shell functionality whatsoever.

In the third path for privilege escalation, the defense becomes highly effective. An attacker who gains code execution as a non-root user with all capabilities dropped has little to build on. Setuid binaries offer no way up: a minimal image ships few or none, the read-only filesystem prevents adding new ones, and allowPrivilegeEscalation: false stops existing ones from granting extra privileges. Even if a kernel exploit exists, they cannot write a rootkit to disk to establish persistence.

In the fourth path for environmental access, the attacker's options are severely limited. They cannot read /etc/shadow due to non-root permissions. They cannot read audit logs or other system files. They can read process memory through /proc, but they only have access to non-root process memory, which significantly reduces the value of any leaked information.

With both controls in place, the number of viable exploitation paths drops to approximately 12. These remaining paths represent inherent risks that cannot be eliminated through container configuration alone: application-level bugs in the running process, direct memory corruption attacks, Kubernetes API server misconfigurations, and kernel vulnerabilities on the host node.

The Security Equation: Mathematically

If we model vulnerability surface as a mathematical function:

Total Attack Surface =
  (Filesystem Write Paths × Execution Paths)
  + (Shell Commands × Payload Delivery)
  + (Privilege Escalation Chains)
  + (Information Disclosure Paths)

Then we can calculate the attack surface for each configuration:

Traditional containers have an attack surface of approximately 100 × 100 + 50 × 50 + 50 + 50, which equals 12,600 points. Read-Only Only configurations reduce this to 0 × 100 + 50 × 50 + 50 + 50, which equals 2,600 points, a reduction of about 79%. Shell-Less Only configurations achieve 100 × 0 + 0 × 50 + 50 + 50, which equals 100 points, a reduction of about 99.2%. Combined Read-Only and Shell-Less configurations reach 0 × 0 + 0 × 0 + 0 + 12, which equals 12 points, a reduction of about 99.9%. The weights are illustrative rather than measured, so the percentages show the shape of the effect, not a precise figure.

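The combination is not additive; it's multiplicative. Because the first two terms of the formula are products, driving either factor to zero removes the whole term. Eliminating two independent dimensions of attack (filesystem writes and shell execution) therefore shrinks the attack surface far more than removing the same number of individual paths would. This is why combining read-only with shell-less is so powerful.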
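The formula can be evaluated mechanically so that the totals are checked rather than taken on faith. This sketch plugs in the illustrative weights quoted above for each configuration; the function and variable names are hypothetical.

```python
def attack_surface(write_paths, exec_paths, shell_cmds, payload_delivery,
                   priv_esc, info_disclosure):
    """Evaluate the attack surface formula for one configuration."""
    return (write_paths * exec_paths
            + shell_cmds * payload_delivery
            + priv_esc
            + info_disclosure)


# Weights per configuration, in the order of the function's parameters.
CONFIGS = {
    "traditional": (100, 100, 50, 50, 50, 50),
    "read_only":   (0, 100, 50, 50, 50, 50),
    "shell_less":  (100, 0, 0, 50, 50, 50),
    "combined":    (0, 0, 0, 0, 0, 12),
}

scores = {name: attack_surface(*weights) for name, weights in CONFIGS.items()}

# Percentage reduction of each configuration relative to the traditional baseline.
baseline = scores["traditional"]
reductions = {name: 100.0 * (1 - score / baseline) for name, score in scores.items()}

for name in CONFIGS:
    print(f"{name:12s} score={scores[name]:6d} reduction={reductions[name]:.1f}%")
```

Changing any single weight and re-running shows how strongly the product terms dominate the total.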

Part 2: The Complete Kubernetes Security Context

This section provides the definitive, copy-paste, production-ready SecurityContext that implements both read-only and shell-less containerization in a Kubernetes environment.

The Master SecurityContext Block

# Pod-level fields: set under spec.securityContext
securityContext:
  # Non-root user (65532 is the conventional "nonroot" UID in distroless-style images)
  runAsNonRoot: true
  runAsUser: 65532
  runAsGroup: 65532

  # Group for volume ownership (ensures app can read/write mounted volumes)
  fsGroup: 65532

  # Use the runtime's default seccomp profile (blocks several dozen dangerous syscalls)
  seccompProfile:
    type: RuntimeDefault

# Container-level fields: set under spec.containers[].securityContext
securityContext:
  # Read-only root filesystem (MUST be true for production)
  readOnlyRootFilesystem: true

  # Prevent privilege escalation (setuid binaries cannot grant extra privileges)
  allowPrivilegeEscalation: false

  # Drop all capabilities and don't grant any
  capabilities:
    drop: ["ALL"]

What Each Field Does

Field	Value	Why
runAsNonRoot	true	Prevents running as root; can't escalate if you're not root
runAsUser	65532	Specific user ID (not root = 0); the conventional "nonroot" UID in distroless-style images ("nobody" is usually 65534)
runAsGroup	65532	Specific group ID; ensures clean permission model
fsGroup	65532	Makes mounted volumes readable/writable by this group
readOnlyRootFilesystem	true	CRITICAL: Filesystem is immutable; no writes to /, /etc, /bin, etc.
allowPrivilegeEscalation	false	Sets no_new_privs so `setuid` binaries and file capabilities cannot grant extra privileges
capabilities.drop	["ALL"]	Removes all Linux capabilities (CAP_NET_RAW, CAP_SYS_ADMIN, etc.)
seccompProfile.type	RuntimeDefault	Applies the container runtime's default seccomp filter (blocks reboot, kexec_load, etc.)
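The fields in the table can be verified automatically before a manifest is applied. In the Kubernetes API, readOnlyRootFilesystem, allowPrivilegeEscalation, and capabilities exist only on the container-level securityContext, while runAsNonRoot and seccompProfile can be set at pod level and inherited. The sketch below checks a pod spec, represented as a plain dict, against those rules. The function name check_pod_spec and the message wording are hypothetical.

```python
def check_pod_spec(pod_spec: dict) -> list:
    """Return a list of problems; an empty list means every container passes."""
    problems = []
    pod_sc = pod_spec.get("securityContext") or {}
    containers = (list(pod_spec.get("initContainers") or [])
                  + list(pod_spec.get("containers") or []))
    for container in containers:
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext") or {}

        # These two may be inherited from the pod level; a container value overrides it.
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        seccomp = sc.get("seccompProfile") or pod_sc.get("seccompProfile") or {}
        if run_as_non_root is not True:
            problems.append(f"{name}: runAsNonRoot is not true")
        if seccomp.get("type") != "RuntimeDefault":
            problems.append(f"{name}: seccompProfile.type is not RuntimeDefault")

        # These three are only honoured on the container-level securityContext.
        if sc.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem is not true")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation is not false")
        dropped = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in dropped:
            problems.append(f"{name}: capabilities.drop does not include ALL")
    return problems
```

The dict would typically be the spec.template.spec portion of a Deployment or StatefulSet, for example after loading the manifest with a YAML parser such as PyYAML (a third-party package, assumed here rather than required).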

Compliance Mapping: CIS, DISA, NIST, FedRAMP

CIS Docker Benchmark v1.7.0

The CIS Docker Benchmark establishes best practices for securing container images and runtime configurations. Section 5.1 covers image and build best practices, requiring containers to be built from known base images. CleanStart images provide hardened, minimal base images that are scanned and verified, satisfying this requirement. Section 5.2 requires a HEALTHCHECK to be configured, which is implemented below in per-application examples. Section 5.3 mandates that layer count be kept as low as possible to reduce the potential attack surface. CleanStart images leverage multi-stage builds to minimize unnecessary layers. Section 5.4 requires that container images be scanned for known vulnerabilities before deployment. CleanStart images are scanned using grype at build time before being pushed to production. Section 5.12 specifies that COPY should be used instead of ADD in Dockerfiles to avoid automatic decompression vulnerabilities. CleanStart images follow this pattern consistently. Section 5.25 addresses runtime security, requiring that containers be restricted from acquiring additional privileges. This is enforced using allowPrivilegeEscalation: false in the SecurityContext. Section 5.26 requires limiting container read access to system process information, achieved through read-only filesystems and the seccompProfile: RuntimeDefault configuration. Section 5.27 mandates that container and application logs be configured appropriately, which CleanStart implements by directing all logs to stdout and stderr where they can be accessed through kubectl logs.

DISA STIG for Kubernetes

The Defense Information Systems Agency provides strict security requirements for Kubernetes deployments. Requirement V-242376 mandates that containers must be configured with a read-only root filesystem. This is implemented using readOnlyRootFilesystem: true in the SecurityContext. Requirement V-242377 requires that containers must run as a non-root user, enforced through runAsNonRoot: true and runAsUser: 65532. Requirement V-242378 specifies that containers must not have privilege escalation enabled, which is ensured by allowPrivilegeEscalation: false. Requirement V-242379 requires that all capabilities be dropped from containers, achieved through capabilities.drop: ["ALL"]. Together, these four requirements form the foundation of DISA STIG compliance for container security.

NIST 800-190 (Application Container Security)

The National Institute of Standards and Technology published guidelines for container security that address four key areas. Guideline 4.1 establishes the need to maintain an image repository, which is satisfied through the use of private registries such as Google Container Registry or Amazon ECR. Guideline 4.2 requires performing image scanning for known vulnerabilities before deployment. CleanStart images are scanned, and custom images should be scanned using tools like grype before being pushed to production. Guideline 4.3 requires implementing the principle of least privilege, which is achieved through read-only filesystems, non-root user execution, and dropped capabilities. Guideline 4.4 mandates implementing network isolation, which is addressed through Kubernetes NetworkPolicy objects in a separate guide.

FedRAMP Security Requirements

FedRAMP compliance requires security controls across multiple dimensions. Control AC-6 implements least privilege access, which is satisfied through non-root user execution, dropped capabilities, and read-only filesystems. Control CM-7 implements least functionality, requiring that systems only include necessary components. This is addressed through shell-less images that contain only essential binaries. Controls SC-8 and SC-28 protect information in transit and at rest respectively, requiring TLS for network communications and encrypted storage for sensitive data.

Part 3: Per-Application Complete Production Manifests

PostgreSQL: Complete Production Deployment

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
data:
  postgresql.conf: |
    # PostgreSQL Configuration
    max_connections = 100
    shared_buffers = 256MB
    effective_cache_size = 1GB
    maintenance_work_mem = 64MB
    work_mem = 2621kB
    log_statement = 'all'
    log_duration = on
    log_connections = on
    log_disconnections = on
---
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
type: Opaque
stringData:
  username: postgres
  password: "YOUR_STRONG_PASSWORD_HERE"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 50Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      # Pod-level security settings
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault

      containers:
      - name: postgres
        image: cleanstart/postgresql:15-prod@sha256:abc123def456...
        imagePullPolicy: IfNotPresent
        # Container-level security settings (these fields are not valid at pod level)
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        ports:
        - containerPort: 5432
          name: postgres
          protocol: TCP

        env:
        - name: POSTGRES_DB
          value: "production"
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: POSTGRES_INITDB_ARGS
          value: "-c shared_buffers=256MB -c max_connections=100"

        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        - name: socket
          mountPath: /var/run/postgresql
        - name: tmp
          mountPath: /tmp
        - name: shm
          mountPath: /dev/shm
        - name: config
          mountPath: /etc/postgresql
          readOnly: true

        resources:
          requests:
            memory: 1Gi
            cpu: 500m
            ephemeral-storage: 2Gi
          limits:
            memory: 4Gi
            cpu: 2
            ephemeral-storage: 5Gi

        livenessProbe:
          tcpSocket:
            port: 5432
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

        readinessProbe:
          tcpSocket:
            port: 5432
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2

        lifecycle:
          preStop:
            exec:
              command: ["/opt/cleanimg/cleanimg-init", "graceful-shutdown"]

      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-data
      - name: socket
        emptyDir:
          sizeLimit: 10Mi
      - name: tmp
        emptyDir:
          sizeLimit: 5Gi
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
      - name: config
        configMap:
          name: postgres-config

      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
  - port: 5432
    targetPort: 5432
    protocol: TCP
  type: ClusterIP

The PostgreSQL deployment above demonstrates how to run a stateful database with read-only root filesystems and non-root users. Data persistence is handled through a PersistentVolumeClaim mounted at /var/lib/postgresql/data. Temporary runtime directories like sockets, temporary files, and shared memory use emptyDir volumes to provide writable space without compromising the immutable filesystem. Configuration is mounted from a ConfigMap as read-only, ensuring that the application cannot accidentally modify its own settings. The database runs as a non-root user (65532) with all capabilities dropped, following the security model outlined in Part 2.
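Because every writable mount is an exception to the immutable filesystem, it helps to list them explicitly during review. The sketch below takes a pod spec as a plain dict and reports, per container, the mount paths that are not marked readOnly. The function name writable_mounts is hypothetical, and the sample data mirrors the PostgreSQL mounts described above.

```python
def writable_mounts(pod_spec: dict) -> dict:
    """Map each container name to its sorted list of writable mount paths."""
    result = {}
    for container in pod_spec.get("containers", []):
        paths = [
            mount["mountPath"]
            for mount in container.get("volumeMounts", [])
            # A mount is writable unless readOnly is explicitly true.
            if not mount.get("readOnly", False)
        ]
        result[container["name"]] = sorted(paths)
    return result


# Mounts of the PostgreSQL container, written as a dict for illustration.
postgres_pod = {
    "containers": [{
        "name": "postgres",
        "volumeMounts": [
            {"name": "data", "mountPath": "/var/lib/postgresql/data"},
            {"name": "socket", "mountPath": "/var/run/postgresql"},
            {"name": "tmp", "mountPath": "/tmp"},
            {"name": "shm", "mountPath": "/dev/shm"},
            {"name": "config", "mountPath": "/etc/postgresql", "readOnly": True},
        ],
    }],
}

print(writable_mounts(postgres_pod))
```

A short list that matches what the application genuinely needs is the goal; any unexpected entry is a writable location an attacker could also use.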

Redis: Complete Production Deployment (Cache Mode)

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    maxmemory 2gb
    maxmemory-policy allkeys-lru
    timeout 0
    tcp-keepalive 300
    loglevel notice
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
  labels:
    app: redis-cache
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache
    spec:
      # Pod-level security settings
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault

      containers:
      - name: redis
        image: cleanstart/redis:7-prod@sha256:abc123def456...
        imagePullPolicy: IfNotPresent
        # Container-level security settings (these fields are not valid at pod level)
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP

        volumeMounts:
        - name: data
          mountPath: /data
        - name: tmp
          mountPath: /tmp
        - name: socket
          mountPath: /var/run/redis
        - name: config
          mountPath: /etc/redis
          readOnly: true

        resources:
          requests:
            memory: 512Mi
            cpu: 250m
            ephemeral-storage: 1Gi
          limits:
            memory: 2Gi
            cpu: 1
            ephemeral-storage: 3Gi

        livenessProbe:
          tcpSocket:
            port: 6379
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3

        readinessProbe:
          tcpSocket:
            port: 6379
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 2
          failureThreshold: 2

      volumes:
      - name: data
        emptyDir:
          sizeLimit: 10Gi
      - name: tmp
        emptyDir:
          sizeLimit: 1Gi
      - name: socket
        emptyDir:
          sizeLimit: 10Mi
      - name: config
        configMap:
          name: redis-config

      terminationGracePeriodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cache
spec:
  selector:
    app: redis-cache
  ports:
  - port: 6379
    targetPort: 6379
    protocol: TCP
  type: ClusterIP

The Redis cache deployment is configured for stateless operation since cache data is ephemeral. Three replicas are deployed for redundancy and load distribution. The read-only root filesystem is maintained while allowing Redis to store cache data in an emptyDir volume. Configuration is provided via ConfigMap and mounted as read-only to prevent runtime modifications. The deployment uses TCP socket probes for health checks rather than shell commands, respecting the shell-less constraint.
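A tcpSocket probe needs nothing inside the container: the kubelet simply tries to open a TCP connection to the port and treats a successful connect as healthy. The sketch below reproduces that check for local testing. The function name tcp_probe is hypothetical, and this is an approximation of the behaviour, not the kubelet's own code.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection performs the TCP handshake; the context manager
        # closes the socket again immediately, as a probe would.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

For example, tcp_probe("127.0.0.1", 6379) after a kubectl port-forward to the Redis pod shows whether the probe defined in the manifest would pass.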

Kafka: Complete Production Deployment

A Kafka StatefulSet requires special handling because each broker needs consistent identity and persistent storage. The configuration below shows how to run Kafka with read-only root filesystems and shell-less containers while maintaining the requirements for stateful workloads.

apiVersion: v1
kind: ConfigMap
metadata:
  name: kafka-config
data:
  server.properties: |
    broker.rack=rack1
    num.network.threads=8
    num.io.threads=8
    socket.send.buffer.bytes=102400
    socket.receive.buffer.bytes=102400
    socket.request.max.bytes=104857600
    log.dirs=/var/lib/kafka/data
    num.partitions=3
    num.recovery.threads.per.data.dir=1
    # Single broker: internal topics cannot use a replication factor above 1
    offsets.topic.replication.factor=1
    transaction.state.log.replication.factor=1
    transaction.state.log.min.isr=1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-logs
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 700Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 1
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      # Pod-level security settings
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault

      containers:
      - name: kafka
        image: cleanstart/kafka:3.5-prod@sha256:abc123def456...
        imagePullPolicy: IfNotPresent
        # Container-level security settings (these fields are not valid at pod level)
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        ports:
        - containerPort: 9092
          name: kafka
          protocol: TCP

        env:
        - name: KAFKA_BROKER_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: KAFKA_ADVERTISED_HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR
          value: "1"
        - name: KAFKA_LOG_RETENTION_DAYS
          value: "7"
        - name: KAFKA_LOG_SEGMENT_BYTES
          value: "1073741824"

        volumeMounts:
        - name: logs
          mountPath: /var/lib/kafka/data
        - name: tmp
          mountPath: /tmp
        - name: config
          mountPath: /etc/kafka
          readOnly: true

        resources:
          requests:
            memory: 2Gi
            cpu: 1
            ephemeral-storage: 1Gi
          limits:
            memory: 4Gi
            cpu: 2
            ephemeral-storage: 5Gi

        livenessProbe:
          tcpSocket:
            port: 9092
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

        readinessProbe:
          tcpSocket:
            port: 9092
          initialDelaySeconds: 15
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2

      volumes:
      - name: logs
        persistentVolumeClaim:
          claimName: kafka-logs
      - name: tmp
        emptyDir:
          sizeLimit: 70Gi
      - name: config
        configMap:
          name: kafka-config

      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: kafka
spec:
  clusterIP: None
  selector:
    app: kafka
  ports:
  - port: 9092
    targetPort: 9092
    name: kafka

Kafka brokers maintain state that must persist across restarts, so the StatefulSet uses a PersistentVolumeClaim for the log directory. The StatefulSet naming convention provides consistent pod identities (kafka-0, kafka-1, etc.), and the manifest passes the pod name to the container through the metadata.name field reference so that the broker ID can be derived from it. The /tmp emptyDir carries a large sizeLimit (70Gi) to accommodate Kafka's temporary files on a writable volume while the root filesystem stays read-only. Because disk-backed emptyDir usage counts toward the pod's ephemeral-storage limit (5Gi in this manifest), the two values should be sized together.
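Kafka's broker.id setting is an integer, whereas metadata.name yields a string such as kafka-0, so something at startup has to extract the ordinal. Whether the image's entrypoint already does this is not stated here; the sketch below shows one way such a step could work, with the function name broker_id_from_pod_name being hypothetical.

```python
import re

# StatefulSet pods are named <statefulset-name>-<ordinal>, e.g. kafka-0.
_POD_NAME = re.compile(r"^(?P<name>.+)-(?P<ordinal>\d+)$")


def broker_id_from_pod_name(pod_name: str) -> int:
    """Extract the StatefulSet ordinal from a pod name and return it as an int."""
    match = _POD_NAME.match(pod_name)
    if match is None:
        raise ValueError(f"not a StatefulSet pod name: {pod_name!r}")
    return int(match.group("ordinal"))
```

With this helper, broker_id_from_pod_name("kafka-2") gives 2, and a name without a numeric suffix raises ValueError rather than silently producing a wrong ID.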

Nginx: Complete Production Deployment

Nginx serves as a reverse proxy and load balancer in many architectures. The configuration below demonstrates how to run Nginx with read-only root filesystems while allowing it to cache responses and maintain active connections.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
data:
  nginx.conf: |
    user www-data www-data;
    worker_processes auto;
    pid /var/run/nginx/nginx.pid;

    events {
      worker_connections 1024;
      use epoll;
    }

    http {
      include /etc/nginx/mime.types;
      default_type application/octet-stream;

      log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

      access_log /dev/stdout main;
      error_log /dev/stderr warn;

      sendfile on;
      tcp_nopush on;
      tcp_nodelay on;
      keepalive_timeout 65;
      types_hash_max_size 2048;
      client_max_body_size 20M;

      gzip on;
      gzip_vary on;
      gzip_min_length 1000;
      gzip_types text/plain text/css text/xml text/javascript
                 application/x-javascript application/xml+rss;

      server {
        listen 8080 default_server;
        listen [::]:8080 default_server;

        server_name _;

        location / {
          proxy_pass http://backend:8000;
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
          proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /health {
          access_log off;
          return 200 "ok\n";
        }
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # Pod-level security settings
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault

      containers:
      - name: nginx
        image: cleanstart/nginx:1.25-prod@sha256:abc123def456...
        imagePullPolicy: IfNotPresent
        # Container-level security settings (these fields are not valid at pod level)
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP

        volumeMounts:
        - name: config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
          readOnly: true
        - name: cache
          mountPath: /var/cache/nginx
        - name: run
          mountPath: /var/run/nginx
        - name: tmp
          mountPath: /tmp

        resources:
          requests:
            memory: 256Mi
            cpu: 100m
            ephemeral-storage: 500Mi
          limits:
            memory: 512Mi
            cpu: 500m
            ephemeral-storage: 2Gi

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 2
          periodSeconds: 3
          timeoutSeconds: 1
          failureThreshold: 2

      volumes:
      - name: config
        configMap:
          name: nginx-config
      - name: cache
        emptyDir:
          sizeLimit: 5Gi
      - name: run
        emptyDir:
          sizeLimit: 10Mi
      - name: tmp
        emptyDir:
          sizeLimit: 100Mi

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname

      terminationGracePeriodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: LoadBalancer

The Nginx configuration logs directly to stdout and stderr rather than writing to disk, which respects the read-only root filesystem constraint. Cache directories are mounted as ephemeral emptyDir volumes, allowing Nginx to maintain performance through caching without compromising the immutable root filesystem. The deployment includes pod anti-affinity rules to distribute Nginx instances across different nodes, improving availability and resilience.

Python Web App: Complete Production Deployment

Python web applications often require database migrations and other initialization steps before the main application can start. The configuration below shows how to handle init containers while maintaining security constraints throughout the deployment lifecycle.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.env: |
    ENVIRONMENT=production
    LOG_LEVEL=info
    DEBUG=false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-app
  labels:
    app: python-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: python-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: python-app
    spec:
      # Pod-level settings apply to every container in the pod.
      # The deprecated seccomp annotation is omitted; seccompProfile replaces it.
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault

      initContainers:
      - name: migrate
        image: myregistry/python-app:1.0.0@sha256:abc123def456...
        imagePullPolicy: IfNotPresent
        securityContext:
          runAsUser: 65532
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        command:
        - python
        args:
        - /app/manage.py
        - migrate
        - --no-input
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-url
        - name: DJANGO_SETTINGS_MODULE
          value: "config.settings.production"
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        resources:
          requests:
            memory: 256Mi
            cpu: 250m
            ephemeral-storage: 500Mi
          limits:
            memory: 512Mi
            cpu: 1
            ephemeral-storage: 1Gi

      containers:
      - name: app
        image: myregistry/python-app:1.0.0@sha256:abc123def456...
        imagePullPolicy: IfNotPresent
        # These three fields are only valid at container level,
        # so the main container declares them itself.
        securityContext:
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP

        env:
        - name: ENVIRONMENT
          value: "production"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-url
        - name: REDIS_URL
          value: "redis://redis-cache:6379/0"
        - name: SECRET_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: secret-key
        - name: DJANGO_SETTINGS_MODULE
          value: "config.settings.production"

        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
        - name: config
          mountPath: /etc/app
          readOnly: true

        resources:
          requests:
            memory: 512Mi
            cpu: 250m
            ephemeral-storage: 500Mi
          limits:
            memory: 1Gi
            cpu: 1
            ephemeral-storage: 2Gi

        livenessProbe:
          httpGet:
            path: /health/live
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 2
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 2
          failureThreshold: 2

      volumes:
      - name: tmp
        emptyDir:
          sizeLimit: 1Gi
      - name: cache
        emptyDir:
          sizeLimit: 500Mi
      - name: config
        configMap:
          name: app-config

      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - python-app
              topologyKey: kubernetes.io/hostname

      terminationGracePeriodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: python-app
spec:
  selector:
    app: python-app
  ports:
  - port: 8000
    targetPort: 8000
    protocol: TCP
  type: ClusterIP

The Python application deployment uses an initContainer to run database migrations before the main application starts. This keeps the database schema up to date without manual intervention. The initContainer should run with the same security constraints as the main container, so the security model holds from initialization through runtime. Because readOnlyRootFilesystem, allowPrivilegeEscalation, and capabilities are container-level fields, each container has to declare them in its own securityContext; they are not inherited from the pod-level securityContext. The main application container uses HTTP health checks to verify liveness and readiness, which lets Kubernetes restart failed instances and route traffic only to healthy replicas.
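The probes in the manifest expect the application to answer on /health/live and /health/ready. The manifest implies a Django project (manage.py), but the sketch below is a framework-neutral WSGI app using only the standard library, so it is an illustration of the idea rather than the application's actual code. The names make_health_app and database_ready are hypothetical.

```python
from typing import Callable, Iterable


def database_ready() -> bool:
    """Hypothetical readiness check; replace with a real connection test."""
    return True


def make_health_app(ready_check: Callable[[], bool] = database_ready):
    """Return a WSGI app that serves /health/live and /health/ready."""

    def app(environ, start_response) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        if path == "/health/live":
            # Liveness only proves the process can still answer requests.
            status, body = "200 OK", b"alive"
        elif path == "/health/ready":
            # Readiness also depends on downstream dependencies.
            if ready_check():
                status, body = "200 OK", b"ready"
            else:
                status, body = "503 Service Unavailable", b"not ready"
        else:
            status, body = "404 Not Found", b"not found"
        headers = [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]

    return app
```

Keeping liveness free of dependency checks avoids restart loops when the database is briefly unavailable; only readiness should fail in that case, which removes the pod from the Service without killing it.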

Part 4: Verification and Compliance Checks

Checklist: Is My Manifest Production-Ready?

The following checklist walks through the security controls a production manifest should implement. Start with the pod-level securityContext and confirm that runAsNonRoot is set to true. Check that runAsUser is set to a specific non-root ID like 65532, and verify that runAsGroup matches this value. Ensure that fsGroup is set to the same value for proper volume permission handling. Confirm that seccompProfile.type is set to RuntimeDefault to apply syscall filtering. Then check the securityContext of every container, including initContainers, because the remaining fields are only valid at container level: readOnlyRootFilesystem must be true so the root filesystem is immutable, allowPrivilegeEscalation must be false to prevent privilege escalation attacks, and capabilities.drop must be ["ALL"] to remove all Linux capabilities.
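The securityContext items above can be checked mechanically. The following is a minimal sketch, assuming the pod spec has already been loaded into a Python dict (for example from the output of kubectl get ... -o json). The function names are hypothetical, and the rule that a container-level value overrides the pod-level one is applied only to the fields that exist at both levels.

```python
def _effective(pod_sc: dict, ctr_sc: dict, key: str):
    """Container-level value wins when present, else fall back to the pod."""
    return ctr_sc[key] if key in ctr_sc else pod_sc.get(key)


def security_context_findings(pod_spec: dict) -> list[str]:
    """Return one message per securityContext checklist item that fails."""
    findings = []
    pod_sc = pod_spec.get("securityContext") or {}
    if "fsGroup" not in pod_sc:
        findings.append("pod: fsGroup is not set")

    containers = (pod_spec.get("initContainers") or []) + (
        pod_spec.get("containers") or []
    )
    for ctr in containers:
        name = ctr.get("name", "<unnamed>")
        sc = ctr.get("securityContext") or {}

        # Fields that may be set at pod or container level.
        if _effective(pod_sc, sc, "runAsNonRoot") is not True:
            findings.append(f"{name}: runAsNonRoot is not true")
        uid = _effective(pod_sc, sc, "runAsUser")
        if not isinstance(uid, int) or uid == 0:
            findings.append(f"{name}: runAsUser is missing or root")
        if _effective(pod_sc, sc, "runAsGroup") is None:
            findings.append(f"{name}: runAsGroup is not set")
        seccomp = _effective(pod_sc, sc, "seccompProfile") or {}
        if seccomp.get("type") != "RuntimeDefault":
            findings.append(f"{name}: seccompProfile.type is not RuntimeDefault")

        # Fields that exist only at container level.
        if sc.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: readOnlyRootFilesystem is not true")
        if sc.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: allowPrivilegeEscalation is not false")
        drop = (sc.get("capabilities") or {}).get("drop") or []
        if "ALL" not in drop:
            findings.append(f"{name}: capabilities.drop does not include ALL")
    return findings
```

An empty list means every item in this part of the checklist passed; the messages are intended for a CI log, not as a replacement for an admission policy.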

Review your storage configuration to ensure that persistent data uses PersistentVolumeClaims for long-term storage needs. Verify that temporary data uses emptyDir for transient storage that's lost when the pod terminates. Confirm that sensitive temporary data uses emptyDir with medium: Memory for tmpfs mounting to keep data in RAM only. Check that configuration uses ConfigMap mounts with readOnly: true to prevent modification. Verify that secrets use Secret mounts with readOnly: true. Ensure that no writable system directories are mounted into the container.
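Two of the storage items lend themselves to an automated check: ConfigMap and Secret mounts that omit readOnly: true, and hostPath volumes that expose node directories. The sketch below assumes a pod spec loaded as a Python dict; storage_findings is a hypothetical name, and the check deliberately ignores the PersistentVolumeClaim and tmpfs items, which depend on what the data is used for.

```python
def storage_findings(pod_spec: dict) -> list[str]:
    """Flag volume usage that the storage checklist warns about."""
    findings = []
    volumes = {v["name"]: v for v in pod_spec.get("volumes") or []}

    # hostPath gives the container a view of the node's own filesystem.
    for name, vol in volumes.items():
        if "hostPath" in vol:
            findings.append(f"volume {name}: hostPath exposes a node directory")

    containers = (pod_spec.get("initContainers") or []) + (
        pod_spec.get("containers") or []
    )
    for ctr in containers:
        for mount in ctr.get("volumeMounts") or []:
            vol = volumes.get(mount["name"], {})
            is_config = "configMap" in vol or "secret" in vol
            if is_config and mount.get("readOnly") is not True:
                findings.append(
                    f"{ctr['name']}: mount {mount['name']} should set readOnly: true"
                )
    return findings
```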

Examine your ENTRYPOINT and CMD instructions to confirm that both use exec form (array notation) rather than shell form, because shell form wraps the command in /bin/sh -c. Verify that nothing else in the startup path, such as a wrapper script, invokes /bin/sh, since a shell-less image has no shell to run it. Check that complex initialization uses cleanimg-init.toml for declarative setup. Verify that database migrations and setup tasks use initContainers rather than embedding them in the main container logic.

Confirm that livenessProbe is defined for your container to detect and restart failed processes. Verify that readinessProbe is defined to control traffic routing. Ensure that probes use HTTP, TCP, or direct cleanimg-init endpoints rather than shell exec probes. Review the initialDelaySeconds values to ensure they're appropriate (30+ seconds for databases, 10+ for applications). Verify that probes have reasonable timeout values (2-5 seconds).
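The probe items can be expressed as a small check on a single container definition. This is a sketch under the assumption that the container is available as a Python dict; probe_findings is a hypothetical name, and the 2 to 5 second window simply mirrors the guidance in the paragraph above.

```python
def probe_findings(container: dict) -> list[str]:
    """Check liveness and readiness probes against the checklist above."""
    findings = []
    for kind in ("livenessProbe", "readinessProbe"):
        probe = container.get(kind)
        if not probe:
            findings.append(f"{kind}: missing")
            continue
        if "exec" in probe:
            # exec probes run a command inside the container, which a
            # shell-less image may not be able to provide.
            findings.append(f"{kind}: uses exec instead of HTTP or TCP")
        # Kubernetes applies a 1 second timeout when none is specified.
        timeout = probe.get("timeoutSeconds", 1)
        if not 2 <= timeout <= 5:
            findings.append(f"{kind}: timeoutSeconds={timeout} is outside 2-5")
    return findings
```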

Check that resources.requests.memory is set to an appropriate value based on expected memory usage. Verify that resources.requests.cpu is set based on expected computational load. Ensure that resources.requests.ephemeral-storage is set to prevent disk exhaustion. Confirm that resources.limits.memory is set, typically 2-4 times the request value. Verify that resources.limits.cpu is set based on maximum acceptable usage. Check that resources.limits.ephemeral-storage is set to 2-3 times the request value.
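To compare limits with requests, the quantity strings have to be converted to numbers first. The sketch below handles only the suffixes that appear in manifests like the one above (m, Ki, Mi, Gi, Ti, k, M, G, T); it is not a full implementation of the Kubernetes quantity grammar, and both function names are hypothetical.

```python
from decimal import Decimal

# Binary and decimal suffixes; exponent notation is not handled here.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity) -> Decimal:
    """Convert a quantity such as '512Mi', '250m', or 1 to a Decimal."""
    text = str(quantity)
    if text.endswith("m"):  # milli-units, as in cpu: 250m
        return Decimal(text[:-1]) / 1000
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def limit_to_request_ratio(resources: dict, name: str) -> Decimal:
    """Return limit divided by request for one resource name."""
    request = parse_quantity(resources["requests"][name])
    limit = parse_quantity(resources["limits"][name])
    return limit / request
```

Applied to the app container in the Python example, the memory ratio is 2 (512Mi to 1Gi), while ephemeral-storage comes out at about 4.1 (500Mi to 2Gi), which is above the 2-3 times range suggested here and may deserve a second look.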

Review capabilities to confirm that nothing is added back after dropping ALL unless the workload needs it; NET_RAW in particular should stay dropped if not needed. Verify that NetworkPolicy is implemented if multiple applications share the cluster. Confirm that internal services use ClusterIP rather than exposing on public IPs.

Examine image references to confirm they come from cleanstart/* or are built from cleanstart base images. Verify that image digests are pinned using SHA256 hashes rather than tags to prevent unexpected image updates. Confirm that images have been scanned for vulnerabilities before being pushed to the registry. Review image signature verification settings in your admission webhook configuration.
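Digest pinning is easy to verify with a pattern match on the image reference. The sketch below treats an image as pinned only when it ends in a full 64-character SHA256 digest; the function names are hypothetical, and the pod spec is assumed to be a Python dict. Placeholder digests that have been shortened for documentation purposes will be reported as unpinned.

```python
import re

# A complete SHA256 digest is 64 lowercase hexadecimal characters.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image: str) -> bool:
    """True when the image reference ends in a full sha256 digest."""
    return _DIGEST.search(image) is not None


def unpinned_images(pod_spec: dict) -> list[str]:
    """List images in a pod spec that are referenced by tag only."""
    containers = (pod_spec.get("initContainers") or []) + (
        pod_spec.get("containers") or []
    )
    return [c["image"] for c in containers if not is_digest_pinned(c["image"])]
```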

Verify that application logs are written to stdout and stderr where they can be collected by Kubernetes. Check that log level is set to info or warn (not debug in production) to reduce noise and security risks. Review application code to ensure sensitive data like passwords and API keys are never logged.
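One way to combine the logging items is a stdout handler with a redaction filter. The sketch below uses only the standard logging module; the key names in the pattern (password, secret, token, api_key) and the names RedactSecrets and configure_logging are assumptions chosen for illustration, and a pattern like this catches only key=value style leaks, not every form a secret can take.

```python
import logging
import re
import sys

# Matches key=value pairs whose key suggests a credential (assumed names).
_SECRET = re.compile(r"(?i)\b(password|secret|token|api[_-]?key)=\S+")


class RedactSecrets(logging.Filter):
    """Replace credential-like values in the formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # applies %-style args first
        record.msg = _SECRET.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)
        record.args = ()  # the message is already fully formatted
        return True


def configure_logging(level: str = "INFO") -> None:
    """Send all logs to stdout so the container runtime can collect them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.addFilter(RedactSecrets())
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

Writing to stdout also fits the read-only root filesystem, since no log file path needs to be writable.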

Audit Command (kubectl)

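The following kubectl commands help you audit your cluster to identify non-compliant deployments. They require jq and print results as namespace/name. Because readOnlyRootFilesystem is a container-level field, the first command lists every Deployment with at least one container that does not set it to true:

kubectl get deployments -A -o json | \
  jq -r '.items[]
    | select(any(.spec.template.spec.containers[]; .securityContext.readOnlyRootFilesystem != true))
    | "\(.metadata.namespace)/\(.metadata.name)"'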

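To identify Deployments that don't enforce non-root execution at pod level, run the command below. It inspects only the pod-level securityContext, so Deployments that set runAsNonRoot per container are also listed and need a manual look:

kubectl get deployments -A -o json | \
  jq -r '.items[]
    | select(.spec.template.spec.securityContext.runAsNonRoot != true)
    | "\(.metadata.namespace)/\(.metadata.name)"'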

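To find pods with at least one container that hasn't dropped all capabilities (including containers with no capabilities section at all), execute:

kubectl get pods -A -o json | \
  jq -r '.items[]
    | select(any(.spec.containers[]; ((.securityContext.capabilities.drop // []) | index("ALL")) == null))
    | "\(.metadata.namespace)/\(.metadata.name)"'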

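To discover pods running common shell-containing base images such as ubuntu and debian directly, run the command below. This is a name-based heuristic: it matches the repository name with or without a registry prefix, but it cannot detect application images that were built on top of those bases:

kubectl get pods -A -o json | \
  jq -r '.items[]
    | select(any(.spec.containers[]; .image | test("(^|/)(ubuntu|debian)([:@]|$)")))
    | "\(.metadata.namespace)/\(.metadata.name)"'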

Summary: The Production Security Model

Property	Status	Benefit
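Read-only filesystem	✅ Enabled	No writes to the root filesystem; about half of viable attack paths removed in the Part 1 model (~400 to ~200)
Shell-less image	✅ Enabled	No shell for injected commands to run in; viable paths drop from ~400 to ~150 in the Part 1 model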
Non-root user	✅ Enforced	Process starts without root privileges; with allowPrivilegeEscalation: false it cannot gain them through setuid binaries
Dropped capabilities	✅ Enabled	All Linux capabilities removed, including runtime defaults such as CHOWN and SETUID
Immutable configuration	✅ ConfigMap RO mounts	Can't modify app behavior at runtime
Health checks	✅ Defined	Automatic restart on failure
Signal handling	✅ cleanimg-init or direct app	Graceful shutdown, no orphaned processes
Ephemeral storage isolation	✅ emptyDir + tmpfs	No persistence of attacker payloads
Persistent storage isolation	✅ PVC + fsGroup	Group-based access control; encryption at rest where the storage class provides it

This combination of read-only filesystems, shell-less containers, non-root user execution, dropped capabilities, and proper health checking represents the modern production baseline for Kubernetes deployments. Every application deployed to Kubernetes in a production environment should implement these security controls as a foundational layer of defense. The controls are not optional additions but rather essential elements of responsible cloud-native application deployment.