What a Container Runtime Actually Does
A container runtime is not a single piece of software. It is a chain of tools working together to transform a container image into a running process isolated from the host.
When you run docker run ubuntu sleep 1000 or podman pull alpine, you are invoking a container runtime. The runtime's job is mechanical but involves several distinct responsibilities:
```mermaid
graph TB
    Image["Container Image"] --> DL["Download & Verify"]
    DL --> Unpack["Unpack Layers"]
    Unpack --> NS["Create Namespaces<br/>pid, network, mount<br/>UTS, IPC"]
    NS --> CG["Configure Cgroups<br/>CPU, memory<br/>block I/O limits"]
    CG --> FS["Prepare Root<br/>Filesystem"]
    FS --> Exec["Execute Process<br/>Environment vars<br/>Capabilities<br/>Seccomp"]
    Exec --> Monitor["Lifecycle<br/>Management"]
    Monitor --> Cleanup["Cleanup &<br/>Teardown"]
    style Image fill:#e3f2fd
    style Exec fill:#fff9c4
    style Monitor fill:#c8e6c9
```

The container runtime begins by handling image management, which encompasses downloading container images from registries, verifying their content digests (and signatures, where signature verification is configured), and storing them locally for future use. Once an image is available, the runtime must unpack it, extracting image layers from compressed archives into a readable filesystem structure that can be used by the container. The runtime then creates namespaces, setting up pid, network, mount, UTS, and IPC namespaces so the container is isolated from the host and other containers. Complementing namespace isolation, the runtime configures cgroups to enforce resource limits on CPU, memory, and block I/O, ensuring the container does not consume unlimited resources. Next comes root filesystem preparation, where the runtime constructs a root filesystem for the container from image layers and any additional mounts specified in the configuration.

With the filesystem ready, the runtime handles process execution, starting the container's main process with the correct environment variables, capabilities, and seccomp filters as defined in the image and runtime configuration. Once the container is running, lifecycle management becomes important: the runtime monitors the process, handles signals like SIGTERM, and reports status to the orchestrator. Finally, when the container stops, the runtime performs cleanup, tearing down namespaces, removing mounts, and freeing resources back to the host.
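The unpack step can be hard to picture, so here is a minimal sketch of how stacked layers combine into one root filesystem. It assumes each layer is modelled as a plain dictionary of path to content, which is a simplification of the real tar archives, and it follows the OCI convention that a file named with the `.wh.` prefix marks a deletion. The function and variable names are hypothetical, and opaque whiteouts are not handled.

```python
WHITEOUT_PREFIX = ".wh."  # OCI marker meaning "this path was deleted"


def apply_layers(layers):
    """Merge layers (oldest first) into a single path -> content mapping."""
    rootfs = {}
    for layer in layers:
        # Whiteouts hide entries from lower layers only, so apply them first.
        for path in layer:
            directory, _, name = path.rpartition("/")
            if name.startswith(WHITEOUT_PREFIX):
                target = directory + "/" + name[len(WHITEOUT_PREFIX):]
                for existing in list(rootfs):
                    if existing == target or existing.startswith(target + "/"):
                        del rootfs[existing]
        # Then add or overwrite the regular files this layer provides.
        for path, content in layer.items():
            if not path.rpartition("/")[2].startswith(WHITEOUT_PREFIX):
                rootfs[path] = content
    return rootfs


base_layer = {"/etc/os-release": "alpine", "/tmp/cache": "stale"}
app_layer = {"/app/server": "binary", "/tmp/.wh.cache": ""}
merged = apply_layers([base_layer, app_layer])
print(sorted(merged))
```

Real runtimes usually avoid copying files like this and instead stack the layers with a copy-on-write filesystem, but the ordering and deletion rules are the same idea.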
This entire chain involves multiple programs and protocols. Modern container runtimes are not monolithic—they are modular. Docker, the tool you interact with, is actually orchestrating multiple pieces: the Docker daemon, containerd, runc, and the kernel. Each component specializes in a particular aspect of container management, creating a flexible and extensible system.
The Two-Level Architecture: High-Level and Low-Level Runtimes
Container runtimes are split into two layers, each with distinct responsibilities. This separation of concerns creates a flexible architecture where different components can be developed, tested, and upgraded independently.
The high-level runtime, also called the daemon or service layer, manages image storage and lifecycle, handling the complexity of maintaining a local cache of container images. It implements registry APIs so you can pull images from Docker Hub, Quay, or private registries. The high-level runtime provides network and storage abstractions, creating virtual networks and mount points for containers. It exposes an API, typically using gRPC for efficient communication, allowing clients to interact with the runtime. Importantly, the high-level runtime does NOT directly create containers itself—that responsibility belongs to the low-level runtime. Examples of high-level runtimes include containerd, CRI-O, and podman.
The low-level runtime, also known as the OCI runtime, takes a prepared bundle from the high-level runtime and actually creates the container. This bundle contains the filesystem and configuration needed for the container. The low-level runtime creates the namespaces and cgroups, starts the container process with the correct isolation and resource limits, and implements the OCI Runtime Specification—the standard interface that all OCI-compliant runtimes must follow. The low-level runtime runs as a short-lived process—it starts the container and then exits, leaving the container running. Examples include runc, crun, youki, Kata, and gVisor.
The separation is intentional and powerful. The high-level runtime can be upgraded without touching the kernel interaction logic. The low-level runtime can be swapped for security alternatives like gVisor (which sandboxes containers) or performance optimizations like crun (which is written in C for faster startup) without changing the high-level orchestration layer.
The architecture flows cleanly from end to end: the user or container orchestrator sends a request through either the CRI (Container Runtime Interface) or Docker API to the high-level runtime. The high-level runtime handles image management, network setup, and storage management, preparing everything the container needs. It then communicates with the low-level runtime through the OCI Runtime Specification, passing the prepared bundle. The low-level runtime is responsible for namespace creation, cgroup setup, and process execution, ultimately interfacing with the Linux kernel to execute the container. This layered design means you can optimize each layer independently for performance, security, or other requirements.
containerd: Docker's Extracted Core
containerd is a high-level container runtime that was extracted from Docker. Docker began splitting its engine apart in 2016, when Docker 1.11 shipped on top of containerd and runc, and in 2017 it donated containerd to the CNCF as a standalone project. This separation became essential as containerization matured and different projects needed runtime capabilities independent of Docker's broader tooling.
Why this matters: Docker is no longer a monolithic binary. The docker CLI is a thin client that talks to the Docker daemon (dockerd) over a REST API, and dockerd in turn drives containerd over gRPC. containerd does the heavy lifting of managing containers. This separation keeps the Docker tooling focused on user experience while containerd handles image management, container lifecycle, and runtime integration.
containerd is written in Go, exposes a language-agnostic gRPC API, and is vendor-neutral: it is hosted at the CNCF, so it is governed by the community rather than a single company. Because of these properties, it is widely adopted across the industry: by Docker (as its core runtime), by Kubernetes (as one of the most common CRI runtimes), and by most managed Kubernetes services. This wide adoption means that understanding containerd goes a long way toward understanding what powers production container infrastructure.
containerd includes a comprehensive set of capabilities. It handles image pulling and layer management, downloading images from registries and maintaining a local cache. It decompresses layers and verifies each blob against its content digest, ensuring image integrity. A gRPC API provides container operations for clients, making containerd accessible to any tool that speaks the protocol. Network namespace management is built in, allowing containers to have their own network stacks. Storage snapshots implement copy-on-write filesystems so multiple containers can efficiently share base image layers. Content-addressable storage means every blob is identified by its SHA-256 digest, so identical content always maps to the same identifier and any tampering changes it. Finally, containerd integrates with runc or other OCI runtimes, delegating the actual low-level container creation to these specialized tools.
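To make content-addressable storage concrete, here is a minimal sketch of a digest-keyed blob store. It is an in-memory illustration only, not containerd's actual content store; the `ContentStore` class and its methods are hypothetical names chosen for this example.

```python
import hashlib


def digest_of(data: bytes) -> str:
    """Return an OCI-style digest string for a blob."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory blob store keyed by content digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = digest_of(data)
        self._blobs[digest] = data  # identical content is stored only once
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        if digest_of(data) != digest:
            raise ValueError("stored blob does not match its digest")
        return data


store = ContentStore()
first = store.put(b"layer contents")
second = store.put(b"layer contents")
print(first == second, len(store._blobs))
```

Because the key is derived from the bytes themselves, two images that share a base layer naturally share one stored copy of it.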
However, containerd deliberately does NOT include several things that Docker does. It provides no UI: there is no CLI as user-friendly as Docker's, no GUI, no Docker Desktop experience. It does not include Compose or orchestration logic for multi-container applications. It does not build images or provide other user-facing workflow features, leaving those to the tools layered on top of it. High-level networking abstractions are beyond its scope; while it creates network namespaces for containers, orchestration systems like Kubernetes handle bridge setup and service discovery.
Using containerd directly:
```shell
# Pull an image
ctr images pull docker.io/library/nginx:latest

# Run a container in the background (-d detaches from the task)
ctr run -d docker.io/library/nginx:latest my-container

# List containers
ctr containers list

# Stop the container's task
ctr tasks kill my-container
```

The ctr CLI is minimal and command-based (similar to git). It's not as user-friendly as Docker, but it's powerful and direct.
In production use, many Kubernetes clusters run containerd without Docker at all. The kubelet talks directly to containerd via the CRI (Container Runtime Interface). This design choice reduces overhead: there is no Docker daemon running and no extra layer of abstraction between the kubelet and container operations. This direct integration is why containerd became the most widely deployed runtime for Kubernetes and why most managed Kubernetes services have adopted it.
CRI-O: Built for Kubernetes
CRI-O is a high-level runtime purpose-built for Kubernetes. It exists for one reason: to implement the CRI specification as minimally as possible, nothing more. Unlike containerd, which aims to be a general-purpose runtime that can be used in various contexts, CRI-O takes a different philosophy.
CRI-O does not try to be Docker. It does not attempt to be a general-purpose container runtime that can be deployed in diverse environments. It is laser-focused on a single mission: provide Kubernetes with a container runtime that is fast, secure, and minimal.
CRI-O includes the essential components for Kubernetes to manage containers. It pulls images via standard registries, ensuring containers can be deployed from Docker Hub, private registries, or other standard sources. It executes containers in an OCI-compliant manner, meaning it follows the standard container specification. It implements the CRI gRPC API that Kubernetes requires, allowing the kubelet to communicate with the runtime. Networking is handled minimally—just namespace creation, as Kubernetes itself handles the bridge setup and service discovery. Storage is similarly minimal, focusing only on image management without attempting higher-level abstractions.
CRI-O deliberately does NOT include several features that would make it more general-purpose. It provides no general-purpose CLI because it is not meant to be used interactively by humans—kubelet controls it programmatically. Compose or orchestration logic is absent because Kubernetes handles that. There is no Docker API compatibility, as CRI-O makes no attempt to be a Docker drop-in replacement. Volume management is completely delegated to Kubernetes, which handles the complexity of persistent storage. Network bridge configuration is similarly delegated to Kubernetes, which orchestrates networking at the cluster level.
CRI-O's primary advantage is its minimalism. It does one thing and does it exceptionally well. A CRI-O installation is smaller, simpler, and has fewer moving parts than a full Docker installation, reducing the attack surface and operational complexity. The trade-off is that CRI-O's specialization means it is less flexible for other uses. If you want to run containers on a single machine without Kubernetes, containerd is more adaptable. CRI-O is designed exclusively for Kubernetes integration, making it unsuitable for standalone use.
Red Hat has chosen CRI-O as the default runtime for OpenShift, reflecting its focus on enterprise Kubernetes deployments. Kata Containers can also be plugged into CRI-O (as it can into containerd) for VM-based container execution, where workloads run inside lightweight virtual machines.
runc: The Reference OCI Runtime
runc is the low-level runtime—the program that actually creates containers. It is the reference implementation of the OCI Runtime Specification, meaning every other OCI runtime is measured against runc's compliance with the standard.
runc is written in Go (with a small amount of C for namespace setup) and is commonly built as a static binary, so it has few external dependencies and runs on most Linux systems. When containerd or CRI-O decides it is time to start a container, it calls runc with a prepared bundle. This bundle contains two critical things: config.json with the OCI runtime configuration (which defines namespaces, cgroups, capabilities, seccomp filters, and mounts), and rootfs/, the extracted root filesystem from the image layers. runc takes this bundle and transforms it into a running container, managing all the kernel interactions.
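As a rough illustration of what a bundle looks like on disk, the sketch below writes a rootfs/ directory and a small config.json. The field names follow the OCI Runtime Specification, but the values, the `write_bundle` helper, and the hostname are assumptions for this example, and a configuration generated by a real tool (for instance `runc spec`) carries many more fields such as mounts and capabilities.

```python
import json
import tempfile
from pathlib import Path


def write_bundle(bundle_dir: Path, args, memory_limit_bytes: int) -> Path:
    """Create <bundle>/rootfs and a pared-down <bundle>/config.json."""
    (bundle_dir / "rootfs").mkdir(parents=True, exist_ok=True)
    config = {
        "ociVersion": "1.0.2",
        "process": {
            "args": list(args),
            "env": ["PATH=/usr/bin:/bin"],
            "cwd": "/",
            "user": {"uid": 0, "gid": 0},
        },
        "root": {"path": "rootfs", "readonly": True},
        "hostname": "demo",  # illustrative value
        "linux": {
            # One entry per namespace the runtime should create.
            "namespaces": [
                {"type": t} for t in ("pid", "network", "mount", "uts", "ipc")
            ],
            # Translated by the runtime into cgroup settings.
            "resources": {"memory": {"limit": memory_limit_bytes}},
        },
    }
    config_path = bundle_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_bundle(Path(tmp) / "bundle", ["sleep", "1000"], 256 * 1024 * 1024)
    config = json.loads(path.read_text())

print([ns["type"] for ns in config["linux"]["namespaces"]])
```

The rootfs/ directory is left empty here; in practice the high-level runtime populates it from the unpacked image layers before invoking the low-level runtime.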
Typical runc invocation (done by containerd, not you) involves three commands: runc create sets up the container and its resources without starting it, runc start actually begins executing the container's main process, and runc delete cleans up after the container stops. These commands form the lifecycle of container execution.
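The create, start, and delete commands imply a small state machine. The sketch below models it after the lifecycle in the OCI Runtime Specification (created, running, stopped); the "absent" state, the `Container` class, and the transition table are hypothetical constructs for illustration, and no real container is started.

```python
class InvalidTransition(Exception):
    """Raised when an operation is not allowed in the current state."""


# (current state, operation) -> next state
TRANSITIONS = {
    ("absent", "create"): "created",
    ("created", "start"): "running",
    ("created", "kill"): "stopped",
    ("running", "kill"): "stopped",
    ("stopped", "delete"): "absent",
}


class Container:
    def __init__(self):
        self.state = "absent"  # not an OCI state; means "does not exist yet"

    def apply(self, operation: str) -> str:
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise InvalidTransition(
                f"cannot {operation} a container that is {self.state}"
            )
        self.state = TRANSITIONS[key]
        return self.state


container = Container()
history = [container.apply(op) for op in ("create", "start", "kill", "delete")]
print(history)
```

Splitting create from start is what lets a high-level runtime set up networking or hooks after the namespaces exist but before the user's process begins executing.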
The importance of runc is hard to overstate. It is battle-tested: billions of containers have run with runc across the world, from personal development machines to massive cloud providers. This long track record gives confidence in its stability. It is the standard reference that other OCI runtimes are compared against, so behaviour that works with runc is a good indication of compatibility with the OCI standard. runc is relatively secure; when security issues are discovered, they are fixed quickly by the maintainers. It is also narrowly scoped: runc does one job, turning a bundle into a running container, and leaves images, networking, and APIs to other components. That narrow scope makes it easier to audit and maintain than a full container engine.
However, runc vulnerabilities are critical precisely because every container using runc is affected by them. Real examples illustrate this impact. CVE-2019-5736 allowed a malicious container to overwrite the host's runc binary through /proc/self/exe and gain root on the host. CVE-2021-30465 was a symlink-exchange race during mount setup that could expose host paths to a container, and CVE-2022-29162 caused container processes to start with non-empty inheritable capabilities. Each vulnerability was patched, but these examples demonstrate why runc matters so much. A vulnerability in runc is not just a vulnerability in one container; it is a vulnerability in every container on the system using runc. This is why runc security updates must be deployed as soon as they are available.
Alternative Low-Level Runtimes: Performance, Security, and Specialization
Beyond runc, alternatives exist for specific needs. Each makes different trade-offs in terms of performance, security, complexity, and isolation strength.
crun is a C implementation of the OCI runtime specification, providing an alternative to runc's Go implementation. Because it is written in C and avoids Go runtime startup costs, crun is often reported to start containers noticeably faster than runc (figures of 30-40% are commonly cited, though results depend on the workload and host), making it attractive when startup latency is critical. It also has a lower memory footprint, which matters in high-density environments. Red Hat maintains crun as part of its container runtime strategy, and critically, it is command-line compatible with runc, meaning existing orchestration systems can swap crun in place of runc without code changes. The primary use case for crun is high-performance, high-density container environments such as edge computing and serverless platforms where startup latency and memory efficiency directly impact scalability and user experience.
youki is a Rust implementation of the OCI runtime specification. Rust is a memory-safe language that eliminates entire classes of C vulnerabilities such as buffer overflows and use-after-free bugs that have plagued C-based software for decades. However, youki is newer and has a developing community; while it is maturing quickly, it is not as battle-tested as runc. youki has slightly higher startup overhead than crun but is still lower than runc. The primary use case for youki is security-focused environments where memory safety is a priority, such as aerospace, critical infrastructure, or research environments where formal verification of safety properties is valuable.
gVisor is a sandboxed runtime that takes a different approach to isolation. Rather than relying on namespaces and cgroups alone, gVisor (through its OCI-compatible runsc binary) intercepts the container's syscalls and services them in a user-space kernel that is isolated from the host kernel. The container process therefore does not call the host kernel directly; its syscalls go through gVisor's sandbox layer, which itself uses only a restricted set of host syscalls. The benefit is a much smaller host kernel attack surface, so a kernel bug is far harder to reach from inside the container, although bugs in gVisor itself remain possible. The trade-off is performance overhead from syscall interception: roughly 5-10% for compute-bound workloads, and potentially much more for syscall-heavy or I/O-heavy ones. The use case for gVisor is multi-tenant environments where you run untrusted code from different sources, or anywhere stronger isolation is needed despite the performance cost.
Kata Containers is a VM-based runtime that launches a lightweight virtual machine for each container (in Kubernetes, one VM per pod). Each workload has its own guest kernel, so a kernel-level compromise in one workload does not directly affect others; an attacker would also have to break out of the hypervisor. This provides the strongest isolation of the runtimes discussed here, but at a cost: each Kata container consumes 100-500 MB of memory for its VM, and startup times are measured in seconds rather than milliseconds. The use case for Kata is multi-tenant cloud environments with strong isolation requirements, hosting provider scenarios where you want to protect other customers' workloads, or hybrid deployments where you need both container efficiency and VM-level isolation.
The selection matrix shows how to choose: if performance is the priority, crun is the answer. If memory safety is critical for compliance or security reasons, youki is preferred. If you are running multi-tenant systems with untrusted code, gVisor provides the necessary isolation. If you need the strongest possible isolation, Kata Containers is worth the overhead. For everything else—standard enterprise deployments—runc is the battle-tested default.
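The selection matrix can be written down as a small decision function. This is a sketch of the rules of thumb stated above, not an authoritative policy; the function name, the parameter names, and in particular the order in which the conditions are checked are assumptions made for this example.

```python
def choose_runtime(
    *,
    strongest_isolation: bool = False,
    untrusted_code: bool = False,
    memory_safety_required: bool = False,
    startup_critical: bool = False,
) -> str:
    """Map workload requirements to a low-level runtime name."""
    # Isolation requirements are checked first because they are the
    # hardest to compensate for with other controls (an assumption).
    if strongest_isolation:
        return "kata"
    if untrusted_code:
        return "gvisor"
    if memory_safety_required:
        return "youki"
    if startup_critical:
        return "crun"
    return "runc"  # battle-tested default


print(choose_runtime())
print(choose_runtime(startup_critical=True))
print(choose_runtime(untrusted_code=True, startup_critical=True))
```

When requirements conflict, as in the last call, the function favours isolation over speed; a real decision would weigh those against cost and operational maturity.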
Kubernetes CRI: How Orchestration Talks to Runtimes
Kubernetes does not directly call runc or containerd. Instead, Kubernetes defines an interface that any runtime must implement: the Container Runtime Interface (CRI). This abstraction is critical to Kubernetes's flexibility and extensibility.
The CRI is a gRPC API that abstracts the fundamental operations a container runtime must support. Kubernetes's kubelet component communicates with container runtimes exclusively via CRI, never making direct calls to runc, containerd, or any other runtime.
The CRI operations define the contract between Kubernetes and a runtime:

- PullImage(image) downloads and stores a container image from a registry.
- RunPodSandbox(config) creates the pod sandbox, including the network namespace that the containers in the pod share.
- CreateContainer(config) creates a container within a sandbox but does not start it.
- StartContainer(id) begins executing a previously created container.
- StopContainer(id) stops a running container gracefully, allowing it to shut down cleanly within a timeout.
- RemoveContainer(id) deletes a container and its resources.
- ListContainers() returns the containers known to the runtime (optionally filtered, for example by state) so Kubernetes can determine the current state.
- ListPodSandbox() returns the pod sandboxes and their state.
- ContainerStats() returns resource usage metrics for a container.
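The ordering constraints between these calls are easier to see in code. The sketch below is an in-memory stand-in with Python method names that mirror a subset of the CRI operations; it is not a gRPC client, it starts no processes, and the `FakeRuntime` class and its ID scheme are hypothetical. The state strings follow the names used by the CRI container state enum.

```python
import itertools


class FakeRuntime:
    """In-memory stand-in for a CRI runtime; it starts no real processes."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.sandboxes = {}   # sandbox id -> pod name
        self.containers = {}  # container id -> {"sandbox", "image", "state"}

    def run_pod_sandbox(self, pod_name):
        sandbox_id = f"sandbox-{next(self._next_id)}"
        self.sandboxes[sandbox_id] = pod_name
        return sandbox_id

    def create_container(self, sandbox_id, image):
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox {sandbox_id}")
        container_id = f"container-{next(self._next_id)}"
        self.containers[container_id] = {
            "sandbox": sandbox_id,
            "image": image,
            "state": "CONTAINER_CREATED",
        }
        return container_id

    def start_container(self, container_id):
        self.containers[container_id]["state"] = "CONTAINER_RUNNING"

    def stop_container(self, container_id):
        self.containers[container_id]["state"] = "CONTAINER_EXITED"

    def remove_container(self, container_id):
        del self.containers[container_id]

    def list_containers(self, state=None):
        return sorted(
            cid for cid, info in self.containers.items()
            if state is None or info["state"] == state
        )


runtime = FakeRuntime()
sandbox = runtime.run_pod_sandbox("web")
container = runtime.create_container(sandbox, "docker.io/library/nginx:latest")
runtime.start_container(container)
print(runtime.list_containers(state="CONTAINER_RUNNING"))
```

The sandbox must exist before any container is created in it, which matches how the kubelet sets up a pod's shared network namespace first and then adds containers one by one.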
The elegance of this interface is that any runtime implementing the CRI can be used with Kubernetes. You can swap runtimes without changing Kubernetes itself. This is why Kubernetes can support containerd, CRI-O, Docker (via an adapter), and other runtimes interchangeably.
Different runtimes implement CRI in different ways. containerd has a CRI plugin built in, meaning it natively speaks the CRI protocol. CRI-O implements CRI directly, as it was designed specifically to implement CRI minimally. Docker does not implement CRI natively, so a shim called cri-dockerd translates CRI calls to Docker API calls. One layer down, containerd uses containerd-shim-runc-v2, a per-container shim process that invokes runc and then stays alive to hold the container's stdio and report its exit status. This shim does not speak CRI; it sits between containerd and runc so that containers keep running even if containerd restarts.
In practice, the kubelet connects to a CRI endpoint, which is usually a Unix socket like /run/containerd/containerd.sock or /run/crio/crio.sock. The kubelet sends gRPC messages to this socket, and the runtime receives the message and executes the requested operation. The response is sent back over the socket, allowing kubelet to monitor and manage the container throughout its lifecycle.
The Full Stack: From Kubernetes to Kernel
An end-to-end container launch in Kubernetes involves a choreographed sequence of components, each doing its part before passing control to the next layer. Understanding this flow illuminates why the layered architecture is so powerful.
The container startup flow begins when a user applies a pod specification via kubectl or other API methods. The Kubernetes API server validates and stores the specification, and the scheduler assigns the pod to a node based on resource requirements, affinity rules, and current cluster capacity. The kubelet on the selected node observes the assignment and works to make the pod's specification a reality. It calls the CRI method RunPodSandbox (gRPC path /runtime.v1.RuntimeService/RunPodSandbox) on the runtime endpoint, which is typically a Unix socket connection to containerd or CRI-O. The containerd CRI plugin receives this request and creates the sandbox and network namespace where the pod's containers will live and communicate. For each container, the kubelet then issues CreateContainer and StartContainer, and containerd invokes runc (through its shim) with the prepared bundle. The runc process takes the bundle, creates the pid, mount, ipc, and uts namespaces using the clone() and unshare() syscalls, and sets up cgroups for resource limits. The kernel creates the actual namespaces and cgroups, then starts the container process as PID 1 in the new pid namespace. The application in the container begins its startup sequence, reading its configuration and initializing itself. Finally, the runtime reports the container status over CRI to the kubelet, which observes the container running and, once any readiness probes pass, marks the pod as Ready, making it available for traffic.
With the image already cached, this entire chain typically completes in a fraction of a second to a few seconds. The indirection (Kubernetes → CRI → containerd → runc → kernel) may seem complex on the surface, but it provides crucial flexibility: you can swap a component without affecting the others. Upgrade containerd without changing Kubernetes. Replace runc with crun without changing containerd. Change from Kubernetes to another orchestrator without changing containerd. This modularity is by design and is one of the reasons containerization has become so pervasive.
How to Check Your Kubernetes Runtime
Discovering which container runtime your Kubernetes cluster uses is straightforward. Different commands are available depending on your setup and what information you need.
If you are using containerd, you can check by running ps aux | grep containerd, which shows the containerd daemon running. You can also run kubectl get nodes -o wide, whose CONTAINER-RUNTIME column shows a value beginning with containerd:// followed by the version. To check the version on the node itself, use containerd --version.
If you are using CRI-O, check by running ps aux | grep crio, which shows the CRI-O daemon. The version can be checked with crio --version.
If you are using Docker, check by running ps aux | grep docker, which shows the Docker daemon running. Since Kubernetes does not directly use Docker via the Docker API, you should verify that Kubernetes uses the cri-dockerd shim, which translates CRI calls to Docker API. You can check this by running ps aux | grep cri-dockerd.
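For a check that works across most setups, ask the node itself. Run kubectl describe node node-name | grep Runtime, which shows the Container Runtime Version reported by the kubelet. Alternatively, examine the kubelet configuration directly. On kubeadm-provisioned nodes this is usually /var/lib/kubelet/config.yaml, so run grep containerRuntimeEndpoint /var/lib/kubelet/config.yaml; older setups pass the same value through the kubelet's --container-runtime-endpoint flag instead. (Note that /etc/kubernetes/kubelet.conf is the kubelet's kubeconfig for reaching the API server and does not contain the runtime endpoint.) This shows the socket path, typically /run/containerd/containerd.sock for containerd or /run/crio/crio.sock for CRI-O, which tells you exactly which runtime the kubelet is configured to use.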
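The same check can be scripted. The sketch below scans kubelet configuration text for the containerRuntimeEndpoint field and guesses the runtime from the socket path. It assumes the field appears as a simple top-level `key: value` line, and the helper names and the sample configuration are made up for this example; a robust tool would use a real YAML parser.

```python
# Substrings of well-known socket paths, mapped to a runtime label.
KNOWN_RUNTIMES = (
    ("cri-dockerd", "Docker Engine via cri-dockerd"),
    ("containerd", "containerd"),
    ("crio", "CRI-O"),
)


def endpoint_from_config(config_text: str):
    """Return the containerRuntimeEndpoint value, or None if absent."""
    for line in config_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "containerRuntimeEndpoint":
            return value.strip().strip("\"'")
    return None


def runtime_from_endpoint(endpoint: str) -> str:
    """Guess the runtime from the socket path in the endpoint."""
    for marker, name in KNOWN_RUNTIMES:
        if marker in endpoint:
            return name
    return "unknown"


sample_config = """\
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
"""

endpoint = endpoint_from_config(sample_config)
print(endpoint, "->", runtime_from_endpoint(endpoint))
```

Matching on the socket path is only a heuristic, since an administrator can place the socket anywhere; the version string the node reports remains the more reliable signal.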
Runtime Performance Characteristics
Startup and memory usage vary by runtime:
| Runtime | Startup Time | Memory Overhead | Isolation | Use Case |
|---|---|---|---|---|
| runc | 100-200ms | 1-2 MB | Namespace-based | Standard, production |
| crun | 50-100ms | 0.5-1 MB | Namespace-based | Performance-critical |
| youki | 120-180ms | 1-1.5 MB | Namespace-based | Security-critical |
| gVisor | 200-400ms | 20-50 MB | Sandboxed | Multi-tenant |
| Kata | 1-3 seconds | 100-500 MB | VM-based | Untrusted code |
Measurements vary based on image size, kernel version, and host hardware.
Selecting a runtime is a trade-off between performance, security, and isolation strength.
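A quick calculation shows why per-container overhead matters at density. The sketch below takes the upper bound of each "Memory Overhead" range from the table above and computes the total runtime overhead for a given container count, and how many containers fit in a fixed overhead budget. The figures are the table's indicative values, not measurements, and the function names are hypothetical.

```python
# Upper bounds of the "Memory Overhead" column above, in MB.
OVERHEAD_MB = {"runc": 2, "crun": 1, "youki": 1.5, "gVisor": 50, "Kata": 500}


def total_overhead_mb(runtime: str, containers: int) -> float:
    """Runtime overhead for a number of containers, excluding the apps."""
    return OVERHEAD_MB[runtime] * containers


def max_containers(runtime: str, budget_mb: float) -> int:
    """How many containers fit if only runtime overhead is counted."""
    return int(budget_mb // OVERHEAD_MB[runtime])


for name in OVERHEAD_MB:
    print(name, total_overhead_mb(name, 100), max_containers(name, 4096))
```

With a 4 GiB budget for overhead alone, the namespace-based runtimes allow thousands of containers while a VM-based runtime allows only a handful, which is the density trade-off the table summarizes.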
Runtime Security Implications
Different runtimes provide different security guarantees, and understanding these trade-offs is essential for threat modeling and risk management.
Namespace-based runtimes like runc, crun, and youki provide isolation that is enforced by the kernel, but every container shares that same host kernel. A kernel vulnerability therefore affects all containers on the host, and exploiting one can lead to privilege escalation or container escape. These runtimes are fast and efficient, which is why they are the norm in production deployments. However, they require defense in depth: you must add seccomp filters to block dangerous syscalls, drop unnecessary capabilities to limit privilege escalation, use AppArmor or SELinux for additional mandatory access controls, and implement other security measures.
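The hardening measures listed above end up as small pieces of configuration. The sketch below builds two of them as Python dictionaries: an allowlist seccomp profile in the JSON format used by Docker and OCI runtimes, and a Kubernetes container securityContext that drops all capabilities. The helper names and the syscall list are illustrative assumptions; a three-syscall allowlist is far too small for a real program.

```python
import json


def allowlist_seccomp_profile(allowed_syscalls):
    """Deny every syscall by default and allow only the named ones."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {"names": sorted(allowed_syscalls), "action": "SCMP_ACT_ALLOW"}
        ],
    }


def hardened_security_context(extra_capabilities=()):
    """Kubernetes container securityContext with a least-privilege baseline."""
    return {
        "runAsNonRoot": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"], "add": sorted(extra_capabilities)},
        "seccompProfile": {"type": "RuntimeDefault"},
    }


profile = allowlist_seccomp_profile({"read", "write", "exit_group"})
context = hardened_security_context(["NET_BIND_SERVICE"])
print(json.dumps(profile, indent=2))
print(json.dumps(context, indent=2))
```

Starting from "deny everything, drop everything" and adding back only what the workload needs keeps the shared-kernel attack surface as small as the application allows.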
Sandboxed runtimes like gVisor intercept the container's syscalls, preventing direct access to the host kernel. Because the container's syscalls are handled by gVisor rather than by the host kernel, most host kernel vulnerabilities are not directly reachable through the syscall interface. This also makes it harder for one container to starve another by exhausting shared kernel resources. The cost is performance overhead from syscall interception, roughly 5-10% for compute-bound workloads and more for syscall-heavy ones. While gVisor is still susceptible to bugs in the gVisor code itself, the gVisor code base is much smaller than the Linux kernel, reducing the overall attack surface.
VM-based runtimes like Kata Containers give each container its own kernel, so one container's kernel vulnerability does not affect others. This provides strong isolation guarantees. The trade-off is high overhead—each container requires its own VM with its own kernel and memory. AWS Firecracker and similar projects use VM-based approaches for multi-tenant cloud environments.
A defense-in-depth approach combines multiple layers. For standard deployments, use namespace-based runtimes for their performance characteristics, then harden them with seccomp filters, capability dropping, AppArmor/SELinux policies, and non-root users. For multi-tenant scenarios where you run untrusted code or need maximum isolation, consider gVisor if its sandbox overhead is acceptable, or Kata for the strongest isolation.
Swapping Runtimes: When and Why
You might change runtimes for several reasons, depending on your operational requirements and constraints.
For performance reasons, you might switch from runc to crun if startup time or memory consumption is critical for your workload. This is especially relevant for edge devices where startup latency impacts user experience or for serverless environments where memory efficiency directly affects cost and density.
For security reasons, you might switch from runc to youki if memory safety is a compliance requirement in your organization. Aerospace, critical infrastructure, and defense applications often have formal requirements for memory-safe code without undefined behavior.
For isolation reasons, you might switch from namespace-based runtimes to gVisor if you are running untrusted code. Platforms-as-a-service and marketplace environments where customers submit arbitrary code benefit from gVisor's syscall interception preventing container escapes.
For hardware-level isolation, you might switch from runc to Kata if you need each workload to run in its own VM backed by hardware virtualization. Hosting providers managing multi-tenant infrastructure often choose Kata to provide strong isolation guarantees to customers, even if it means accepting the overhead.
Example: Switching Kubernetes to crun
- Install crun on all nodes: sudo apt-get install crun # Debian/Ubuntu
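- Update containerd config (/etc/containerd/config.toml): under [plugins."io.containerd.grpc.v1.cri".containerd], set default_runtime_name = "crun". The name must match a runtime entry, so also make sure [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.crun] exists with runtime_type = "io.containerd.runc.v2" and, in its options table, BinaryName set to the path of the crun binary. Newer containerd releases use a different plugin section name, so check the configuration reference for your version.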
- Restart containerd: sudo systemctl restart containerd
- New containers use crun; existing containers are unaffected.
Common Runtime Issues
Understanding common runtime issues and their solutions helps you debug problems quickly and maintain stable container infrastructure.
High container startup latency can stem from several sources. If image pull is slow, you can use registry caching or local mirrors to improve pull speeds. If runc startup itself is slow and performance is critical, switching to crun can reduce startup time by 30-40%. If cgroup setup overhead is the bottleneck, you can profile with trace-cmd to identify where time is spent and optimize accordingly.
Out of memory errors in containers can have multiple causes. If the cgroup memory limit is too low, increase the memory limit under resources.limits.memory in the pod spec. If there is a memory leak in the application, fix the application; Kubernetes offers no pod-level switch to disable the OOM killer, and disabling it would only move the problem to the node. If you are using gVisor and seeing higher memory overhead, remember that gVisor's sandbox has memory costs of its own, so consider switching to runc if memory usage is critical and the workload is trusted.
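When tuning limits it helps to know the byte value a quantity such as 512Mi turns into, since that number is what the runtime ultimately writes into the cgroup. The sketch below converts the common binary and decimal suffixes; it is a partial, illustrative parser with a hypothetical function name, and it does not cover every form that Kubernetes quantities allow.

```python
# Binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes and their factors.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a memory quantity like '512Mi' or '1G' into bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # a plain number is already in bytes


for value in ("512Mi", "1Gi", "1G", "1.5Gi"):
    print(value, quantity_to_bytes(value))
```

Note the gap between 1G and 1Gi, about 7%: mixing the two suffix families is an easy way to end up with a limit slightly lower than intended.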
When containers cannot access devices, first verify that the device is allowed by the container's cgroup device rules. If a capability is needed, add it explicitly and prefer the narrowest one that works; CAP_SYS_ADMIN unlocks many device operations but is very broad and should be a last resort. If SELinux or AppArmor is blocking access, audit the policies and adjust them to allow the necessary operations.
CRI socket permission denied errors occur when a user lacks permission to access the socket. The kubelet typically runs as root and has access to /run/containerd/containerd.sock or /run/crio/crio.sock. These sockets are normally owned by root, so if you are trying to access the runtime socket directly, run the command with sudo. Membership in the docker group grants access only to the Docker daemon's own socket, not to the containerd or CRI-O sockets.
Architecture Decision: Which Runtime for Your Cluster
Choosing a container runtime requires evaluating multiple factors about your environment, team, and requirements. There is no universal answer, but there are guidelines for different scenarios.
The default runtime in your Kubernetes distribution is often the best place to start. Since dockershim was removed in Kubernetes v1.24, kubeadm-based clusters most commonly run containerd. Managed Kubernetes services (EKS, GKE, AKS) use provider-specific defaults, though most have settled on containerd. OpenShift uses CRI-O as its default, reflecting Red Hat's commitment to CRI-O's minimalist philosophy. Rancher's K3s and RKE2 distributions bundle containerd.
Team familiarity matters significantly. If most of your team knows Docker, starting with Docker or containerd makes onboarding easier. Red Hat environments naturally gravitate toward CRI-O since Red Hat maintains it. If performance is sensitive, crun should be evaluated.
Multi-tenancy requirements strongly influence the choice. If you are running untrusted code from multiple sources, gVisor's sandbox isolation is valuable despite the performance cost. If you need strict isolation between tenants, Kata Containers' VM-based approach is worth the overhead. If your cluster is shared within a trusted organization, runc or crun provides adequate isolation.
Performance requirements also drive decisions. For serverless or edge environments where startup latency matters, crun is preferred. For standard enterprise infrastructure, runc provides excellent performance and stability. For VM-like isolation requirements, Kata is necessary despite the overhead.
Compliance and security requirements shape the choice significantly. If memory-safe execution is a compliance requirement, youki is the choice despite being less mature than runc. If sandbox isolation is mandated for running untrusted code, gVisor's syscall interception is essential. For standard production deployments, runc combined with strong seccomp policies and AppArmor/SELinux provides good security with excellent performance.
The default recommendation for most standard Kubernetes deployments is containerd with runc. This combination is battle-tested, widely supported across the industry, and provides excellent performance and stability. It is the right choice unless your specific requirements (performance, security, isolation, or compliance) push you toward an alternative.
Next Steps: Understand what happens inside the kernel. See "How Containers Interact with the Linux Kernel" and "Container Scope vs Kernel Scope."
