A container build based on Alpine installs a package with apk add nginx. What actually happens? The package is downloaded from a repository, verified cryptographically, and unpacked; its install scripts run; and its files are placed at specific paths. The package manager handles all of these steps silently.
Understanding Linux packages—what they are, how they work, why there are different formats, and how they fit into container security—is essential for building reliable container images.
```mermaid
graph LR
    Cmd["apk add nginx"] --> Repo["Repository<br/>apk.alpinelinux.org"]
    Repo --> Fetch["Download<br/>Package<br/>nginx.apk"]
    Fetch --> Verify["Verify<br/>Checksum<br/>Signature"]
    Verify --> Extract["Extract<br/>to filesystem"]
    Extract --> Files["Install<br/>Binaries<br/>Libraries<br/>Config<br/>Docs"]
    Files --> Result["nginx ready<br/>to use"]
    style Cmd fill:#e3f2fd
    style Result fill:#c8e6c9
```

What Is a Linux Package?
A Linux package is a single distributable artifact that bundles several components: compiled binaries (executables such as /usr/sbin/nginx); libraries (shared code such as /usr/lib/libssl.so.3); configuration files with default settings (such as /etc/nginx/nginx.conf); documentation (man pages and READMEs); license information (legal notices and restrictions); metadata (package name, version, dependencies, maintainer information, and checksums); and scripts that run during installation, upgrade, and removal.
A package is essentially an archive (usually tar-based) with metadata attached, bundling all these components together for distribution and installation.
Example: installing curl places the curl executable at /usr/bin/curl, the shared library at /usr/lib/libcurl.so.4, manual pages at /usr/share/man/man1/curl.1.gz, and license information under /usr/share/licenses/curl/. The CA certificates that curl relies on live in /etc/ssl/certs/, although most distributions ship them in a separate ca-certificates package and often split the library into its own package as well.
When a package is installed, files are placed in standardized locations as described above. The package manager extracts these files to the specified paths, runs any initialization scripts, and registers the package as "installed" in a local database.
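The unpack-and-register step described above can be sketched in a few lines of Python. This is only an illustration, not how apk, dpkg, or rpm are implemented: the `build_package` and `install_package` helpers, the `hello` package, and the `installed.json` "database" are all hypothetical names invented for the example.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path


def build_package(files: dict[str, bytes]) -> bytes:
    """Bundle {relative_path: content} into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def install_package(name: str, version: str, archive: bytes, root: Path) -> list[str]:
    """Unpack the archive under `root` and record it in a JSON 'database'."""
    installed = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            target = root / member.name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
            installed.append(member.name)
    # Register the package so it can later be listed or removed.
    db_path = root / "installed.json"
    db = json.loads(db_path.read_text()) if db_path.exists() else {}
    db[name] = {"version": version, "files": sorted(installed)}
    db_path.write_text(json.dumps(db, indent=2))
    return sorted(installed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = build_package({
            "usr/bin/hello": b"#!/bin/sh\necho hello\n",
            "etc/hello.conf": b"greeting=hello\n",
        })
        print(install_package("hello", "1.0-r0", pkg, Path(tmp)))
```

Real package managers additionally run maintainer scripts, track file ownership and permissions, and handle upgrades and conflicts, none of which this sketch attempts.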
Package Formats: .deb, .rpm, .apk
Different Linux distributions use different package formats, each with strengths and weaknesses.
.deb Format (Debian, Ubuntu, Linux Mint)
.deb files are used by Debian-based distributions. A .deb file is an ar archive that wraps two tar archives: one for metadata and one for the package data.
A .deb package file contains three main components: debian-binary, which identifies the format version; control.tar.gz, which contains metadata such as package name, version, dependencies, and installation scripts; and data.tar.gz, which contains the actual files to be installed on the system. Newer packages may compress the two tar archives with xz or zstd instead of gzip.
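To make the three-member layout concrete, the following sketch builds a tiny ar archive in memory and lists its members. It assumes the common ar header layout (a 60-byte header per member, short names only) and uses placeholder payloads rather than real tar archives; `build_ar` and `list_ar_members` are hypothetical helper names, not part of dpkg.

```python
import io

AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def build_ar(members: dict[str, bytes]) -> bytes:
    """Assemble a minimal ar archive (short member names only)."""
    out = io.BytesIO()
    out.write(AR_MAGIC)
    for name, data in members.items():
        header = (
            f"{name:<16}"        # member name, space padded
            f"{0:<12}"           # modification time
            f"{0:<6}{0:<6}"      # owner id and group id
            f"{'100644':<8}"     # file mode (octal)
            f"{len(data):<10}"   # payload size in bytes
            "`\n"                # header terminator
        ).encode("ascii")
        out.write(header)
        out.write(data)
        if len(data) % 2:        # members start on even offsets
            out.write(b"\n")
    return out.getvalue()


def list_ar_members(blob: bytes) -> list[tuple[str, int]]:
    """Return (name, size) for each member of an ar archive."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(blob):
        header = blob[offset:offset + HEADER_SIZE]
        name = header[0:16].decode("ascii").rstrip(" /")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        offset += HEADER_SIZE + size + (size % 2)
    return members


if __name__ == "__main__":
    fake_deb = build_ar({
        "debian-binary": b"2.0\n",
        "control.tar.gz": b"<control>",  # placeholder, not a real tarball
        "data.tar.gz": b"<data>",        # placeholder, not a real tarball
    })
    for name, size in list_ar_members(fake_deb):
        print(name, size)
```

On a real system, `ar t package.deb` or `dpkg-deb --info package.deb` gives the same kind of listing without any custom code.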
Installation works through two approaches: dpkg -i package.deb provides low-level unpacking and registration, while apt-get install package provides high-level resolution of dependencies, downloads, and installation.
Pros: These distributions have a mature ecosystem with extensive tools, large package repositories (90,000+ packages in Ubuntu), excellent dependency resolution, and wide support across hosting providers.
Cons: The format stores some metadata redundantly, which makes packages larger, and Debian-based base images are heavy for minimal containers, at roughly 77 MB or more.
.rpm Format (RHEL, Fedora, CentOS, AlmaLinux, Rocky Linux)
RPM (Red Hat Package Manager) is used by Red Hat-based distributions. RPM files are binary-encoded with checksums and signatures built in.
Installation works as follows: rpm -i package.rpm provides low-level installation, while dnf install package or yum install package provides high-level installation with newer dnf being faster than older yum.
Pros: The format is compact because it is binary-encoded, has built-in signature verification, and provides a strong ecosystem for enterprise deployments.
Cons: It is less widely used in containerized environments and has slightly more complex dependency resolution than apt.
.apk Format (Alpine, OpenWrt)
APK (Alpine Package Keeper) is used by Alpine Linux and OpenWrt. It's the newest and smallest format, designed for minimal systems.
Installation uses apk add package. apk is the only tool involved: it handles both low-level unpacking and high-level dependency resolution, with no split like dpkg/apt or rpm/dnf.
Pros: Alpine base images are extremely compact at ~7 MB (vs. ~77 MB for Ubuntu), installs are very fast, which suits CI/CD builds, the package manager is cleanly separated from the runtime, and cryptographic verification is built in.
Cons: The package ecosystem is smaller (8,000+ packages, compared with tens of thousands for Debian and RHEL), Alpine uses musl libc instead of glibc, which can break software built against glibc, and a smaller maintainer base can mean slower security updates for less popular packages.
Comparison table:
| Feature | .deb (apt) | .rpm (dnf) | .apk (apk) |
|---|---|---|---|
| Base image size | ~77 MB | ~150 MB | ~7 MB |
| Packages | 90,000+ | 50,000+ | 8,000+ |
| Speed | Fast | Medium | Very fast |
| Ecosystem | Large | Medium | Small |
| Security track | Excellent | Excellent | Good |
| Libc | glibc | glibc | musl |
| Containers | Common | Less common | Popular |
Package Managers: What They Actually Do
A package manager is a tool that handles the full lifecycle of packages: downloading, verifying, installing, upgrading, and removing.
The Package Manager Workflow
When a user runs apt-get install curl, the package manager executes a multi-stage workflow. It parses the request, queries repository metadata to locate curl and determine available versions, checks what is already installed to avoid duplicates, recursively resolves all dependencies and their sub-dependencies, and builds an installation graph that respects version constraints. It then downloads the required packages from the repository and verifies their authenticity against the GPG-signed repository metadata. Installation proceeds in dependency order: for each package, the manager runs any pre-install scripts, unpacks files to standard locations (/usr, /etc, /var), registers the package in the local database, and runs post-install scripts. Finally, it updates indexes and caches, such as c_rehash for SSL certificate hashing and ld.so.cache for library lookups, and reports completion.
Dependency Resolution
When you install a package, the package manager must install its dependencies, and the dependencies' dependencies, recursively. For example, when installing nginx on Alpine with apk add nginx, nginx depends on several libraries: pcre for regular expression support, zlib for compression, openssl for TLS/SSL encryption, and libc (the C standard library, which is musl on Alpine). Each of those libraries might have dependencies of its own. Both openssl and zlib depend on libc, so the package manager ensures libc is installed first, then installs the libraries that depend on it.
The package manager creates a dependency tree and installs in the correct order, with dependencies installed before packages that depend on them. If two packages depend on different versions of the same library, the manager tries to find a version that satisfies both requirements or fails if the versions are incompatible.
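Ordering installs so that dependencies come first is a topological sort. A minimal sketch using Python's standard `graphlib` module (Python 3.9+) follows; the dependency map is a simplified, assumed version of the nginx example and does not reflect Alpine's actual package metadata, and version constraints are ignored.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return package names so every dependency precedes its dependents.

    `dependencies` maps a package to the set of packages it requires.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Assumed, simplified dependency data for illustration only.
    deps = {
        "nginx": {"pcre", "zlib", "openssl", "libc"},
        "pcre": {"libc"},
        "zlib": {"libc"},
        "openssl": {"libc"},
        "libc": set(),
    }
    print(install_order(deps))

    try:
        install_order({"a": {"b"}, "b": {"a"}})
    except CycleError as exc:
        print("circular dependency:", exc.args[1])
```

Here libc always comes first and nginx last; the relative order of pcre, zlib, and openssl is unconstrained, so a resolver is free to install them in any order.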
Repository Metadata
Package repositories are collections of packages plus metadata: package names, versions, and dependencies; file checksums (SHA256, and sometimes the older MD5); GPG signatures; and an index of what is available. The metadata is GPG-signed by the distro maintainer. When you install, the package manager downloads the signed metadata, verifies the signature using the distro's public key, downloads the package listed in the verified metadata, verifies the package's checksum against the metadata, and optionally verifies the package's own GPG signature.
This chain ensures that packages come from a trusted source and haven't been tampered with during distribution.
When adding a repository like deb https://archive.ubuntu.com/ubuntu jammy main universe and running apt-get update, the system downloads the Release file (which is GPG-signed), verifies the signature using Ubuntu's public key, downloads the Packages file referenced in the Release, and parses it to find available packages with their cryptographic hashes. When you later install a package like nginx, the system verifies that the downloaded .deb file's hash matches what's listed in the Packages file, ensuring integrity.
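The chain from a trusted index hash down to an individual package can be sketched as follows. This is a simplification under stated assumptions: the hash of the Packages file is taken as already trusted (standing in for a verified Release signature), the stanza parser handles only single-line `Key: value` fields, and `verify_chain` is a hypothetical helper rather than anything apt exposes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def parse_packages(text: str) -> dict[str, dict[str, str]]:
    """Parse blank-line-separated stanzas of single-line 'Key: value' fields."""
    index = {}
    for stanza in text.strip().split("\n\n"):
        fields = dict(line.split(": ", 1) for line in stanza.splitlines())
        index[fields["Package"]] = fields
    return index


def verify_chain(trusted_packages_hash: str, packages_file: bytes,
                 name: str, deb: bytes) -> bool:
    """Check index integrity first, then the package against the index."""
    # Step 1: the index must match the hash taken from the signed Release file.
    if sha256_hex(packages_file) != trusted_packages_hash:
        return False
    # Step 2: the downloaded package must match the hash listed in the index.
    entry = parse_packages(packages_file.decode())[name]
    return sha256_hex(deb) == entry["SHA256"]


if __name__ == "__main__":
    deb = b"placeholder bytes standing in for a .deb file"
    packages_file = (
        "Package: nginx\n"
        "Version: 1.18.0-0ubuntu1\n"
        f"SHA256: {sha256_hex(deb)}\n"
    ).encode()
    trusted = sha256_hex(packages_file)  # would come from the verified Release file

    print(verify_chain(trusted, packages_file, "nginx", deb))
    print(verify_chain(trusted, packages_file, "nginx", deb + b"tampered"))
```

The point of the layering is that only one signature needs checking: once the top-level file is trusted, every hash below it inherits that trust.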
Mirror Networks
Packages are mirrored across geographic regions to ensure fast downloads and redundancy. A single Ubuntu package might be available at multiple locations such as https://archive.ubuntu.com/ubuntu/pool/main/n/nginx/nginx_1.18.0-0ubuntu1.deb, https://mirror.example.com/ubuntu/pool/main/n/nginx/nginx_1.18.0-0ubuntu1.deb, and https://cn.archive.ubuntu.com/ubuntu/pool/main/n/nginx/nginx_1.18.0-0ubuntu1.deb. The package manager chooses a mirror that is often geographically close or fastest for the user's location. The exact file is identical across mirrors, verified by checksum.
Package Verification: Ensuring Integrity and Authenticity
When you download a package from the internet, three questions must be answered: Is it the right package (correct name, version)? Has it been modified since the maintainer released it (integrity)? Did it really come from the official maintainer (authenticity)?
Checksums
A checksum is a hash of the package file, such as SHA256 (or the older, weaker MD5); changing even a single byte produces a different hash. Package metadata records the checksum alongside the name and version, for example Package: nginx, Version: 1.18.0-0ubuntu1.3, and SHA256: a1b2c3d4e5f6 (a shortened placeholder for the checksum of the .deb file).
When you download the package, the package manager verifies that the downloaded file's SHA256 hash matches the expected value from the metadata. Checksums effectively detect corruption from network errors or disk faults, but on their own they do not protect against intentional tampering: an attacker who can modify the file can modify an unsigned checksum as well.
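Both properties, sensitivity to a single changed byte and the inability to stop an attacker who controls the checksum too, can be seen in a short sketch. The byte strings below are placeholders rather than real package contents, and `sha256_of_stream` and `matches` are illustrative helper names.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a file-like object in chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches(data: bytes, expected_hex: str) -> bool:
    return sha256_of_stream(io.BytesIO(data)) == expected_hex


if __name__ == "__main__":
    original = b"nginx package contents"
    published = hashlib.sha256(original).hexdigest()

    # Accidental corruption: one byte differs, so the check fails.
    corrupted = b"nginx package contentz"
    print("intact:   ", matches(original, published))
    print("corrupted:", matches(corrupted, published))

    # Deliberate tampering: an attacker who can also rewrite the
    # (unsigned) checksum simply publishes a matching one.
    malicious = b"nginx package contents + backdoor"
    forged = hashlib.sha256(malicious).hexdigest()
    print("tampered, forged checksum:", matches(malicious, forged))
```

The last case is why the checksum itself has to be covered by a signature, as the next section describes.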
GPG Signatures
GPG (GNU Privacy Guard) uses public/private key cryptography to prove authenticity. The distro maintainer signs packages with their private key to create a cryptographic proof that only they could have created. You verify the signature using the maintainer's public key, which confirms that the maintainer (and only the maintainer) signed the package.
Signatures provide three important guarantees: authenticity, because only the holder of the private key can create the signature; non-repudiation, because the maintainer cannot plausibly deny signing something that verifies against their public key; and integrity, because verification fails if the package is modified after signing.
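The sign-with-private-key, verify-with-public-key idea can be illustrated with a deliberately toy RSA scheme. This is not GPG and not secure: the key is built from two small, publicly known Mersenne primes, there is no padding, and the digest is reduced modulo the key. All names here (`sign`, `verify`, `digest_as_int`) are hypothetical, and real tooling should be used for anything beyond illustration.

```python
import hashlib

# Toy key material: two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of D (the private key) can compute this value."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public values (N, E) can check the signature."""
    return pow(signature, E, N) == digest_as_int(message)


if __name__ == "__main__":
    package = b"placeholder package bytes"
    signature = sign(package)
    print("original verifies:", verify(package, signature))
    print("modified verifies:", verify(package + b"!", signature))
```

Unlike the bare checksum case, an attacker who alters the package cannot produce a matching signature without the private exponent.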
Verifying an Alpine package works as follows:
```sh
apk add nginx
# Behind the scenes:
# 1. Download the nginx package from the repository
# 2. Read the signature shipped with the package
#    (apk uses RSA signatures embedded in the package rather than GPG,
#    but the principle is the same)
# 3. Look up Alpine's public key (/etc/apk/keys/alpine-devel@lists.alpinelinux.org-...)
# 4. Verify the signature against that key
# 5. If valid: install. If invalid: reject.
```

Certificate Pinning and Repository Keys
Distros distribute their public key to users so they can verify signatures. Alpine stores keys in /etc/apk/keys/, Debian stores keys in /etc/apt/trusted.gpg.d/, and RHEL stores keys at /etc/pki/rpm-gpg/RPM-GPG-KEY-*. Container images include these keys in the base image, and when you run apk add, it uses the pinned keys to verify packages.
Packages in Containers: Build-Time Installation
Container workflows differ fundamentally from traditional systems. You don't install packages at runtime in production; you install at build time and bake them into the image.
Installing at Build Time
```dockerfile
FROM alpine:3.18
RUN apk add --no-cache nginx=1.24.0-r1 curl=8.1.0-r0
COPY app.conf /etc/nginx/
ENTRYPOINT ["nginx", "-g", "daemon off;"]
```

When built, the image includes Alpine base OS files, nginx binary and libraries, curl binary and libraries, and your configuration.
At runtime (when the container starts), the package manager is not running. The container simply executes the pre-installed binary.
Why Remove the Package Manager?
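Production images should not include the package manager, or at the very least not its caches and indexes.

Bad approach:

```dockerfile
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y nginx
```

This image includes apt and dpkg plus ~200 MB of unnecessary files.

Better approach:

```dockerfile
FROM alpine:3.18
RUN apk add --no-cache nginx
```

This image contains little beyond nginx and is ~50 MB. The --no-cache flag already avoids storing the package index, so a separate rm -rf /var/cache/apk/* is unnecessary. Note that apk itself is still present in this image; removing the package manager entirely takes a final stage that copies only the needed files into a scratch or distroless base.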
Benefits of not including the package manager are: smaller images (5–10x smaller making them faster to pull and store), fewer vulnerabilities (fewer packages means fewer CVEs and the package manager is often a target), immutability (container can't install packages at runtime by design), and security (attacker in the container can't use package manager to install backdoors).
Dependency Pinning in Containers
Always pin package versions explicitly rather than using floating versions. Pinning ensures reproducibility so that the same Dockerfile produces the same image every time. It provides control by letting you decide when to update dependencies rather than having them automatically updated. It enables auditability since exact versions are recorded in source control.
To find the versions available for pinning: with apk, use apk search nginx to list available versions and apk info nginx to show package details. With apt, use apt-cache search nginx and apt-cache policy nginx.
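Pinning is easy to enforce mechanically. The sketch below scans Dockerfile text for `apk add` arguments that lack an `=version` pin. It is a rough heuristic under stated assumptions: it treats every argument starting with `-` as a flag, so flags that take a separate value (for example a repository URL) would be misreported, and `unpinned_apk_packages` is a hypothetical helper, not an existing linter.

```python
import re


def unpinned_apk_packages(dockerfile: str) -> list[str]:
    """Return package names passed to `apk add` without an `=version` pin."""
    unpinned = []
    # Join backslash-continued lines so each instruction sits on one line.
    text = re.sub(r"\\[ \t]*\n", " ", dockerfile)
    for line in text.splitlines():
        match = re.search(r"\bapk\s+add\s+(.*)", line)
        if not match:
            continue
        # Only look at arguments before the next shell operator.
        args = re.split(r"&&|;|\|", match.group(1))[0].split()
        for arg in args:
            if arg.startswith("-"):   # skip flags such as --no-cache
                continue
            if "=" not in arg:
                unpinned.append(arg)
    return unpinned


if __name__ == "__main__":
    dockerfile = (
        "FROM alpine:3.18\n"
        "RUN apk add --no-cache \\\n"
        "    nginx=1.24.0-r1 \\\n"
        "    curl && rm -rf /tmp/build\n"
    )
    print(unpinned_apk_packages(dockerfile))
```

A check like this fits naturally into CI, failing the build when a floating package name slips into a Dockerfile.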
Multi-Stage Builds and Build Dependencies
Build dependencies (compiler, linker, build tools) are needed at build time but not at runtime.
```dockerfile
# Stage 1: Build
FROM golang:1.21 AS builder
COPY main.go .
# Disable cgo so the binary is statically linked and runs on Alpine's musl libc
RUN CGO_ENABLED=0 go build -o app main.go

# Stage 2: Runtime
FROM alpine:3.18
# ca-certificates is needed for TLS
RUN apk add --no-cache ca-certificates
COPY --from=builder /go/app /app/app
ENTRYPOINT ["/app/app"]
```

Stage 1 includes the Go compiler (500+ MB); Stage 2 doesn't. The final image is ~30 MB instead of 500+ MB.
This pattern is crucial for compiled languages.
Package Organization: Virtual Packages, Metapackages, and Provides
Packages sometimes have complex relationships.
Virtual Packages
A virtual package is a name with no associated binary; it represents a feature or role. Suppose several packages, such as apache2 and nginx, provide a virtual package called web-server (Debian's real-world equivalent is named httpd). Running apt-get install web-server will not guess between them: apt lists the providers and asks you to select one explicitly. This allows flexibility: your package depends on "a web server", not "specifically nginx."
Metapackages
A metapackage is a package that exists solely to depend on other packages. It has no files, just dependencies. The build-essential package on Debian is a metapackage that depends on gcc, g++, make, libc6-dev, and others. Installing build-essential installs the entire compiler toolchain.
Provides and Conflicts
Packages can declare what they provide and what they conflict with. MariaDB declares it provides mysql-server (MariaDB provides MySQL compatibility) and conflicts with mysql-server (can't install with actual MySQL). This allows MariaDB to be a drop-in replacement for MySQL in dependency graphs.
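The provides/conflicts relationships can be modeled with a small data structure. The sketch below is a simplified model with assumed package data; the `Package` class and both helper functions are hypothetical, and real resolvers also take versions into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    provides: frozenset[str] = frozenset()
    conflicts: frozenset[str] = frozenset()


def providers(virtual: str, repo: list[Package]) -> list[str]:
    """Names of packages that satisfy `virtual`, either directly or via provides."""
    return sorted(p.name for p in repo if p.name == virtual or virtual in p.provides)


def can_coinstall(a: Package, b: Package) -> bool:
    """Two packages clash if either conflicts with a name the other satisfies."""
    names_a = {a.name} | a.provides
    names_b = {b.name} | b.provides
    return not (a.conflicts & names_b or b.conflicts & names_a)


if __name__ == "__main__":
    # Assumed metadata for illustration only.
    nginx = Package("nginx", provides=frozenset({"web-server"}))
    apache2 = Package("apache2", provides=frozenset({"web-server"}))
    mysql = Package("mysql-server")
    mariadb = Package(
        "mariadb-server",
        provides=frozenset({"mysql-server"}),
        conflicts=frozenset({"mysql-server"}),
    )
    repo = [nginx, apache2, mysql, mariadb]

    print(providers("web-server", repo))
    print(providers("mysql-server", repo))
    print(can_coinstall(mariadb, mysql))
    print(can_coinstall(nginx, apache2))
```

A dependency on mysql-server is satisfied by either database, but the conflict declaration prevents both from being installed together.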
Package Contents: Inspecting What's Inside
How do you know what a package contains before installing it?
To list a package's contents: with APK, use apk info -L nginx; with APT (Debian), use dpkg -L nginx after installation or apt-file list nginx without installing; with RPM, use rpm -ql nginx after installation or repoquery -l nginx without installing.
Typical output from apk info -L nginx includes paths such as /etc/nginx/nginx.conf, /usr/sbin/nginx, and /var/lib/nginx/. Documentation such as /usr/share/man/man8/nginx.8.gz may ship in a separate documentation subpackage, depending on the distribution.
You can also inspect library dependencies using commands like ldd /usr/bin/curl to show every library curl depends on at runtime, including linux-vdso.so.1, libcurl.so.4, libssl.so.3, libcrypto.so.3, libc.so.6, and ld-linux-x86-64.so.2.
Vulnerability Scanning: Finding CVEs in Packages
Container vulnerability scanners work by listing the installed packages (the same inventory that apk info, dpkg -l, or rpm -qa report, usually read straight from the package database inside the image), comparing that inventory against vulnerability databases (NVD, GHSA, security advisories), and reporting the CVEs found.
Example output from trivy image myimage:latest on alpine 3.18.0 shows total vulnerabilities and organizes them by severity and library. For instance, it might report that openssl (version 3.1.0-r0) has a HIGH severity CVE or that zlib (version 1.2.13-r0) has a MEDIUM severity CVE.
The scanner knows that openssl 3.1.0-r0 contains a known vulnerability because it draws on several sources: the National Vulnerability Database (NVD), the official CVE source; GHSA, the GitHub Security Advisory database; and distro advisories, which cover security updates from Alpine, Debian, and others.
Scanners aggregate these sources to maintain a current vulnerability database.
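At its core, the matching step compares each installed version against the version in which an advisory was fixed. The sketch below shows that comparison under loud assumptions: the advisory identifiers, fixed versions, and severities are invented placeholders (not real CVE data), and `parse_version` only understands purely numeric Alpine-style versions such as 3.1.0-r0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    advisory_id: str             # placeholder identifier, not a real CVE
    package: str
    fixed_in: tuple[int, ...]    # first version that contains the fix
    severity: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.1.0-r0' into (3, 1, 0, 0). Simplified: numeric parts only."""
    upstream, _, revision = version.partition("-r")
    return tuple(int(part) for part in upstream.split(".")) + (int(revision or 0),)


def scan(installed: dict[str, str], advisories: list[Advisory]) -> list[Advisory]:
    """Report advisories whose fix is newer than the installed version."""
    findings = []
    for advisory in advisories:
        version = installed.get(advisory.package)
        if version is not None and parse_version(version) < advisory.fixed_in:
            findings.append(advisory)
    return findings


if __name__ == "__main__":
    installed = {"openssl": "3.1.0-r0", "zlib": "1.2.13-r0"}
    advisories = [
        Advisory("EXAMPLE-0001", "openssl", (3, 1, 1, 0), "HIGH"),
        Advisory("EXAMPLE-0002", "zlib", (1, 2, 13, 0), "MEDIUM"),
    ]
    for finding in scan(installed, advisories):
        print(finding.package, installed[finding.package], finding.severity, finding.advisory_id)
```

Production scanners use each distribution's own version-ordering rules, which handle letters, epochs, and pre-release suffixes that this tuple comparison does not.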
Common Patching Patterns
Pattern 1: Use Updated Package Versions
```dockerfile
# Newest patch release of the base image
FROM alpine:3.18.4
# Latest packaged nginx version
RUN apk add --no-cache nginx=1.24.0-r1
```

When a new vulnerability is found in nginx, Alpine releases 1.24.0-r2 (patched version). You rebuild the image with the new version.
Pattern 2: Use Security Updates Regularly
```dockerfile
FROM ubuntu:22.04.3
RUN apt-get update && \
    apt-get install -y --only-upgrade openssl libssl3 && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
```

Ubuntu's repository includes security-only updates. The --only-upgrade flag upgrades just the named packages; running apt-get upgrade instead applies every pending update, security patches included.
Pattern 3: Use Distro Release Cycles
Alpine publishes frequent patch releases; Debian LTS releases security patches as needed. Choose your base image based on how frequently you are prepared to rebuild: in this comparison, Alpine patches quickly (updates every week or so are normal) and therefore calls for more rebuilds, while Debian/Ubuntu patch on a slower cadence (monthly updates are common) and call for fewer rebuilds.
Summary: Linux Packages in Containers
Linux packages are the mechanism for distributing software in containers. Understanding them—what they contain, how they're verified, how they're installed, and what vulnerabilities they might contain—is essential for building secure container images.
Key principles for secure package management include: packages are archives bundling binaries, libraries, configs, and scripts together. Different formats exist for different distros: .deb for Debian, .rpm for RHEL, and .apk for Alpine. Package managers automatically handle dependency resolution by installing dependencies and their sub-dependencies. Packages must be verified cryptographically, where checksums ensure integrity and GPG signatures ensure authenticity. Installation should happen at build time, not runtime, so container images have packages baked in. Versions should be pinned explicitly for reproducible, auditable builds. Package managers should be removed from final images to reduce size, CVE exposure, and attack surface. Vulnerabilities exist in packages and scanners detect them by comparing package inventory against CVE databases. Patching occurs by updating package versions either manually or via automated rebuild pipelines.
Container security depends critically on managing packages well: choosing secure base images, pinning versions, scanning for vulnerabilities, and deploying patches before the vulnerabilities they fix can be exploited.
