The Illusion of the Single Artifact

You pull a Docker image with docker pull nginx:latest. The output shows 6 layers, 50MB total. You run a container vulnerability scanner such as Trivy or Grype, and it reports 347 vulnerabilities across 82 unique packages. Where did 82 packages come from? You only asked for nginx.

This is the artifact illusion. What looks like a single, simple container is actually an archaeological excavation of dependencies stacked in layers. Each layer contains libraries, tools, and system packages built on top of others. Most are invisible until a scanner drills through the filesystem and materializes the full dependency graph.

This hidden complexity is not a mistake. It's structural. Understanding what's really inside your artifacts—and why you can't see it without tools—is foundational to supply chain security.

graph TB
    Pull["docker pull nginx"]
    Pull --> See["What You See<br/>6 layers<br/>50MB<br/>Simple!"]
    Pull --> Actually["What Actually<br/>Exists<br/>347 CVEs<br/>82 Packages<br/>Complex!"]
    Actually --> Scanner["Run Scanner<br/>Trivy/Grype"]
    Scanner --> Reveals["Reveals<br/>Hidden Dependencies<br/>Transitive Packages<br/>Vulnerabilities"]
    See --> Illusion["❌ Visible Artifact<br/>Simplified View"]
    Reveals --> Reality["✅ Real Artifact<br/>Complete Reality"]
    style Illusion fill:#ffcccc
    style Reality fill:#fff9c4

The Container Image You Think You Have vs. The One That Actually Exists

When you execute docker pull nginx, you're not getting a single monolithic blob. The Docker daemon pulls multiple layers, each a compressed filesystem delta. These layers are created during the image build and stacked in order when a container starts.

Here's a simplified Dockerfile that would produce an image like this:

FROM debian:12-slim
RUN apt-get update && apt-get install -y nginx curl
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

The build process assembles the image in stages. The debian:12-slim base image (15MB) contains directories like /bin, /lib, /etc, /var, /usr/bin, and /usr/lib. The nginx installation (22MB) adds /etc/nginx, /usr/sbin/nginx, and /var/www. The curl installation (8MB) adds /usr/bin/curl and /usr/lib/libcurl. The remaining 5MB is configuration and package metadata. Because this Dockerfile installs nginx and curl in a single RUN instruction, both land in the same layer; separate RUN instructions would produce one layer each. EXPOSE and CMD change only the image configuration and add no filesystem content.

Each layer records only the files that changed relative to the layers beneath it. When the container runs, the layers are combined through a union filesystem into a single view. But here's the critical part: each layer brought its own dependency tree with it.
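The stacking behavior can be sketched in a few lines of Python. This is a minimal illustration, not how any container runtime is implemented: the layer contents are invented, and WHITEOUT is a stand-in for the deletion markers that real layer archives use.

```python
# Minimal sketch of how stacked layer deltas resolve into one filesystem view.
# Layer contents and the WHITEOUT marker are illustrative assumptions.
WHITEOUT = None  # stands in for a layer entry that deletes a path


def merge_layers(layers):
    """Apply layer deltas in order; later layers override earlier ones."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)  # a later layer removed this path
            else:
                view[path] = content  # added or overwritten by this layer
    return view


base = {"/bin/sh": "dash", "/etc/os-release": "debian 12", "/tmp/cache": "x"}
install = {"/usr/sbin/nginx": "nginx binary", "/tmp/cache": WHITEOUT}

merged = merge_layers([base, install])
print(sorted(merged))
```

The merged view hides which layer each file came from, which is part of why the final image looks simpler than its history.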

The Debian base image (15MB) didn't arrive as a small set of essential system files. It's pre-built for general-purpose use. It includes GNU coreutils tools like ls, cat, cp, and mv; text utilities such as grep; the libc and dynamic linker; shell interpreters including bash and sh; the package managers apt and dpkg; timezone data; locale files; shadow utilities such as passwd, useradd, and groupadd; and cryptography libraries such as libgcrypt. Larger, build-oriented base images sometimes add a C compiler toolchain as well. None of these were explicitly requested. They're implicit dependencies of the base image.

When nginx installs via apt-get install nginx, it doesn't magically pull only nginx. The package manager resolves a dependency tree. Nginx depends on numerous libraries: libc6, libpcre3, zlib1g, libssl3, libgd3, libexpat1, libfontconfig1, libfreetype6, libjpeg62-turbo, libpng16-16, and more than 10 additional libraries.

Each of those libraries has its own dependency tree. libssl3 depends on libcrypto, which depends on platform-specific build artifacts and historical compatibility layers.

The full transitive dependency graph for a single "nginx" package typically includes 80–150 distinct packages from the base image and package manager alone. You requested one. You got 80+.
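A rough sketch of that resolution step, using a breadth-first walk over a dependency map. The DEPENDS table below is a hypothetical, heavily trimmed stand-in for real package metadata; the actual graph is far larger and depends on the distribution release.

```python
from collections import deque

# Hypothetical, heavily trimmed map: package -> direct dependencies.
DEPENDS = {
    "nginx": ["libc6", "libpcre3", "zlib1g", "libssl3", "libgd3"],
    "libgd3": ["libc6", "libpng16-16", "libjpeg62-turbo", "libfreetype6", "zlib1g"],
    "libpng16-16": ["libc6", "zlib1g"],
    "libfreetype6": ["libc6", "libpng16-16", "zlib1g"],
    "libssl3": ["libc6"],
    "libpcre3": ["libc6"],
    "libjpeg62-turbo": ["libc6"],
    "zlib1g": ["libc6"],
    "libc6": [],
}


def transitive_closure(root, depends):
    """Return every package reachable from root, excluding root itself."""
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in depends.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(root)
    return seen


closure = transitive_closure("nginx", DEPENDS)
direct = set(DEPENDS["nginx"])
print(f"{len(direct)} direct, {len(closure)} total")
print("transitive only:", sorted(closure - direct))
```

Even in this toy graph, the packages you end up with outnumber the ones nginx names directly.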

Then curl adds another 40–60 packages. Some overlap with nginx's dependencies (both need libc, libssl). Some are unique to curl's requirements. The union of both creates a combined set of roughly 120–160 system packages in your final image.
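The reason the combined total is smaller than the sum of the two lists is plain set arithmetic. The package sets below are hypothetical and trimmed to a handful of names for readability.

```python
# Hypothetical, trimmed dependency sets; real closures are far larger.
nginx_deps = {"libc6", "libssl3", "zlib1g", "libpcre3", "libgd3"}
curl_deps = {"libc6", "libssl3", "zlib1g", "libcurl4", "libnghttp2-14"}

shared = nginx_deps & curl_deps    # installed once, needed by both
combined = nginx_deps | curl_deps  # what actually lands in the image

print(f"sum of both lists: {len(nginx_deps) + len(curl_deps)}")
print(f"shared packages:   {len(shared)}")
print(f"combined image:    {len(combined)}")
```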

A security scanner like Trivy reads the installed-package database at /var/lib/dpkg/status and reconstructs this inventory. The 347 vulnerabilities you saw? They're distributed across those 82 packages.
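To make the idea concrete, here is a minimal sketch of reading a dpkg status file. It is not how Trivy or any other scanner is implemented; SAMPLE_STATUS is invented data in the same stanza format, and the parser ignores details such as multi-line fields and architecture qualifiers.

```python
import re

# Invented sample in the stanza format of /var/lib/dpkg/status.
SAMPLE_STATUS = """\
Package: zlib1g
Status: install ok installed
Version: 1:1.2.13.dfsg-1
Depends: libc6 (>= 2.14)

Package: curl
Status: install ok installed
Version: 7.88.1-10
Depends: libc6 (>= 2.34), libcurl4 (= 7.88.1-10), zlib1g (>= 1:1.1.4)

Package: oldtool
Status: deinstall ok config-files
Version: 0.1-1
"""


def parse_status(text):
    """Return {package: {"version": ..., "depends": [...]}} for installed packages."""
    packages = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t"):
                continue  # continuation lines are skipped in this sketch
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if fields.get("Status", "").split()[-1:] != ["installed"]:
            continue  # removed packages can leave a stanza behind
        deps = []
        for clause in fields.get("Depends", "").split(","):
            first = clause.split("|")[0]  # keep only the first alternative
            name = re.sub(r"\(.*?\)", "", first).strip()  # drop version constraint
            if name:
                deps.append(name)
        packages[fields["Package"]] = {"version": fields["Version"], "depends": deps}
    return packages


inventory = parse_status(SAMPLE_STATUS)
for name, info in sorted(inventory.items()):
    print(name, info["version"], info["depends"])
```

The name and version pairs produced this way are what a scanner then matches against vulnerability advisories.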

Why Layers Hide Complexity

The OCI Image Format (the standard behind Docker images) organizes images into layers primarily for efficiency: deduplication, caching, and transfer optimization. Layers are immutable snapshots, stacked in order. But this architecture creates an appearance of simplicity that doesn't reflect reality.

When you look at a Dockerfile:

FROM debian:12-slim
RUN apt-get update && apt-get install -y nginx

You see two intentional decisions: using Debian 12 slim and installing nginx. But you don't see the decision tree that was made for you by the maintainer of debian:12-slim. They chose which packages to include, which to exclude, which libc version to use, which OpenSSL version to support. Those decisions are now permanent parts of your artifact.

Similarly, when you RUN apt-get install nginx, you don't see the decisions made by Debian's nginx package maintainer about which versions of dependent libraries to pull, or which optional features to enable or disable at compile time. These hidden choices compound across layers. Each layer is frozen at the moment it was built, with the dependency versions and architectures that were available then. Your current image might be running OpenSSL 3.0.8 from February 2023 because that's what was current when the base image was published. You have no control over it without rebuilding.

The Dependency Iceberg

Visualize dependency complexity as an iceberg. Above the waterline is what you requested: nginx and curl (explicit dependencies). Below the waterline is what was pulled in implicitly: libssl3, zlib1g, libexpat1, libc6, libjpeg62-turbo, libgd3, libpcre3, libfontconfig1, libfreetype6, and more than 50 additional libraries, each with its own dependency subtree (transitive dependencies). Below that are the base image's implicit dependencies: bash, Debian utilities, coreutils, shadow, timezone data, locale files, package manager metadata, and 80-120 additional packages from the base.

The waterline represents everything your Dockerfile explicitly mentions. Below it are the transitive dependencies. Below that is the base image itself—a complete Linux distribution, compressed and cached.

This iceberg structure isn't specific to Docker. It appears across artifact formats: Python wheels, Java JARs, Node.js packages, systemd units. Every artifact is a tree, and scanning tools have to traverse the full tree to answer security questions.

The Mathematics of Hidden Complexity

Industry data shows consistent patterns across modern applications. Direct dependencies per application range from 30 to 150 items—what you explicitly list in requirements.txt, package.json, or pom.xml. The transitive multiplier averages 4.3x, meaning each direct dependency pulls in approximately 4.3 additional transitive dependencies. Total dependencies in a typical container range from 80 to 250 items. Typical vulnerabilities per 100 packages run from 2 to 8, varying by base image and package age.

This means a modest application with 40 direct dependencies often has 172 to 600 transitive dependencies (40 multiplied by 4.3 equals 172 as a baseline, and each transitive dependency can pull in more). In a 100-package container, you should expect 2 to 8 known vulnerabilities on average. Larger containers routinely report 200 or more vulnerabilities.
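The arithmetic can be worked through directly. The sketch below takes the figures quoted above as inputs (40 direct dependencies, a 4.3x multiplier, 2 to 8 vulnerabilities per 100 packages) and treats them as given rather than measured; the variable names are purely illustrative.

```python
DIRECT = 40
MULTIPLIER = 4.3        # average transitive dependencies per direct dependency
VULNS_PER_100 = (2, 8)  # low and high rates quoted above

transitive = round(DIRECT * MULTIPLIER)  # baseline transitive count
total = DIRECT + transitive
low, high = (total * rate / 100 for rate in VULNS_PER_100)

print(f"transitive dependencies: {transitive}")
print(f"total packages:          {total}")
print(f"expected vulnerabilities: {low:.1f} to {high:.1f}")
```

Under these inputs the baseline is 172 transitive dependencies and 212 packages in total, which the quoted rates translate to roughly 4 to 17 expected known vulnerabilities.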

These numbers aren't inflated. They're the result of how modern software is actually composed. No tire manufacturer makes their own rubber. No web framework author writes their own cryptographic primitives. Dependency chains are deep because modern software is built on abstraction layers. But that depth is invisible unless you use a scanner. Running docker inspect image_id shows you the image configuration and layer digests, and docker history image_id shows layer sizes, but neither shows layer contents. Running docker run image_id ls / shows you the filesystem, not the package inventory. Scanners are what read the package databases and rebuild the dependency graph.

Why One Vulnerability Can Break Everything

The transitive dependency graph creates fragility. When a vulnerability is discovered in a low-level library, fixing it cascades upward. Consider this example: a critical vulnerability is discovered in zlib1g (the compression library), which is used by nginx for gzip compression, by curl for response decompression, and by libpng, which nginx's image support relies on for PNG data.

Patching it requires updating the base image and rebuilding all dependent layers. But downstream packages might break. Nginx compiled against zlib1g version X might not work with zlib1g version Y if the ABI changed. Curl might have a similar issue. The patch that fixes the vulnerability might introduce a regression.
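The upward cascade can be sketched by inverting the dependency map and walking it from the vulnerable package. The DEPENDS table is a hypothetical, trimmed example, not real package metadata.

```python
from collections import defaultdict, deque

# Hypothetical, trimmed map: package -> direct dependencies.
DEPENDS = {
    "nginx": ["zlib1g", "libssl3", "libgd3"],
    "curl": ["libcurl4"],
    "libcurl4": ["zlib1g", "libssl3"],
    "libgd3": ["libpng16-16"],
    "libpng16-16": ["zlib1g"],
    "libssl3": [],
    "zlib1g": [],
}


def affected_by(vulnerable, depends):
    """Return every package that directly or transitively depends on vulnerable."""
    dependents = defaultdict(set)  # inverted edges: dependency -> its users
    for pkg, deps in depends.items():
        for dep in deps:
            dependents[dep].add(pkg)

    seen = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


impact = affected_by("zlib1g", DEPENDS)
print(sorted(impact))
```

Each name in the result is a package that may need a rebuild and a retest once zlib1g changes.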

This is why security vulnerability databases don't just report the vulnerability itself. They report affected versions (for example, zlib 1.2.11 through 1.2.13), fixed versions (for example, zlib 1.2.14 and later), affected downstream packages, and severity assessment considering transitive impact. A single vulnerability in zlib can require rebuilding dozens of downstream packages, testing each one, and coordinating releases across multiple teams.
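A minimal sketch of how an affected-version range is applied to an installed version. It assumes plain dotted numeric versions and reuses the example range above; real Debian versions carry epochs and revision suffixes and need the distribution's own ordering rules, which this does not implement.

```python
def parse_version(text):
    """Turn '1.2.13' into (1, 2, 13) so comparison is numeric, not alphabetical."""
    return tuple(int(part) for part in text.split("."))


def is_affected(installed, first_affected, fixed_in):
    """True when first_affected <= installed < fixed_in."""
    version = parse_version(installed)
    return parse_version(first_affected) <= version < parse_version(fixed_in)


# Example range from the text: affected 1.2.11 through 1.2.13, fixed in 1.2.14.
for candidate in ["1.2.10", "1.2.12", "1.2.14", "1.10.0"]:
    print(candidate, is_affected(candidate, "1.2.11", "1.2.14"))
```

Comparing tuples of integers matters: as plain strings, "1.10.0" would sort before "1.2.11" and be misjudged.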

The Scanning Problem: You Need Tools to See

Without a scanner, you have little practical visibility into your artifact's dependency graph. This is why scanning has become a standard control in supply chain security. Tools like Trivy, Grype, and Syft read container image layers and extract package databases. They reconstruct the package inventory by reading /var/lib/dpkg/status for Debian and Ubuntu packages, the RPM database under /var/lib/rpm/ for Red Hat and CentOS packages, build information embedded in compiled Go binaries, installed Python package metadata (for example, .dist-info directories under site-packages), and lock files such as package-lock.json, poetry.lock, and go.sum.
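The first step a scanner takes is recognizing which files in the image are package metadata at all. The sketch below shows that idea with a tiny lookup; the classify function and its tables are hypothetical, and real tools recognize many more locations and formats.

```python
import posixpath

# Illustrative subset of well-known package metadata locations.
EXACT_PATHS = {
    "/var/lib/dpkg/status": "deb",
    "/lib/apk/db/installed": "apk",
}
LOCK_FILES = {
    "package-lock.json": "npm",
    "poetry.lock": "python",
    "go.sum": "go",
}


def classify(path):
    """Return the ecosystem a file belongs to, or None if it is not metadata."""
    if path in EXACT_PATHS:
        return EXACT_PATHS[path]
    if path.startswith("/var/lib/rpm/"):
        return "rpm"
    return LOCK_FILES.get(posixpath.basename(path))


image_files = [
    "/etc/hostname",
    "/var/lib/dpkg/status",
    "/app/package-lock.json",
    "/srv/api/poetry.lock",
]
found = {path: classify(path) for path in image_files if classify(path)}
print(found)
```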

Inspecting your nginx image without a scanner, you'd never know about the 82 packages. Running a scan with Trivy, you get a detailed report of every package, version, and known vulnerability. The irony is that the scanner's findings feel surprising (347 vulnerabilities in what should be a small nginx installation), but the vulnerabilities were always there. The image didn't become vulnerable when you scanned it. The scanning just made the invisible visible.

Understanding Transitive vs. Direct Dependencies

The distinction matters for patching and risk assessment. Direct dependencies are what you chose. You put nginx in your Dockerfile, or requests in your Python requirements.txt. You're responsible for keeping them updated. You understand their role in your application.

Transitive dependencies are chosen for you by your upstream. You depend on nginx, nginx depends on zlib, so you transitively depend on zlib. You might not know zlib exists. Zlib's maintainers don't know your application exists. But a vulnerability in zlib affects you anyway. This asymmetry is the core of the supply chain problem. Your application's security posture depends on the decisions of hundreds of upstream maintainers, none of whom you have a direct relationship with.

Mapping the Chain

For an nginx container like the one above, the transitive chain is substantial (the version numbers here are illustrative). Your application has a direct dependency on nginx (1.20.2). Nginx in turn depends on libpcre3 (8.39-13), zlib1g (1.2.11.dfsg-2+deb11u2), and libssl3 (3.0.8-1), which depends on libcrypto3 (3.0.8-1), plus more than 10 additional transitive dependencies. Your application also depends indirectly on the debian:12-slim base image, which brings in bash (5.1-2+deb11u1), libc6 (2.31-13), coreutils (8.32-4+b1), and more than 100 additional transitive dependencies.
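When a scanner flags a package you never asked for, the practical question is which chain brought it in. A breadth-first search over the dependency map gives one shortest answer. The graph and the my-app-image name below are hypothetical.

```python
from collections import deque

# Hypothetical, trimmed map: package -> direct dependencies.
DEPENDS = {
    "my-app-image": ["nginx", "curl"],
    "nginx": ["libssl3", "libgd3"],
    "curl": ["libcurl4"],
    "libcurl4": ["libssl3", "zlib1g"],
    "libgd3": ["libpng16-16"],
    "libpng16-16": ["zlib1g"],
}


def find_path(depends, start, target):
    """Return one shortest dependency chain from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in depends.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


chain = find_path(DEPENDS, "my-app-image", "zlib1g")
print(" -> ".join(chain))
```

This reports only one chain. In this graph zlib1g is also reachable through nginx, so removing curl alone would not remove it.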

Each node in this tree is a potential vulnerability entry point, and each one is maintained and patched by an external team.

The Hidden Costs of Container Complexity

This dependency complexity creates real operational costs. Scanning time is significant: extracting and analyzing 150 or more packages takes compute resources. Enterprises that scan thousands of images daily face substantial infrastructure costs.

Decision paralysis sets in when a vulnerability is reported and you must determine which artifacts are affected by traversing the dependency graph. Is nginx-1.20.2 vulnerable to CVE-2024-XXXX? You need to check whether it depends on the affected library, which version it uses, and whether the vulnerable code path is actually reachable. This is rarely a simple yes or no answer.

Patch coordination becomes complex because patching a transitive dependency requires rebuilding and redeploying every downstream artifact that includes it. In a microservices architecture with 50 services, a single critical zlib vulnerability might require rebuilding all 50. If even one fails during rebuild, you have a production incident.
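Working out which services need a rebuild is a lookup across per-service package inventories. A minimal sketch, assuming each service's inventory is already available as a name-to-version mapping; the service names and versions are invented.

```python
# Hypothetical per-service inventories, e.g. exported from a scanner.
SERVICE_PACKAGES = {
    "checkout": {"zlib1g": "1.2.11", "libssl3": "3.0.8"},
    "search": {"libssl3": "3.0.8"},
    "thumbnails": {"zlib1g": "1.2.13", "libpng16-16": "1.6.39"},
    "billing": {"zlib1g": "1.2.14"},
}


def services_to_rebuild(inventories, package, affected_versions):
    """Return services that ship the package at one of the affected versions."""
    return sorted(
        name
        for name, packages in inventories.items()
        if packages.get(package) in affected_versions
    )


rebuild = services_to_rebuild(
    SERVICE_PACKAGES, "zlib1g", {"1.2.11", "1.2.12", "1.2.13"}
)
print(rebuild)
```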

Dependency conflicts emerge when two direct dependencies pull incompatible versions of the same transitive dependency. In compiled languages like Go, this means choosing one version and potentially leaving the other insecure. In interpreted languages like Python, it might mean creating separate virtual environments.
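Conflicts of this kind can be detected mechanically before they surface at build time. The sketch below assumes exact version pins and invented package names; real resolvers work with version ranges and language-specific rules.

```python
from collections import defaultdict

# Hypothetical direct dependencies and the exact versions each one pins.
REQUIREMENTS = {
    "service-client": {"libfoo": "1.4.0", "libbar": "2.0.1"},
    "report-writer": {"libfoo": "2.1.0", "libbar": "2.0.1"},
}


def find_conflicts(requirements):
    """Return packages that different parents pin to different versions."""
    wanted = defaultdict(dict)  # package -> {parent: pinned version}
    for parent, deps in requirements.items():
        for name, version in deps.items():
            wanted[name][parent] = version
    return {
        name: by_parent
        for name, by_parent in wanted.items()
        if len(set(by_parent.values())) > 1
    }


conflicts = find_conflicts(REQUIREMENTS)
print(conflicts)
```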

Supply chain risk concentration is a critical problem: you didn't choose to depend on zlib, but you do. If zlib's maintainer is compromised or abandons the project, your security posture is directly affected. This concentration of risk is why supply chain attacks focus on high-value transitive dependencies.

What This Means for Your Security Strategy

Understanding the artifact illusion changes how you approach security. Scanning is mandatory, not optional. Without it, you're flying blind. You can't know what's in your artifacts without asking a tool to read through the layers.

Vulnerability remediation is not simple. A single CVE report doesn't mean "patch X and move on." You need to understand the dependency tree, identify downstream impact, test the patch, and coordinate releases.

Base image selection has lasting consequences. The base image you choose today becomes a permanent dependency of your artifact. Switching from a full Debian-based image with hundreds of packages to a minimal Alpine base with around 15 packages is a complex decision with security tradeoffs.

Patching at scale requires automation. With 50 or more artifacts and dozens of patches per month, manual patching is impossible. Automated rebuild pipelines that trigger when dependencies are updated are essential.

Transitive dependencies deserve security attention. Your direct dependencies are visible and explicitly chosen. Transitive dependencies are harder to manage but equally important. Software Bill of Materials (SBOM) generation in standard formats such as OWASP CycloneDX or SPDX, together with dependency tracking systems, is becoming standard because organizations need explicit visibility into transitive dependencies.
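As a rough illustration of what an SBOM records, the sketch below writes a package inventory as a minimal CycloneDX-style JSON document. The to_cyclonedx helper and the package versions are hypothetical, and the output has not been validated against the CycloneDX schema; in practice a tool such as Syft or Trivy would generate this.

```python
import json


def to_cyclonedx(packages):
    """Build a minimal CycloneDX-style document from {name: version}."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {
                "type": "library",
                "name": name,
                "version": version,
                # Package URL identifying a Debian package (illustrative).
                "purl": f"pkg:deb/debian/{name}@{version}",
            }
            for name, version in sorted(packages.items())
        ],
    }


sbom = to_cyclonedx({"zlib1g": "1.2.13", "libssl3": "3.0.8"})
print(json.dumps(sbom, indent=2))
```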

The supply chain starts at the package level. Container security doesn't exist in isolation. Every vulnerability in a container traces back to a package, a build system, a release process, and a maintainer. Securing the supply chain means securing the entire chain from source to deployment.

The Path Forward

The illusion of the single artifact—that docker pull nginx gets you just nginx—persists because it's convenient. But modern software doesn't work that way. Every artifact is a graph. Every graph has depth. Every vulnerability has impact proportional to how deep that graph goes.

The first step in supply chain security is accepting that complexity. Accept that your 50MB container image contains 150+ packages. Accept that you depend on hundreds of upstream maintainers. Accept that you need scanning tools to make that dependency graph visible.

From there, you can build controls: scanning on every build, automated patching when vulnerabilities are discovered, SBOM generation for compliance and transparency, and dependency updates as part of your normal development workflow.
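Scanning on every build usually ends in a pass or fail decision. A minimal sketch of such a gate follows; the findings structure, the identifiers, and the severity threshold are assumptions made for illustration, not the output format of any particular scanner.

```python
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def gate(findings, fail_at="HIGH"):
    """Return (passed, blocking) where blocking lists findings at or above fail_at."""
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in findings if SEVERITY_RANK[f["severity"]] >= threshold]
    return (not blocking, blocking)


# Hypothetical scan results with placeholder identifiers.
findings = [
    {"id": "EXAMPLE-1", "package": "zlib1g", "severity": "CRITICAL"},
    {"id": "EXAMPLE-2", "package": "libpng16-16", "severity": "LOW"},
    {"id": "EXAMPLE-3", "package": "libssl3", "severity": "MEDIUM"},
]

passed, blocking = gate(findings)
print("build passes" if passed else f"build blocked by {[f['id'] for f in blocking]}")
```

Where the threshold sits is a policy choice: too strict and every build fails on inherited base-image findings, too loose and the gate stops meaning anything.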

The artifact illusion can't be eliminated. But it can be managed. The tools and practices exist. The question is whether you use them.