A recent discovery identifies critical vulnerabilities affecting Docker and other container engines. Collectively called "Leaky Vessels", the vulnerabilities threaten the isolation that containers are expected to provide from their host operating system. The new CVEs underscore a fundamental flaw in the architecture of container technology.
In this blog post, we discuss the Leaky Vessels vulnerabilities and analyze each one in depth, covering its attack vectors, impact and suggested mitigation strategies.
While timely patching remains crucial, a proactive approach is paramount to thwarting these threats and safeguarding the containerized landscape.
CVE-2024-21626 resides in runc, the tool responsible for spawning containers. Because of an internal file descriptor leak in versions up to and including 1.1.11, attackers can manipulate the working directory (process.cwd) of a newly spawned container process. The leaked descriptor acts like an unlocked door: pointing the working directory through it gives the process access to the host filesystem, which can lead to unauthorized access and container breakout. The risk is significantly reduced, however, when using prebuilt images from reputable registries that maintain patched images.
The vulnerability arises from how runc handles file descriptors while spawning a container. When runc sets the container's working directory with chdir(2), a file descriptor that runc opened internally is still open. Marking that descriptor O_CLOEXEC does not help, because the flag closes it only at execve(2), after the working directory has already been set.
If process.cwd resolves through this open descriptor to a directory on the host filesystem, the container process starts with its working directory on the host.
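The kernel behaviour this relies on can be reproduced without runc. The Python sketch below is not runc's code. It opens an ordinary directory supplied by the caller with O_CLOEXEC set, changes into it through /proc/self/fd/<n> (the path shape publicly reported for this CVE, treated here as an assumption), closes the descriptor, and shows that the working directory still resolves to the opened directory. It assumes Linux (fcntl, O_DIRECTORY, procfs), and the function name is hypothetical.

```python
import fcntl
import os


def cwd_outlives_descriptor(directory: str) -> list[str]:
    """Hypothetical demo: open `directory` with O_CLOEXEC, chdir through
    /proc/self/fd/<n>, close the descriptor, then list the working directory."""
    original_cwd = os.getcwd()
    fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC)
    try:
        # FD_CLOEXEC is set, yet the descriptor stays usable until execve(2).
        if not fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC:
            raise RuntimeError("expected FD_CLOEXEC to be set")
        proc_path = f"/proc/self/fd/{fd}"
        if os.path.isdir(proc_path):
            os.chdir(proc_path)  # same path shape as process.cwd=/proc/self/fd/<n>
        else:
            os.fchdir(fd)  # fallback where procfs is unavailable
    finally:
        os.close(fd)  # the descriptor no longer exists after this point
    try:
        # The working directory still refers to the directory that was opened.
        return sorted(os.listdir("."))
    finally:
        os.chdir(original_cwd)
```

In the runc case, the opened directory lives on the host. A process whose working directory is set this way therefore starts outside the container's filesystem, even after the descriptor itself is gone.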
Attackers leverage CVE-2024-21626 by manipulating the process.cwd value. In the malicious image scenario, the image sets process.cwd (for example, through its configured working directory) to the leaked descriptor path. In the "runc run" scenario, the attacker sets environment variables or command-line arguments that influence process.cwd during container creation.
An attacker can craft a container image that sets process.cwd to a host path reachable through a leaked file descriptor. When the image runs, the container process gains access to the host, potentially leading to privilege escalation and compromise.
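A defender can look for this pattern before running an image. The heuristic below is a sketch, not a complete detector. It flags an OCI image config whose config.WorkingDir points into a /proc/<pid>/fd/<n> path, and does the same for a runtime config's process.cwd. The two field names come from the OCI image and runtime specifications. The /proc/.../fd pattern and the function names are assumptions made for illustration. The sketch deliberately skips lexical normalization, because the kernel resolves ".." in a path like /proc/self/fd/8/../.. after following the descriptor link, not textually.

```python
import re
from typing import Optional

# Heuristic: a path that passes through some process's descriptor table.
_FD_PATH = re.compile(r"/proc/(?:self|thread-self|\d+)/fd/\d+(?:/|$)")


def suspicious_cwd(path: str) -> bool:
    """Flag working directories such as /proc/self/fd/7 or /proc/1/fd/8/../.."""
    collapsed = re.sub(r"/+", "/", path)  # "//proc//self" -> "/proc/self"
    # No posixpath.normpath() here: it would turn "/proc/self/fd/8/../.." into
    # "/proc/self", while the kernel resolves ".." after following the fd link.
    return bool(_FD_PATH.search(collapsed))


def scan_configs(image_config: dict, runtime_config: Optional[dict] = None) -> list:
    """Return findings for an OCI image config and, optionally, a runtime config."""
    findings = []
    workdir = image_config.get("config", {}).get("WorkingDir") or ""
    if suspicious_cwd(workdir):
        findings.append(f"config.WorkingDir={workdir}")
    if runtime_config is not None:
        cwd = runtime_config.get("process", {}).get("cwd") or ""
        if suspicious_cwd(cwd):
            findings.append(f"process.cwd={cwd}")
    return findings
```

An image can also use an innocuous-looking working directory that is actually a symlink inside an image layer, and such an image would pass this check. The scan therefore complements upgrading runc and does not replace it.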
To exploit the file descriptor leak and potentially gain host access this way, an attacker needs to embed malicious code or configuration in a container image and get it run. This can occur through various means.
Attackers with some level of access to the host system (e.g., indirect access through a vulnerable service or application running on the host) can exploit the vulnerability while running a container using the runc run command. By manipulating arguments and environment variables, they can trick the container process into setting its working directory to a leaked file descriptor on the host, achieving container breakout.
Another scenario has a broader impact than the one above but is more difficult to exploit. Suppose an attacker who controls a container knows two things: an administrative process calls runc exec with the working-directory flag (--cwd), and the number of the leaked file descriptor. The attacker can then replace that working-directory path with a symbolic link to the leaked descriptor. When the exec runs, the attacker gains access to the host filesystem, bypassing PR_SET_DUMPABLE protection.
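In this scenario the trigger is a symbolic link planted inside the container, so a textual check on the --cwd argument alone would miss it. The sketch below uses hypothetical names. It walks a working-directory path one component at a time, treating a container root filesystem directory (rootfs) as "/", and follows symlinks the way path resolution would. It reports any link whose target points into /proc/<pid>/fd. It is only illustrative: the container can swap the link after the check runs, so it is no substitute for the runc fix.

```python
import os
import posixpath
import re

# Heuristic pattern for magic links into a process's descriptor table.
FD_LINK = re.compile(r"/proc/(?:self|thread-self|\d+)/fd/\d+")


def fd_links_on_path(rootfs: str, cwd: str, max_links: int = 40) -> list[str]:
    """Resolve `cwd` as if `rootfs` were "/", following symlinks, and report
    any link whose target points into /proc/<pid>/fd (hypothetical helper)."""
    hits: list[str] = []
    pending = [p for p in cwd.split("/") if p]
    current = "/"  # path as seen from inside the container
    followed = 0
    while pending:
        name = pending.pop(0)
        if name == ".":
            continue
        if name == "..":
            current = posixpath.dirname(current)  # dirname("/") is "/"
            continue
        candidate = posixpath.join(current, name)
        on_host = os.path.join(rootfs, candidate.lstrip("/"))
        if not os.path.islink(on_host):
            current = candidate
            continue
        followed += 1
        if followed > max_links:
            raise RuntimeError(f"too many symlinks while resolving {cwd!r}")
        target = os.readlink(on_host)
        if FD_LINK.search(target):
            hits.append(f"{candidate} -> {target}")
            break  # anything past this point would resolve outside the rootfs
        if target.startswith("/"):
            current = "/"  # absolute links are re-rooted at rootfs
        # Relative targets resolve against the directory holding the link.
        pending = [p for p in target.split("/") if p] + pending
    return hits
```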
Once the container process has its working directory set to the leaked descriptor path, it can access the host filesystem at that location. This can grant read, write and even execute privileges depending on the permissions of the leaked file descriptor.
Upgrade runc to version 1.1.12 or later, which addresses the vulnerability by properly closing leaked file descriptors. Using a prebuilt image from a reputable registry also mitigates the risk, because the image has already been built. Still, compare the image's build date with the runc patch date.
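You can check whether a host still runs an affected runc by parsing the output of runc --version. The sketch assumes the output contains a line of the form "runc version X.Y.Z". That is the usual format, but it is not a guaranteed interface. The sketch compares the parsed version against 1.1.12, and its function names are hypothetical. Distribution packages sometimes backport fixes without changing the upstream version string. A failing check is therefore a prompt to consult your vendor's advisory, not proof of exposure.

```python
import re
import subprocess

PATCHED_RUNC = (1, 1, 12)  # first upstream release with the CVE-2024-21626 fix


def parse_runc_version(output: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from `runc --version` output."""
    match = re.search(r"runc version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised `runc --version` output")
    return tuple(int(part) for part in match.groups())


def runc_is_patched(output: str) -> bool:
    return parse_runc_version(output) >= PATCHED_RUNC


def installed_runc_output(binary: str = "runc") -> str:
    # Assumes the runc binary is on PATH; raises if it is missing or fails.
    return subprocess.run([binary, "--version"], capture_output=True,
                          text=True, check=True).stdout
```

Typical use is `runc_is_patched(installed_runc_output())` on each node. The parsing is kept separate from the subprocess call so it can be tested without runc installed.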
CVE-2024-23651 stems from a symlink race condition in Docker versions below 23.0.1. The issue occurs in the cache mount mechanism used during the image build process, as implemented in BuildKit versions up to and including 0.12.4. Exploiting this race condition allows attackers to access files on the host system, but successful exploitation is unlikely because the race is difficult to win.
When you use the RUN --mount=type=cache directive in a Dockerfile, you can specify a source for the cache mount. This source is a directory on the host system that the Docker daemon uses as the cache directory. Docker then mounts this directory at the specified target location inside the container that runs that build step.
The vulnerability arises because the cache mount source path is validated first and used later, and the gap between the two steps can be exploited as a race condition. The attacker may replace the source path with a symbolic link to an arbitrary directory, which is then mounted into the build if the exploit succeeds.
Attackers could try to exploit this issue by getting the user to build two malicious images at the same time, which can be done by poisoning a registry, typosquatting or other methods. The first build mounts a cache (call it X) and creates a directory called Y.
Meanwhile, the second build attempts to mount the cache at path X, located within Y. After the second build confirms that Y is a directory, the first build replaces Y with a symlink to a sensitive location. The second build then follows the symlink and mounts the sensitive directory into the container filesystem. Winning this race is almost impossible in practice: the window of opportunity lasts a few milliseconds at most, and the attacker has no control over the timing.
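This is a time-of-check to time-of-use (TOCTOU) race: a path is validated, then used, and it can change in between. The Python sketch below shows a generic mitigation pattern, not BuildKit's actual fix. It rejects a path whose last component is a symlink, opens it with O_NOFOLLOW, and compares device and inode numbers so that a swap between check and use is detected. Callers should then work through the returned descriptor, not the path. The function name is hypothetical, and POSIX semantics are assumed. O_NOFOLLOW protects only the final component. On Linux, openat2(2) with RESOLVE_BENEATH or RESOLVE_NO_SYMLINKS can constrain the whole path.

```python
import os
import stat


def open_dir_checked(path: str) -> int:
    """Open `path` as a directory without following a final symlink, and make
    sure the object opened is the same one that was checked (hypothetical)."""
    checked = os.lstat(path)  # time of check
    if not stat.S_ISDIR(checked.st_mode):
        raise PermissionError(f"{path} is not a plain directory")
    # Time of use: if the last component became a symlink, O_NOFOLLOW makes
    # os.open raise OSError (ELOOP) instead of following it.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    opened = os.fstat(fd)
    if (checked.st_dev, checked.st_ino) != (opened.st_dev, opened.st_ino):
        os.close(fd)
        raise PermissionError(f"{path} was replaced between check and use")
    return fd
```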
If successful, the attacker can access and potentially manipulate files on the host system, which could lead to privilege escalation, data exfiltration or other malicious activities.
Upgrade Docker to version 23.0.1 or later, which addresses the vulnerability by fixing the race condition in cache mount validation. Make sure to upgrade any BuildKit instances to version 0.12.5 or later.
As a best practice, build and use images only from trusted sources to limit opportunities for exploitation.
CVE-2024-23652 resides in BuildKit versions <= v0.12.4. It allows attackers to manipulate the temporary directories that containers use during image builds in order to delete arbitrary files on the host system. It is primarily exploited through malicious Dockerfile builds, but it can also potentially affect other BuildKit-based build systems.
BuildKit mounts directories from the host into the container filesystem during various phases of an image build. When these directories become empty, BuildKit removes them automatically during cleanup. The vulnerability lies in how BuildKit decides whether a directory is empty. It checks only for files inside the directory itself and ignores mount points within it, so cleanup can end up deleting content on the host that sits behind those mount points.
Attackers leverage this logic by crafting a malicious Dockerfile. The file mounts a specific directory on the host system within the container's filesystem. Next, it creates a strategically placed empty directory within the container, positioned above the mount point in the container's hierarchy. The attacker then manipulates files within the mounted directory (which are actually on the host), tricking BuildKit into thinking the empty container directory is safe to delete. During cleanup, BuildKit mistakenly deletes the corresponding directory on the host as well, even if it contains files accessible through the mount point.
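The flaw is that cleanup trusts an emptiness check that ignores mount points. One conservative alternative is sketched below in Python with hypothetical names; it is not BuildKit's patch. The sketch never descends into mount points or onto other filesystems, and it removes directories only with rmdir(2), which refuses non-empty directories and never unlinks files. Linux mount semantics are assumed.

```python
import os


def remove_empty_dirs(root: str) -> list[str]:
    """Conservative cleanup (hypothetical): remove only empty directories under
    `root`, never descending into mount points or other filesystems."""
    root_dev = os.lstat(root).st_dev
    candidates = []
    for dirpath, dirnames, _filenames in os.walk(root):
        kept = []
        for name in dirnames:
            child = os.path.join(dirpath, name)
            # Prune mount points and anything living on a different device.
            if os.path.ismount(child) or os.lstat(child).st_dev != root_dev:
                continue
            kept.append(name)
        dirnames[:] = kept  # os.walk only descends into what remains here
        candidates.append(dirpath)
    removed = []
    # Top-down order lists parents before children; reverse it to go bottom-up.
    for path in reversed(candidates):
        try:
            os.rmdir(path)  # fails on non-empty directories and busy mount points
        except OSError:
            continue
        removed.append(path)
    return removed
```

A pruned mount point still exists as an entry in its parent directory. Any directory that contains one therefore stays non-empty and is never removed.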
This Leaky Vessel vulnerability allows attackers to potentially delete critical system files, corrupt image builds or gain unauthorized access to sensitive data on the host. Remember, the attack hinges on manipulating the mounted host directory, not an empty container directory. Mitigate this risk by updating BuildKit to version v0.12.5 or later.
Attackers exploit this vulnerability by creating malicious Dockerfiles with specific steps:
Step 1: Mount a Host Directory
Attackers use the RUN --mount directive to mount a targeted directory on the host system within a specific location inside the container's filesystem.
Step 2: Create an Empty Target Directory
Within the container, attackers create an empty directory strategically placed above the mounted directory's location.
Step 3: Manipulate Files Through the Mount
Commands within the Dockerfile manipulate files on the host system accessible through the mounted directory. Doing so tricks BuildKit into believing the directory is empty.
Step 4: Trigger Unintended Deletion
During cleanup, BuildKit mistakenly deletes the corresponding host directory as well, even if it contains files accessible through the mount point.
Manipulating mounted directories gives attackers a deceptive weapon, even without the ability to read data directly: the power to delete host files is harmful on its own. Attackers could cause a denial of service by removing critical system files, weaken security by deleting security-related settings, or delete encryption keys that protect sensitive data.
The destructive reach of this vulnerability extends to corrupting entire image builds, potentially disrupting software delivery pipelines.
Upgrade Buildkit to version 0.12.5 or later, which patches the vulnerability by properly addressing the empty directory checking logic.
Additionally, if possible, avoid using RUN --mount with untrusted Dockerfiles or frontend configurations.
CVE-2024-23653 resides in BuildKit versions before v0.12.5 and arises from improper entitlement checks in BuildKit's Interactive Containers API. This API allows running containers based on built images for interaction and customization. Using a specially crafted Dockerfile, attackers can exploit these inadequate entitlement checks and potentially achieve container escape.
Attackers create a Dockerfile that utilizes container configuration commands like RUN and USER. This configuration triggers a specific code path within BuildKit's Interactive Containers API where missing or inadequate entitlement checks could allow the container to have elevated privileges.
If attackers exploit these elevated privileges within the container, they could use existing vulnerabilities or misconfigurations to break out of the container and access the host system.
Attackers can craft a malicious Dockerfile that utilizes container configuration commands like RUN, USER and others to trigger a vulnerable code path within BuildKit's Interactive Containers API. Exploiting this path allows the container to bypass intended entitlement checks, potentially allowing attackers to gain elevated privileges within the container.
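The root cause is an authorization check that is missing from one code path. The sketch below uses hypothetical types to show the general shape of such a gate. It derives the entitlements a container request needs and rejects the request unless every one of them was granted. The entitlement names network.host and security.insecure are those BuildKit uses with its --allow option. The ContainerRequest class and both functions are invented for illustration. A gate like this only works if every path that creates a container runs it, and that includes the interactive one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    """Hypothetical request for a build-time interactive container."""
    network_host: bool = False
    insecure: bool = False  # e.g. privileged / unconfined execution


def required_entitlements(req: ContainerRequest) -> set[str]:
    needed: set[str] = set()
    if req.network_host:
        needed.add("network.host")
    if req.insecure:
        needed.add("security.insecure")
    return needed


def authorize(req: ContainerRequest, granted: frozenset[str]) -> None:
    """Reject the request unless every entitlement it needs was granted."""
    missing = required_entitlements(req) - granted
    if missing:
        raise PermissionError(f"missing entitlements: {sorted(missing)}")
```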
Container escape: The primary concern is potential container escape, granting attackers unauthorized access to the host system's resources and data.
Escalated privileges: The exploited vulnerability could grant elevated privileges within the container, potentially facilitating further malicious activities.
Disrupted builds: Malicious use of this vulnerability could corrupt image builds or disrupt build processes.
Update your BuildKit installation to version v0.12.5 or later. The newer versions include the necessary patch to address the vulnerability.
As a best practice, review existing Dockerfiles and be cautious with Dockerfiles obtained from untrusted sources. Scrutinize them for suspicious commands such as RUN and USER, and for configuration settings that might grant unintended privileges.
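A first pass of that review can be automated. The heuristic below is a sketch, and its rule messages and function names are hypothetical. It joins backslash-continued lines, skips comments, and flags RUN --mount, RUN --security=insecure, RUN --network=host and a USER switch to root. All of these are legitimate Dockerfile features, so each hit marks something to read closely, not evidence of an attack.

```python
import re

# Heuristic review rules; none of these is malicious by itself.
RULES = [
    (re.compile(r"^RUN\b.*--mount=", re.IGNORECASE), "RUN --mount: check mount source and target"),
    (re.compile(r"^RUN\b.*--security=insecure", re.IGNORECASE), "RUN --security=insecure"),
    (re.compile(r"^RUN\b.*--network=host", re.IGNORECASE), "RUN --network=host"),
    (re.compile(r"^USER\s+(?:root|0)(?::\S+)?\s*$", re.IGNORECASE), "USER switches to root"),
]


def logical_lines(text: str):
    """Join backslash-continued lines and drop blank lines and comments."""
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not buffer and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            buffer += line[:-1] + " "
            continue
        yield buffer + line
        buffer = ""
    if buffer:
        yield buffer


def review_dockerfile(text: str) -> list[str]:
    """Return one message per rule that matches a Dockerfile instruction."""
    findings = []
    for line in logical_lines(text):
        for pattern, message in RULES:
            if pattern.search(line):
                findings.append(message)
    return findings
```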
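|CVE (component)|Affected versions|Impact|Mitigation|
|---|---|---|---|
|CVE-2024-21626 (runc)|runc <= 1.1.11|Container breakout, potential privilege escalation|Update to runc 1.1.12+|
|CVE-2024-23651 (Docker Engine, BuildKit)|Docker Engine <23.0.2, BuildKit <= 0.12.4|Container breakout, access to host files|Update to Docker Engine 23.0.2+ or BuildKit 0.12.5+|
|CVE-2024-23652 (BuildKit)|BuildKit <= 0.12.4|Data corruption, privilege escalation|Update to BuildKit 0.12.5+|
|CVE-2024-23653 (BuildKit)|BuildKit < 0.12.5|Container breakout, privilege escalation|Update to BuildKit 0.12.5+|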
The Leaky Vessels vulnerabilities have exposed critical flaws in the foundation of container security. While patching is essential, it's only the first step. Building secure container environments requires a layered approach that combines timely updates with strong security practices.
Containers aren’t islands. They’re interconnected parts of a larger ecosystem, and the security of one depends on the security of all.
By taking proactive steps and adopting a layered security strategy, you can ensure your containerized applications remain resilient and protected from future threats. Remember, container security is an ongoing journey, not a one-time fix. Stay vigilant and adapt your defenses as the threat landscape evolves.
Prisma Cloud users can identify affected workloads by searching for the relevant CVE in the Vulnerability Explorer.
Figure 1: CVE-2024-21626 in Prisma Cloud’s Vulnerability Explorer
You can also check all affected artifacts by searching for the relevant CVE in the CVE Viewer.
Figure 2: Results for CVE-2024-23651 in Prisma Cloud’s CVE Viewer
Figure 3: Results for CVE-2024-23653 in Prisma Cloud’s CVE Viewer
Figure 4: Results for CVE-2024-23652 in Prisma Cloud’s CVE Viewer
Figure 5: Results for CVE-2024-21626 in Prisma Cloud’s CVE Viewer
The implications of these vulnerabilities, as discussed, are far-reaching, challenging the security assumptions behind containerized environments. They highlight the need for rigorous security practices, including the scrutiny of container images and the adoption of tools and strategies to detect and mitigate such vulnerabilities. As the container technology landscape continues to evolve, so too must the approaches to securing it, ensuring that the benefits of containerization don’t come at the expense of security.
Prisma Cloud secures applications from Code to Cloud, enabling security and DevOps teams to collaborate effectively to accelerate secure cloud-native application development and deployment. If you haven't experienced the advantage, take Prisma Cloud for a test drive with a free 30-day Prisma Cloud trial.