Key Information
- SonarCloud discovered a critical Zip Slip vulnerability in OpenRefine.
- If a user running a vulnerable version is tricked into importing a malicious project, an attacker could execute arbitrary code on the user’s machine.
- SonarCloud not only discovered the vulnerability but also provides valuable guidance on how to mitigate this kind of vulnerability and prevent common pitfalls.
- The vulnerability was fixed with version 3.7.4.
OpenRefine Zip Slip Vulnerability: Introduction
OpenRefine is a Java-based open-source data cleaning and transformation tool. Its features include loading different types of data, cleaning it, converting it, and extending it. All of this can be done from the browser via OpenRefine’s web interface. With almost 10k stars and ~1.8k forks, it is one of the more popular projects on GitHub.
In our continuous effort to help secure open-source projects and improve our Clean Code solution, we regularly scan open-source projects via SonarCloud and evaluate the findings. In fact, anyone can do the same: SonarCloud is a free code analysis product for open-source projects, regardless of their size or language.
One of the findings reported by SonarCloud that made us curious was a Zip Slip vulnerability in OpenRefine. A Zip Slip vulnerability is caused by inadequate path validation when extracting archives, which may allow attackers to overwrite existing files or extract files to unintended locations.
In this article, we outline the impact of this vulnerability and explain how this and other code vulnerabilities can be detected with SonarCloud. Furthermore, we explain how attackers could exploit the vulnerability and describe a typical pitfall developers may fall into when trying to fix it.
OpenRefine Zip Slip Vulnerability: Impact
OpenRefine versions 3.7.3 and below are prone to a Zip Slip vulnerability in the project import feature (CVE-2023-37476). Although OpenRefine is designed to run only locally on a user's machine, an attacker can trick a user into importing a malicious project file. Once this file is imported, the attacker can execute arbitrary code on the user’s machine.
The vulnerability was fixed with OpenRefine version 3.7.4.
OpenRefine Zip Slip Vulnerability: Technical Details
In this section, we dive into the technical details of the vulnerability.
Vulnerability Discovery
SonarCloud is our cloud-based code analysis service. It uses state-of-the-art techniques in static code analysis to find quality issues, bugs, and security vulnerabilities in your code. With the recently added deeper SAST technology, it can even uncover hidden security vulnerabilities introduced by the use of third-party dependencies.
During our regular scan of public open-source projects, the engine reported a Zip Slip issue in OpenRefine (see it yourself on SonarCloud).
As the code flow highlighted on SonarCloud shows, the untar method iterates over all files within an archive and uses the tarEntry.getName() method to create a new File object. This object is then passed to FileOutputStream to extract the file. This introduces a Zip Slip vulnerability that allows an attacker to write files outside the intended folder (destDir) by creating an archive that contains a file named, e.g., ../../../../tmp/pwned.
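To illustrate why this is dangerous, here is a minimal, dependency-free sketch of the vulnerable pattern. It does not reproduce OpenRefine's code. The class and method names are hypothetical, and the entry name is passed in as a plain string instead of being read from a tar archive. The sketch only computes where a file would land, using lexical normalization, and writes nothing.

```java
import java.io.File;
import java.nio.file.Path;

// Minimal sketch of the vulnerable pattern, NOT OpenRefine's actual code.
// ZipSlipDemo, resolveEntryNaively, and effectiveLocation are hypothetical names;
// entryName stands in for the value returned by tarEntry.getName().
class ZipSlipDemo {

    // The entry name is trusted as-is and joined to the destination directory,
    // mirroring new File(destDir, tarEntry.getName()) in the vulnerable code flow.
    static File resolveEntryNaively(File destDir, String entryName) {
        return new File(destDir, entryName);
    }

    // Lexically normalizes the joined path (no filesystem access, no symlink
    // resolution) to show where a FileOutputStream would actually write.
    static Path effectiveLocation(File destDir, String entryName) {
        return resolveEntryNaively(destDir, entryName).toPath().normalize();
    }
}
```

On a Unix-like system, resolveEntryNaively(new File("/home/john/project"), "../../../../tmp/pwned") still looks like a path under the project folder when viewed as a string. However, effectiveLocation normalizes it to /tmp/pwned, which is where the file would actually be written.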
The vulnerable untar method is called from the FileProjectManager.importProject method (OpenRefine/main/src/com/google/refine/io/FileProjectManager.java), which handles the import of existing Refine project files.
Projects can be imported through the web interface either by directly uploading an archive or by providing the URL of an archive.
The corresponding endpoint is /command/core/import-project. None of OpenRefine's endpoints, this one included, require authentication, but OpenRefine is supposed to run locally on a user’s machine. Additionally, the employed CSRF protection prevents malicious JavaScript code executed in the context of another website from performing unauthorized actions. Nevertheless, an attacker could still exploit the vulnerability by tricking a user into importing a malicious project.
Exploitation via Auto-Reload
The vulnerability gives attackers a strong primitive: writing files with arbitrary content to an arbitrary location on the filesystem. For applications running with root privileges, there are dozens of ways to turn this into arbitrary code execution on the operating system. Options include adding a new user to the passwd file, adding an SSH key, creating a cron job, and more. For applications running with the permissions of a low-privilege user, the options are more limited but still exist. Earlier this year, we documented a unique way to achieve code execution by writing a site-specific configuration hook, a technique limited to Python applications.
Besides these generic techniques, attackers might also leverage features of the application itself. OpenRefine implements an auto-reload feature that regularly scans the WEB-INF folder for changes and restarts the WebAppContext when a file is changed (OpenRefine/server/src/com/google/refine/Refine.java).
All classes within the WEB-INF/classes folder are reloaded when the WebAppContext restarts. This means that attackers could overwrite an existing .class file within this folder. Doing so triggers the reload and subsequently executes the attacker's .class file, resulting in arbitrary code execution.
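The scan-and-restart idea can be illustrated with a small polling scanner. This is a hypothetical sketch, not OpenRefine's implementation, and the class and method names are made up. It records the last-modified timestamp of every file below a folder such as WEB-INF and reports whether anything changed since the previous scan.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical polling scanner illustrating an auto-reload feature.
// NOT OpenRefine's implementation; names are illustrative only.
class ChangeScanner {
    private final File root;
    private Map<String, Long> snapshot;

    ChangeScanner(File root) {
        this.root = root;
        this.snapshot = scan(); // baseline taken at startup
    }

    // Walks the directory tree and records path -> lastModified for every file.
    private Map<String, Long> scan() {
        Map<String, Long> result = new HashMap<>();
        collect(root, result);
        return result;
    }

    private static void collect(File dir, Map<String, Long> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, out);
            } else {
                out.put(child.getPath(), child.lastModified());
            }
        }
    }

    // Returns true if any file was added, removed, or modified since the last call.
    // A real server would restart its web application context at this point,
    // which is when classes under WEB-INF/classes get loaded again.
    boolean hasChanged() {
        Map<String, Long> current = scan();
        boolean changed = !current.equals(snapshot);
        snapshot = current;
        return changed;
    }
}
```

In this model, an archive entry that overwrites a .class file below the scanned folder is all it takes to make hasChanged() return true on the next poll.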
Mitigation, Pitfall, and Patch
To mitigate this vulnerability, the application must ensure that all files are extracted under the intended base folder. One approach that might come to mind is to use the getCanonicalPath method to retrieve the absolute, unique path as a String. The startsWith method then verifies that the destination path lies within the intended base folder.
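A minimal sketch of this approach could look like the following. NaiveCheck, isInsideNaive, and checkedTarget are hypothetical names. In a real extraction loop, entryName would come from the archive, e.g., via tarEntry.getName().

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper illustrating the canonical-path + startsWith mitigation.
class NaiveCheck {

    // Returns true if the entry's canonical destination appears to lie under destDir.
    static boolean isInsideNaive(File destDir, String entryName) throws IOException {
        // Canonical path of the intended base folder, as a String.
        String base = destDir.getCanonicalPath();
        // Canonical path of the file the entry would be written to.
        String target = new File(destDir, entryName).getCanonicalPath();
        // Plain string prefix comparison.
        return target.startsWith(base);
    }

    // How the check would typically guard the write in an extraction loop.
    static File checkedTarget(File destDir, String entryName) throws IOException {
        if (!isInsideNaive(destDir, entryName)) {
            throw new IOException("Entry is outside of the target dir: " + entryName);
        }
        return new File(destDir, entryName);
    }
}
```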
Caution: This does not fully fix the vulnerability! Can you spot the problem here?
The getCanonicalPath method removes trailing path separators, which leaves this check vulnerable to a partial path traversal!
Assume the base folder (destDir) is the home directory of the user john ("/home/john/"). The trailing slash is removed, resulting in "/home/john". This means that attackers could still perform a partial path traversal into another user’s home directory that begins with the same characters, e.g., "/home/johnny/", since such a path passes the check.
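The bypass can be reproduced with the example paths from the text. In this hypothetical sketch, the base directory and the target are passed as strings and canonicalized with getCanonicalPath. No files are created or modified.

```java
import java.io.File;
import java.io.IOException;

// Worked example of the partial path traversal using the paths from the text.
// PartialTraversalDemo and its methods are hypothetical names.
class PartialTraversalDemo {

    // The canonical form of "/home/john/" no longer ends with a separator.
    static String canonicalBase(String baseDir) throws IOException {
        return new File(baseDir).getCanonicalPath();
    }

    // The flawed check: a plain string prefix comparison on canonical paths.
    static boolean passesNaiveCheck(String baseDir, String target) throws IOException {
        return new File(target).getCanonicalPath().startsWith(canonicalBase(baseDir));
    }
}
```

Assuming no symlinks are involved, passesNaiveCheck("/home/john/", "/home/johnny/.ssh/authorized_keys") returns true, even though the target lies in a different user's home directory.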
A real-life example of such a partial path traversal vulnerability is covered in more detail in the related Black Hat talk by Jonathan Leitschuh.
We continuously keep track of newly disclosed pitfalls like this and add them to our engine. For guidance on fixing a vulnerability correctly, you can click the "How can I fix it?" tab attached to the corresponding issue on SonarCloud.
There are two approaches to prevent this partial path traversal:
- Reinsert the path separator for the base folder after calling getCanonicalPath.
- Retrieve the Path object for the File and use its startsWith method. This method compares paths element by element rather than as literal strings.
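Both approaches can be sketched as follows. SafePathCheck and its methods are hypothetical names. Each method canonicalizes the base folder and the entry's destination before comparing them.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

// Sketch of the two fixes for the partial path traversal; names are hypothetical.
class SafePathCheck {

    // Approach 1: re-append the separator to the canonical base path, so that
    // "/home/john" becomes "/home/john/" and "/home/johnny/..." no longer matches.
    static boolean isInsideWithSeparator(File destDir, String entryName) throws IOException {
        String base = destDir.getCanonicalPath() + File.separator;
        String target = new File(destDir, entryName).getCanonicalPath();
        return target.startsWith(base);
    }

    // Approach 2: compare Path objects. Path.startsWith matches whole name
    // elements, so "/home/johnny/x" does not start with "/home/john".
    static boolean isInsideWithPath(File destDir, String entryName) throws IOException {
        Path base = destDir.getCanonicalFile().toPath();
        Path target = new File(destDir, entryName).getCanonicalFile().toPath();
        return target.startsWith(base);
    }
}
```

The Path-based variant needs no manual separator handling, because Path.startsWith compares complete name elements rather than characters.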
The OpenRefine maintainers avoided this trap and fixed the vulnerability correctly by leveraging the toPath method. This effectively prevents files from being written outside the intended destDir folder.
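Putting it together, an extraction loop that applies the Path-based check before writing each entry could look like the sketch below. It is not OpenRefine's patch. To stay within the Java standard library, it reads a ZIP archive via java.util.zip.ZipInputStream rather than a tar archive, and SafeExtractor is a hypothetical name. Any entry that would resolve outside destDir aborts the extraction with an IOException.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Minimal sketch of an extraction loop with a Path-based containment check.
// Not OpenRefine's patch: it reads ZIP archives via java.util.zip instead of
// tar archives, and SafeExtractor is a hypothetical name.
class SafeExtractor {

    static void extract(InputStream archive, File destDir) throws IOException {
        // Canonical base folder as a Path, so startsWith compares name elements.
        Path base = destDir.getCanonicalFile().toPath();

        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Canonicalize the destination the same way as the base folder.
                Path target = new File(destDir, entry.getName()).getCanonicalFile().toPath();

                // Reject any entry that would end up outside destDir,
                // including partial traversals such as "../johnny/...".
                if (!target.startsWith(base)) {
                    throw new IOException("Entry escapes destination folder: " + entry.getName());
                }

                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies the current entry's bytes; the ZipInputStream stays open.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Entries processed before a rejected one have already been written. A production implementation might therefore extract into a temporary folder first and move it into place only after every entry has passed the check.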
Timeline
| Date | Action |
|---|---|
| 2023-07-07 | We report the issue to the maintainers |
| 2023-07-08 | Maintainers confirm the issue and start working on a patch |
| 2023-07-17 | OpenRefine version 3.7.4 is released, which fixes the issue |
| 2023-07-17 | CVE-2023-37476 is assigned |
OpenRefine Zip Slip Vulnerability: Summary
In this article, we took a deep dive into a critical Zip Slip vulnerability in OpenRefine. We also outlined how attackers can leverage an application’s features to turn a file write into arbitrary code execution. Furthermore, we highlighted a common pitfall developers may face when trying to fix this path traversal vulnerability.
With the help of SonarCloud, this vulnerability was not only detected in a matter of seconds but could also be fixed properly by relying on the comprehensive information SonarCloud provides for each raised issue. This applies not only to security issues but also to code quality problems, helping developers write Clean Code that is more secure, maintainable, and reliable.
Finally, we would like to thank the OpenRefine maintainers for quickly responding to our notification, providing a comprehensive patch, and transparently informing all users.