Have you ever wondered what the open source maintainers that your business relies on do to keep your software healthy and secure? Here’s the first in a series of posts about urllib3, a Python package that is downloaded billions of times a year, and what it takes to keep it well maintained and up to date.
If you’d like to read the entire case study, you can download the full whitepaper right now at the link below.
What is urllib3?
urllib3 is an HTTP client for Python. It underpins many components of the Python ecosystem:
- requests, a high-level HTTP library, is built on top of urllib3
- boto, the Amazon AWS SDK library, uses urllib3 to communicate with AWS
- pip, the Python package manager, uses urllib3
And that’s just a fraction of its uses: nearly 1 million repositories depend on urllib3, directly or indirectly (as of September 2022). It’s regularly among the top 3 most-downloaded Python projects. If you’re building with Python, you’re using urllib3.
Why is the security of urllib3 important?
First, there’s the functionality of urllib3 itself—it handles web requests, TLS/SSL, certificate validation, and more. If any of that functionality has issues, it can leave users open to compromise from hostile websites, man-in-the-middle attacks, and more.
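To make "certificate validation" concrete, here is a minimal sketch using only Python's standard `ssl` module, which urllib3 builds on. It is not urllib3's own code; the function name `describe_tls_defaults` is made up for illustration. It simply reports the two checks a default client-side TLS context performs.

```python
import ssl


def describe_tls_defaults() -> dict:
    """Report the verification settings of Python's default client TLS context.

    Hypothetical helper for illustration; not part of urllib3.
    """
    # create_default_context() defaults to the client side of a connection
    # (authenticating the server).
    ctx = ssl.create_default_context()
    return {
        # The server must present a certificate that chains to a trusted CA.
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must also match the hostname being contacted.
        "checks_hostname": ctx.check_hostname,
    }


if __name__ == "__main__":
    print(describe_tls_defaults())
```

If either check were silently skipped, a man-in-the-middle could present any certificate and still be accepted, which is the kind of issue the paragraph above warns about.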
Second, there’s the sheer scope of urllib3’s usage. It’s downloaded over 250 million times per month, as part of usage, packaging, installation, and continuous integration workflows. Code that is installed and used that often needs to be maintained and trusted: the cost of repository hijacking, trojaned code, and other systemic compromises could be staggering.
Who maintains urllib3?
Seth Michael Larson
Seth is a Python Software Foundation fellow and a tech lead at a large SaaS tech company, hired in part to maintain urllib3. Seth has been working on and maintaining urllib3 since 2016, and is currently the lead maintainer. He lives and works in Minneapolis.
Andrey Petrov
Andrey is an open source coder and former Google engineer. Andrey is the original author of urllib3, and has been maintaining it in whole or in part since 2008. Andrey currently acts as the meta-maintainer who ensures there is always a second person that can approve and release items. Andrey lives and works in Toronto.
Quentin Pradet
Quentin is a senior performance engineer at a large SaaS tech company. Quentin has been a maintainer of urllib3 since 2019, and does much of the day-to-day development. Quentin lives and works on Réunion Island.
The urllib3 team maintains project documentation, a testing framework, and secure release practices in order to support their community of contributors who work on bugs, features, and other fixes. They are using the reliable income from Tidelift and other sources to pay contributors for some of this work. Given that almost 60% of maintainers have quit, or considered quitting, Tidelift’s strategic investment in urllib3 is critical to managing the workload that goes into keeping urllib3 secure and stable.
Why Tidelift’s partnership with maintainers is critical for reducing fire drills
Handling of CVEs via a coordinated disclosure process
When a package is as widely used as urllib3, handling CVEs is paramount. When vulnerabilities are discovered, it’s important that maintainers be properly informed. That report starts a process in which fixes must be developed quickly for supported branches, and the appropriate industry sources such as NIST and MITRE must be given accurate information. Finally, patched releases make their way to developers and users. If these processes aren’t in place, users face a fire drill: they have to react to random pings from their SCA tools about vulnerabilities, with limited information about how to fix them.
Take Log4Shell as an example. In late 2021, a critical vulnerability was discovered in the Java Log4j logging framework that required immediate remediation, and when the zero-day vulnerability was announced, only a release candidate fix was available. The fallout was immense, according to an (ISC)² survey:
Due to the ubiquitous nature of the vulnerability, 52% of respondents said their team collectively spent weeks or more than a month remediating Log4j and nearly half (48%) of cybersecurity teams gave up holiday time and weekends to assist with remediation.
urllib3 is a critical part of the Python ecosystem, in the same way that Log4j is for Java. To help in situations like this, urllib3 partners with Tidelift to handle their coordinated security disclosure process. In 2019, urllib3’s partnership with Tidelift helped the team respond to a security report from a Python maintainer, inform MITRE, and provide fixed releases, all in one day; you can read more in this blog post. Users of urllib3 had a fix for the issue on the day the vulnerability was made public, ensuring they weren’t blindsided by a vulnerability they had no remediation for.
“Tidelift has made offering a comprehensive vulnerability disclosure process simple for the urllib3 team,” said Seth at the time. “This makes delivering secure code and responding quickly to vulnerabilities easy even for a small team.”
Diagnosing the impact of a supply chain breach
In 2021, attackers were able to infiltrate Codecov, a service used for testing code coverage by over 20,000 projects and enterprises. By exploiting a flaw in how Codecov built its images, they modified a Codecov artifact so that, for a period of two months, it gave them access to all environment variables that Codecov users used when accessing the service. The leaked environment variables could include passwords, tokens, and other secrets. Multiple organizations had their private source code and service access tokens leaked to the attackers.
For projects, it is critical that maintainers have the time to track, investigate, and respond to these issues. On learning of this attack, the urllib3 team was able to investigate which variables were leaked, audit to confirm there wasn’t any unauthorized use of the leaked API token, and then change the token going forward. Had this not been caught quickly, that API token could have been used to compromise accounts or release trojaned software. Many users of urllib3 may not even have been aware this breach happened, but the quick work of the maintainers ensured their supply chain was not compromised by it.
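As a rough illustration of the first step, working out which variables might have been exposed, here is a small sketch that lists environment variable names that look like secrets. It is not the urllib3 team's actual tooling; the name pattern and the function `find_secret_like_names` are assumptions made up for this example, and a real audit would also review the CI configuration and the service's access logs.

```python
import os
import re

# Hypothetical heuristic: variable names containing these words are treated
# as likely secrets. Adjust the pattern for your own naming conventions.
SECRET_NAME_PATTERN = re.compile(
    r"(TOKEN|SECRET|PASSWORD|API_?KEY|CREDENTIAL)", re.IGNORECASE
)


def find_secret_like_names(environ):
    """Return the sorted names of variables whose names suggest a secret."""
    return sorted(name for name in environ if SECRET_NAME_PATTERN.search(name))


if __name__ == "__main__":
    # Print names only, never values, so the audit itself leaks nothing.
    for name in find_secret_like_names(os.environ):
        print(f"review and rotate: {name}")
```

Running it inside the same CI job that called the compromised service gives a starting list of credentials to audit and rotate.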
Projects maintained by volunteer developers working part time are less likely to detect these sorts of incidents, and may not be able to prepare a timely response. Because financial incentives were in place, the urllib3 maintainers could allocate the time to complete the necessary tasks, resolve the issues, and minimize the downstream impact on the same day the breach was announced.
In the next post we’ll continue to dive deeper into how the urllib3 team establishes account security, fine tunes process, and more. If you’d like to get notified as future posts come out, please sign up for our blog digest here. Or if you don’t want to wait to finish this story, download the full whitepaper today!