Have you ever wondered what the open source maintainers your business relies on do to keep their software healthy and secure? Here’s the second in a series of posts about urllib3, a Python package that is downloaded billions of times a year, and what it takes to keep it well maintained and up to date.
In the first post in this series, we discussed what urllib3 is, who maintains urllib3, how urllib3 maintainers handle CVEs, and why you should care. And if you’d like to read the entire case study, you can download the full whitepaper right now at the link below.
Securing their maintainers’ access
Seth Michael Larson, a urllib3 maintainer, once wrote on his blog:
Some time ago I was chatting with a friend about OSS supply chain security. During the conversation I mentioned that I'd prefer having my bank account compromised compared to my GitHub or PyPI accounts.
This isn’t hyperbole.
In late 2021, a JavaScript maintainer had their npm account compromised. An attacker used it to release versions of the ua-parser-js package that installed cryptocurrency miners on users' machines. Nearly 4000 other JavaScript packages depend on ua-parser-js, and users of any of those packages were potentially vulnerable to this attack until the malicious versions were discovered and pulled.
The urllib3 maintainers know how critical their software is to the Python ecosystem and the internet at large, so to prevent a similar situation, they have taken a number of steps to harden their accounts:
- Email: using a sufficiently secure email provider; email accounts are where account resets happen, which makes them a key target for hijackers.
- Passwords: using a password manager, relying on strong auto-generated passwords stored securely rather than remembered in an ad hoc fashion.
- Two-factor authentication: enabling it on every service they use, including email, source control, and the package manager.
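To make the two-factor item more concrete, here is a minimal sketch of one common second factor, a time-based one-time password (TOTP, RFC 6238), using only the Python standard library. The post does not say which second factors the urllib3 maintainers use (hardware security keys are another common choice), and the function name `totp` and its defaults (30-second step, 6 digits, HMAC-SHA1) are illustrative assumptions that match typical authenticator apps.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238) from a shared secret and a Unix time."""
    if for_time is None:
        for_time = time.time()
    # Both sides derive the same counter from the current 30-second window.
    counter = int(for_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks 4 bytes.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the last `digits` decimal digits, padding with leading zeros.
    return str(code % 10 ** digits).zfill(digits)
```

Because the authenticator and the service both derive the code from a shared secret and the current time, a stolen password alone is not enough to log in.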
Steps like these help prevent real attacks. To illustrate how important account security is: in 2022, a self-professed security researcher hijacked the ctx package on PyPI by registering the expired domain used for the maintainer’s email address on record. The attacker then used that domain to reset the maintainer’s PyPI password and take over the package. They injected code that sent users’ environment variables, which can contain passwords and tokens, to their own servers.
Beyond account security, the urllib3 maintainers have also secured how maintainers access the repository and build software. Among the changes they have made:
- Separating permissions for their contributors between reviewers (can review changes), release managers (can perform releases), and owners (can set permissions and roles).
- Using GitHub’s CODEOWNERS feature to ensure that any changes affecting the build and release pipeline require specific approval from the core maintainers.
- Using API tokens with limited scope for all build, test, and release processes, instead of using their elevated personal credentials. This use of least-privilege architecture limits the risk of any single credential compromise.
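The separation of roles and the CODEOWNERS rule can be modeled in a few lines of Python. This is a minimal sketch under assumptions: the role names mirror the post, but the permission flags, the list of protected paths, and the rule that an owner must approve pipeline changes are hypothetical. On GitHub, the real enforcement comes from repository roles, branch protection, and the CODEOWNERS file, not from code like this.

```python
from enum import Flag, auto
from fnmatch import fnmatch


class Permission(Flag):
    REVIEW = auto()   # can review changes
    RELEASE = auto()  # can perform releases
    ADMIN = auto()    # can set permissions and roles


# Hypothetical role table: each role gets only what it needs (least privilege).
ROLES = {
    "reviewer": Permission.REVIEW,
    "release-manager": Permission.REVIEW | Permission.RELEASE,
    "owner": Permission.REVIEW | Permission.RELEASE | Permission.ADMIN,
}

# Hypothetical CODEOWNERS-style patterns for files that affect build and release.
PROTECTED_PATTERNS = [".github/workflows/*", "pyproject.toml", "noxfile.py"]


def can(role: str, permission: Permission) -> bool:
    """Unknown roles get no permissions at all."""
    return permission in ROLES.get(role, Permission(0))


def needs_core_approval(changed_paths) -> bool:
    return any(fnmatch(path, pattern) for path in changed_paths for pattern in PROTECTED_PATTERNS)


def change_is_mergeable(changed_paths, approver_roles) -> bool:
    # Every change needs at least one approval from someone allowed to review.
    if not any(can(role, Permission.REVIEW) for role in approver_roles):
        return False
    # Pipeline changes additionally need sign-off from a core maintainer ("owner" here).
    if needs_core_approval(changed_paths):
        return "owner" in approver_roles
    return True
```

For example, `change_is_mergeable([".github/workflows/publish.yml"], ["reviewer"])` is `False`, while the same change approved by an owner is mergeable.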
All these steps help ensure that their accounts and the software itself are less vulnerable to hijacking and compromise, and every user benefits from this increased security.
Backwards compatibility
“When you’re at the bottom of the world, any change is gonna ruin someone’s workflow,” Seth said. “Our team is super cautious. It’s something we pride ourselves on.”
You may ask, “What does backwards compatibility have to do with security?” If a piece of software takes extreme care to maintain backwards compatibility, upgrades become less painful. Users can upgrade with confidence, and when you can upgrade with confidence, you can more easily pull in fixes and security remediations.
Let’s go back to Log4j, the library behind the Log4Shell vulnerability. Log4Shell itself affected the Log4j 2.x series, but many applications were still running Log4j 1.x, which reached end of life in 2015 and no longer receives security fixes. Only Log4j 2 is maintained, and it has a different API for developers to use. To stay current with Log4j, developers and users either need to migrate their code to the new API or install an additional API bridge package. This hinders their ability to stay current and apply future security updates.
The urllib3 team works hard to maintain backwards compatibility: each change must pass a rigorous test suite to ensure there are no unintended breaks in functionality. They go above and beyond so that users’ workflows are not interrupted; their 1.x branch still supports end-of-life Python releases such as 2.7 and 3.5. As a result, whenever the urllib3 team does have to release an important change, users can pick up the release quickly.
The team is continuing this work as they prepare a new major release, 2.0, where they are retaining 99% functional API compatibility in the new version, with a goal of making it “the simplest major version upgrade you’ve ever completed.”
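One common way libraries keep old code working while steering users toward a new API is a deprecation shim: the old name keeps behaving as before but emits a `DeprecationWarning`. The sketch below is a generic illustration, not urllib3’s actual code; the `Response` class and its method names are hypothetical.

```python
import warnings


class Response:
    """Hypothetical response object used only to illustrate a compatibility shim."""

    def __init__(self, headers):
        self._headers = dict(headers)

    @property
    def headers(self):
        """New, preferred API."""
        return dict(self._headers)

    def getheaders(self):
        """Old API, kept so that existing callers don't break."""
        warnings.warn(
            "Response.getheaders() is deprecated; use Response.headers instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line, not this one
        )
        return self.headers
```

Python hides `DeprecationWarning` by default except for code run directly as `__main__`, but test runners such as pytest report it, so downstream maintainers typically see these warnings in their test suites well before anything breaks.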
This policy ensures that all users have a smooth upgrade path to new releases, picking up security fixes when and where necessary.
Automating and streamlining release processes
Manual processes lead to mistakes, and the more complicated the process, the higher the chance of one. Releasing software is one of those complicated processes: from tagging the release in source control, to creating the release artifacts, to pushing to the package manager, there are many steps, often run with elevated privileges. If any of them is done improperly, the released software may contain the wrong code, credentials could be compromised, or worse.
One day, when Seth was having “the worst possible day” and everything was going wrong, the time came to perform a urllib3 release. He realized that to make this better, the process had to become easier and more automated. Together with Quentin Pradet, another urllib3 maintainer, he built a checklist-based automated release process with a built-in step for getting approvals from maintainers. By automating the release process, the maintainers can be sure that the same process runs every time and the software is built the same way every time. They can supply whatever credentials the process needs, scoped to only the permissions that are required, and hook in their continuous integration processes to ensure that the software is fully tested before it goes out.
Now when a release is needed, any maintainer can propose a release candidate. That candidate is tested in CI, and if it’s approved, the tag, build, and release steps all run automatically via GitHub Actions. This keeps the release process safe, repeatable, and reliable, and protects users from potential compromise and accidental oopses.
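The gating logic behind a checklist-based release can be sketched in Python. This is a minimal model under stated assumptions: the `ReleaseCandidate` type, the two-approval threshold, and the step names are hypothetical, and in urllib3’s real process these checks run in GitHub Actions rather than in a script like this.

```python
from dataclasses import dataclass, field
from typing import List, Set

REQUIRED_APPROVALS = 2  # hypothetical policy, not urllib3's actual number


@dataclass
class ReleaseCandidate:
    version: str
    ci_passed: bool = False
    approvals: Set[str] = field(default_factory=set)


def plan_release(rc: ReleaseCandidate, maintainers: Set[str]) -> List[str]:
    """Return the ordered release steps, or raise if any checklist item fails."""
    # Only approvals from recognized maintainers count toward the threshold.
    valid_approvals = rc.approvals & maintainers
    checklist = [
        ("CI passed", rc.ci_passed),
        (f"{REQUIRED_APPROVALS} maintainer approvals", len(valid_approvals) >= REQUIRED_APPROVALS),
    ]
    failed = [item for item, ok in checklist if not ok]
    if failed:
        raise RuntimeError(f"release {rc.version} blocked: {', '.join(failed)}")
    # The same steps, in the same order, every time.
    return [f"tag v{rc.version}", f"build {rc.version}", f"publish {rc.version}"]
```

The point of the design is that no step runs until every checklist item passes, and the steps themselves never vary from one release to the next.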
Reproducible and verifiable builds
The urllib3 team’s automation of their build process means they know their builds are done the same way every time. But to fully trust code, you want more assurances than that. You want to know that the artifacts were built from the code they’re supposed to be built from, and that building the same code again yields the same artifact. If you can prove both, you can trace the provenance of your code with confidence: from upstream source control, to the package manager, to your downstream environment.
In 2015, malware known as XcodeGhost was discovered attacking Apple’s Xcode development platform. Infected versions of Xcode silently injected additional malware into applications built with them, and that malware ended up shipping in multiple iOS applications.
By creating reproducible builds with a chain of trust from source, through the build system, to the final artifact, the urllib3 team can guard against this class of attack and make it much harder for their build processes to be silently compromised.
The urllib3 team has done a lot of work to make this possible for their code. In 2022 they moved to the Flit build system and adjusted their build process so that every build of a given set of source code produces the same artifact, byte for byte: all their builds are fully reproducible.
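Wheels are ZIP archives, and much of what makes builds non-reproducible comes from details like file timestamps, platform fields, and entry order. The sketch below shows the idea with the standard library’s `zipfile`: pin those fields, write entries in sorted order, and the same inputs produce byte-identical output with the same SHA-256 digest. It is not how Flit builds wheels, and `build_archive` is a hypothetical helper; build tools that follow the reproducible-builds convention typically take the fixed timestamp from the `SOURCE_DATE_EPOCH` environment variable instead of hardcoding one.

```python
import hashlib
import io
import zipfile

# ZIP cannot store dates before 1980, so this is the earliest fixed timestamp available.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def build_archive(files: dict) -> bytes:
    """Build a ZIP archive whose bytes depend only on the file names and contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # stable entry order regardless of input order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE_TIME)
            info.create_system = 3  # always record "made on Unix", whatever the build host
            info.external_attr = 0o644 << 16  # fixed permissions: rw-r--r--
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, files[name])
    return buffer.getvalue()


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()
```

Building the same two files in a different order yields identical bytes, so anyone can rebuild from the same source and compare digests. Byte-for-byte equality across machines also depends on using the same compressor version, since different zlib versions may compress differently.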
After ensuring their builds are reproducible, the maintainers moved to ensuring their builds are verifiable—that they can certify that each build came from the proper release branch and tag, and that nothing has been tampered with along the way.
First they integrated Sigstore to sign their built artifacts with a chain of trusted signatures, and then went further by generating and verifying provenance with Supply-chain Levels for Software Artifacts (SLSA). urllib3 can now be verified as compliant with SLSA Level 3, which states that the “source and build platforms meet specific standards to guarantee the auditability of the source and the integrity of the provenance respectively.” Downstream users can now be confident that the software they get is the software they intended to get, and can build their own internal guarantees on the provenance they receive from upstream.
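SLSA provenance is expressed as an in-toto statement whose `subject` list names each artifact together with its digest. The sketch below shows only one piece of verification: checking that a downloaded file’s SHA-256 matches a subject in the statement. It deliberately skips what real verification requires, namely checking the statement’s signature and the builder identity, source repository, and tag, which dedicated tools such as `slsa-verifier` handle. The function name is hypothetical.

```python
import hashlib
import json


def artifact_matches_provenance(filename: str, artifact: bytes, statement_json: str) -> bool:
    """Return True if an in-toto statement lists this artifact with a matching SHA-256 digest.

    NOTE: this does NOT verify the statement's signature or who built the artifact.
    """
    statement = json.loads(statement_json)
    actual = hashlib.sha256(artifact).hexdigest()
    for subject in statement.get("subject", []):
        expected = subject.get("digest", {}).get("sha256")
        if subject.get("name") == filename and expected == actual:
            return True
    return False
```

A single changed byte in the downloaded file produces a different digest, so a tampered artifact no longer matches the provenance it claims to come from.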
Deprecating the [secure] extra and being secure by default
In 2015, urllib3 introduced the ‘secure’ extra. This extra installed the ‘pyOpenSSL’ package, which urllib3 would use for connection security instead of Python’s built-in ‘ssl’ module, because pyOpenSSL supported features such as SNI that the ‘ssl’ module lacked at the time.
Downstream projects such as ‘requests’ would enable ‘urllib3[secure]’ to get this extra. Some naive users followed suit; after all, it says “secure” right there in the name.
Fast-forward to 2022, and Python’s built-in ‘ssl’ module has all the needed features of ‘pyOpenSSL’. While the extra was once used to ensure security, it is now no more secure than the default, and it adds an extra code path and extra dependencies that can actually expose users to more vulnerabilities.
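To give a sense of what the standard library now provides on its own, the sketch below builds a client TLS context with Python’s built-in `ssl` module, with no pyOpenSSL involved. `ssl.HAS_SNI` reports SNI support, and `ssl.create_default_context()` turns on certificate verification and hostname checking. The helper name and the TLS 1.2 floor are illustrative choices, not urllib3’s actual configuration.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a verifying TLS client context using only the standard library."""
    # Loads the system's trusted CA certificates and enables hostname checking.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Passing `server_hostname="example.org"` to `context.wrap_socket(sock, ...)` sends that name via SNI and checks it against the server’s certificate.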
The urllib3 maintainers want to remove this extra, but responsibly, so that their downstream users don’t break. They know how stressful an imposed change can be for other open source maintainers. In Seth’s own words:
“Most projects would not have done that but I wanted to avoid causing pain to downstream projects, and avoid barrages of questions from their users asking each project to remove the extra to stop deprecation warnings.
Many firestorms happen because of deprecations—projects can get blindsided sometimes. I didn't want to be the cause of that for anyone.”
Doing this responsibly involves:
- Creating a new package that ‘urllib3[secure]’ depends on, which emits a deprecation warning telling users to move away from the extra
- Pointing that deprecation warning to a page explaining what’s going on and how to fix it
- Tracking installations of this new package to see how often people use urllib3[secure] (and whether that number goes down over time)
- Querying the Python Package Index (PyPI) to find out which packages directly depend on urllib3[secure]
- Opening an issue for each of those packages, and eventually pull requests to fix them
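The querying step boils down to asking, for each package’s declared dependencies (its `Requires-Dist` metadata), whether any of them request urllib3 with the `secure` extra. The sketch below covers only that check; how the metadata is fetched in bulk is left out, and this post doesn’t describe urllib3’s actual tooling. The helper is hypothetical and uses a simplified regex, whereas production code would parse requirements with `packaging.requirements.Requirement`, which exposes `.name` and `.extras`.

```python
import re
from typing import Iterable

# Matches a requirement's project name and optional [extras], e.g. "urllib3[secure,socks]>=1.21".
# Simplified: ignores URL requirements and other less common PEP 508 forms.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[([^\]]*)\])?")


def _normalize(name: str) -> str:
    """Normalize a project or extra name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def uses_secure_extra(requires_dist: Iterable[str]) -> bool:
    """Return True if any requirement asks for urllib3 with the 'secure' extra."""
    for requirement in requires_dist:
        match = _REQUIREMENT.match(requirement)
        if not match:
            continue
        name, extras = match.group(1), match.group(2) or ""
        if _normalize(name) != "urllib3":
            continue
        if "secure" in {_normalize(extra.strip()) for extra in extras.split(",")}:
            return True
    return False
```

Run over every package’s metadata, a check like this yields the list of projects that need an issue or pull request.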
All of that is a lot of extra work for maintainers, and it takes time, which the urllib3 maintainers only have so much of. Yet they take this time so that their downstream users are spared the pain and can continue to use and upgrade urllib3 with confidence.
In the next post we’ll finish up by discussing lessons learned by urllib3 maintainers and how their work is made possible. If you’d like to get notified as future posts come out, please sign up for our blog digest here. Or if you don’t want to wait to finish this story, download the full whitepaper today!