In modern application development, open source is everywhere. In fact, 92% of professional application teams report that they leverage open source components in their applications.
These packages regularly get downloaded to all of our developer machines and production servers. Sometimes we take for granted that these are “good” packages. But how do you know that these components are truly the code the creators published and not the work of a malicious actor?
If you simply trust PyPI or RubyGems.org to make sure the packages you’re pulling come from the right source, I have some bad news for you. Although the package repositories encrypt your packages in transit while you download them, there is no guarantee that the code they host came from the original developer.
There are many types of supply-chain attack. A few of the better-known examples are an account takeover, a typo-squatting attack, and a malicious package masquerading as a useful one. Furthermore, if a maintainer’s credentials are stolen, or the publisher’s machines are compromised, how will you know that the code you are running came from the original creator? Verifying the upstream maintainer is vitally important to ensure you are receiving the original author’s code.
Several languages have built-in support for some forms of package authenticity verification. Here’s how they work:
Ruby has a trust policy option when installing gems by using the “-P” flag like so: gem install GEMNAME -P { HighSecurity | MediumSecurity }. Of course, this requires the author of your gems to have signed them in the first place. Another issue is that the author’s public keys tend to be stored in the repo, which can be compromised.
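To make the signing idea concrete, here is a minimal sketch of what a signature check does, written against Ruby’s standard OpenSSL bindings rather than RubyGems’ own machinery. The key pairs, the package contents, and the signed_by? helper are all made up for illustration; real gems are signed with an X.509 certificate chain that the trust policy evaluates, so treat this as an analogy rather than how gem install works internally.

```ruby
require "openssl"

# Hypothetical stand-ins: a real gem is signed against a certificate chain,
# not a bare key pair like this one.
author_key = OpenSSL::PKey::RSA.new(2048)
package    = "contents of gemname-version.gem"

# The author signs the package bytes with the private key.
signature = author_key.sign(OpenSSL::Digest.new("SHA256"), package)

# A consumer checks the signature using only a public key.
def signed_by?(public_key, signature, data)
  public_key.verify(OpenSSL::Digest.new("SHA256"), signature, data)
end

public_key   = author_key.public_key
untouched_ok = signed_by?(public_key, signature, package)
tampered_ok  = signed_by?(public_key, signature, package + " plus malware")

# The check is only as trustworthy as the public key: an attacker who can
# swap the key stored in the repo can sign anything they like.
attacker_key = OpenSSL::PKey::RSA.new(2048)
forged_sig   = attacker_key.sign(OpenSSL::Digest.new("SHA256"), "malware")
forged_ok    = signed_by?(attacker_key.public_key, forged_sig, "malware")

puts "untouched: #{untouched_ok}, tampered: #{tampered_ok}, swapped key: #{forged_ok}"
```

The tampered package fails the check, but the forged one passes as soon as the consumer is handed the attacker’s public key. That is why where the keys live matters as much as whether the gem is signed.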
Another way to increase confidence in the package would be to look at the checksum instead of the signature. In Ruby this looks like: ruby -rdigest/sha2 -e "puts Digest::SHA512.new.hexdigest(File.read('gemname-version.gem'))". In some rare cases, the maintainers actually publish the hash of the released package as part of the documentation. The ruby-lint project releases the SHA512 for each of their releases right in the source code. According to the maintainers of the project: “these checksums do not prevent malicious users from tampering with a built Gem; they can be used for basic integrity verification purposes.”
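If you want to automate that comparison instead of eyeballing two long hex strings, something like the following sketch could work. The checksum_matches? helper and the two command-line arguments are assumptions made for this example; how a project publishes its checksums (a file in the repo, a release note) varies, so adapt the lookup accordingly.

```ruby
require "digest/sha2"

# Returns true when the file's SHA512 matches the checksum the maintainer
# published. Reads in binary mode so the bytes are hashed exactly as stored.
def checksum_matches?(path, published_hex)
  actual = Digest::SHA512.hexdigest(File.binread(path))
  actual == published_hex.strip.downcase
end

# Hypothetical command-line usage: both paths are placeholders.
#   ruby verify_checksum.rb gemname-version.gem gemname-version.gem.sha512
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  gem_path, checksum_path = ARGV
  ok = checksum_matches?(gem_path, File.read(checksum_path))
  puts ok ? "checksum OK" : "checksum MISMATCH"
  exit(1) unless ok
end
```

Keep in mind the caveat in the quote above: a matching checksum only tells you the file is the one the checksum was computed from, not who computed it.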
Google currently maintains a checksum database and a mirror for Go modules. Given the sizable investment Google is putting into the project and the relative youth of the language, it seems that the go.sum file pattern might be adopted in other ecosystems. That said, this service still skips the actual human verification of the maintainers uploading the source code and binaries.
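The idea behind that pattern can be sketched in a few lines of Ruby. This is a simplified illustration, not Go’s actual format or algorithm: the ChecksumLedger class, the package name, and the use of a plain SHA256 over raw bytes are all assumptions made for the example.

```ruby
require "digest/sha2"

# A hypothetical in-memory stand-in for a go.sum-style file: it maps
# "name@version" to the SHA256 recorded the first time that version was seen.
class ChecksumLedger
  def initialize
    @entries = {}
  end

  # Records the checksum on first use; afterwards, returns false if the
  # bytes for the same name and version differ from what was recorded.
  def verify(name, version, bytes)
    key = "#{name}@#{version}"
    digest = Digest::SHA256.hexdigest(bytes)
    @entries[key] ||= digest
    @entries[key] == digest
  end
end

ledger  = ChecksumLedger.new
first   = ledger.verify("examplepkg", "1.0.0", "original release bytes")
repeat  = ledger.verify("examplepkg", "1.0.0", "original release bytes")
altered = ledger.verify("examplepkg", "1.0.0", "republished with changes")
puts "first: #{first}, repeat: #{repeat}, altered: #{altered}"
```

A ledger like this catches a version whose contents change after it was first recorded, but it accepts whatever it sees first. It says nothing about whether the person who published that first copy was the real maintainer.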
So how might we verify our components end to end?
Tools for cryptographic verification exist, of course. However, we also need a trusted central source that can vouch for the publisher as a known good actor.
In the world of SSL certificates, we have certificate authorities to act as those entities (and for PGP, the web of trust). However, for open source packages, we have neither a central verifiable source nor a scalable approach. Per the Ruby Security Documentation:
Sure, the certificate says Yukihiro Matsumoto, but how do I know it was actually generated and signed by Matz himself unless he gave me the certificate in person? […] Having to constantly add new trusted certificates is a pain, and it actually makes the trust system less secure by encouraging RubyGems users to blindly trust new certificates.
Doing this kind of verification at scale would require an organization to verify the identity of each maintainer individually, whether in person or digitally.
That’s where the Tidelift Subscription steps in. When a maintainer applies to “lift” their package on the platform, we individually verify each maintainer is actually who they say they are. We also ask them to implement security best practices. This means that a package that is “lifted” as part of the Tidelift Subscription has a real, verified maintainer behind it, not a malicious actor.
Why haven't signatures been more broadly adopted already? There are a few reasons, but it mostly comes down to two points: 1) signing is too inconvenient for maintainers to do for free; therefore, 2) nobody can enforce signature checks, since the check will fail for every unsigned package.
When push comes to shove, we as engineers are responsible for sourcing and verifying the packages we pull into our projects. While there isn’t yet a fully end-to-end way to cryptographically verify all of our open source components, we are making progress. At Tidelift, we hope that by engaging directly with the maintainers on an end-to-end approach to verifying components we might change the landscape toward a safer, more secure open source supply chain.