Today we’re publishing another Libraries.io open data release with over 311 million rows of metadata about open source projects and the network of dependency data that connects them all.
Six months ago we published our first open data release as part of our commitment to the Alfred P. Sloan and Ford Foundations. The data helps academics study trends in software development, investors understand the success of the projects they support, and developers understand how their software is used, all more effectively than ever before.
Last week we announced that Libraries.io has joined forces with Tidelift to make open source software work better for developers and users. Libraries.io’s mission hasn’t changed and we’re going to continue publishing open data releases every quarter to build a stronger, more informed open source ecosystem.
Since our last release the Libraries.io dataset has grown significantly. Today we’re releasing data on:
34 package managers
2.7 million projects
11 million versions
66 million project dependencies
31 million repositories
161 million repository dependencies
10 million manifest files
46 million git tags
The data is available in its raw format on Zenodo and we’re working on getting it published as a structured, queryable dataset on Google’s BigQuery. If you’d like to build tools on top of the most recent data, or top up your dataset to keep it current, check out the Libraries.io REST API.
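As a rough illustration of topping up a local copy from the API, here is a minimal Python sketch. It assumes the project lookup endpoint takes the form `https://libraries.io/api/<platform>/<name>` with an `api_key` query parameter; the helper names `project_url` and `fetch_project` are hypothetical, and the exact endpoint shape and rate limits should be checked against the API documentation before relying on it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL for the Libraries.io REST API.
API_ROOT = "https://libraries.io/api"


def project_url(platform: str, name: str, api_key: str) -> str:
    """Build the (assumed) project lookup URL for one package.

    Platform and name are percent-encoded so that names containing
    slashes, such as scoped npm packages, stay in one path segment.
    """
    path = f"{API_ROOT}/{quote(platform, safe='')}/{quote(name, safe='')}"
    return f"{path}?{urlencode({'api_key': api_key})}"


def fetch_project(platform: str, name: str, api_key: str) -> dict:
    """Fetch current metadata for one project (hypothetical helper).

    Performs a network request and needs a valid API key.
    """
    with urlopen(project_url(platform, name, api_key)) as response:
        return json.load(response)
```

Calling `fetch_project("npm", "left-pad", "<your key>")` would then return the latest metadata for that package as a dictionary, which can be merged into rows loaded from the Zenodo dump.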
For further documentation, check out our dedicated open data page. Also check out the article Ben wrote for opensource.com to get more ideas of things you can do with the data.
This data is published under a Creative Commons BY-SA-4.0 licence. It’s an open and free licence that requires anyone who redistributes the data, or work built on it, to give credit and share it under the same terms. And don’t forget, Libraries.io is open source, so if you’d like to get involved we can help you get started: check out the Contributors Handbook.
Finally, if you’d like regular updates from Tidelift on news like this, sign up here.