Today we’re publishing another Libraries.io open data release with almost 400 million rows of metadata about open source projects and the network of dependency data that connects them all.
Three months ago we published our second open data release, continuing our collaboration with the Alfred P. Sloan and Ford Foundations. The data helps academic researchers study trends in software development, and helps developers understand how their software is used more effectively than ever before.
Since that last data release, Tidelift has also launched a free dependency analysis service that makes full use of this extended dataset. If you want to better understand your current dependencies, you can log in with GitHub and get a bird's-eye view of all of your open source usage.
Meanwhile, the Libraries.io dataset continues to grow. Today we’re releasing data on:
- 35 package managers
- 2.6 million projects
- 12.1 million versions
- 73 million project dependencies
- 33 million repositories
- 235 million repository dependencies
- 11.5 million manifest files
- 50 million git tags
The data is available in raw form on Zenodo, and we’re working on publishing it as a structured, queryable dataset on Google’s BigQuery. If you’d like to build tools on top of the most recent data, or top up your copy of the dataset to keep it current, check out the Libraries.io REST API.
This data is published under a Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) licence. It’s a free and open licence that requires you to credit the source and to distribute any adaptations you share under the same licence. And don’t forget, Libraries.io itself is open source, so if you’d like to get involved we can help you get started. Check out the Contributors Handbook.
Finally, if you’d like regular updates from Tidelift on news like this, sign up here.