
Our third Libraries.io open data release has arrived

by Andrew Nesbitt
on March 14, 2018

Today we’re publishing another Libraries.io open data release with almost 400 million rows of metadata about open source projects and the network of dependency data that connects them all.

Three months ago we published our second open data release, continuing our collaboration with the Alfred P. Sloan and Ford Foundations. The data supports academic researchers looking into trends in software development, and developers seeking to understand how their software is used, more effectively than ever before.

Meanwhile, the Libraries.io dataset continues to grow in size. Today we’re releasing data on:

  • 35 package managers
  • 2.6 million projects
  • 12.1 million versions
  • 73 million project dependencies
  • 33 million repositories
  • 235 million repository dependencies
  • 11.5 million manifest files
  • 50 million git tags

The data is available in its raw format on Zenodo and we’re working on getting it published as a structured, queryable dataset on Google’s BigQuery. If you’d like to build tools on top of the most recent data, or top up your dataset to keep it current, check out the Libraries.io REST API.
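If you start from the raw files, a first pass might simply stream one of them and tally rows per package manager. The sketch below assumes the projects file is a CSV with a header row and a `Platform` column; the column names and the sample rows are illustrative assumptions, so check the open data documentation for the actual schema before relying on them.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(rows: TextIO, column: str = "Platform") -> Counter:
    """Stream a CSV and count rows per distinct value of `column`.

    The default column name is an assumption about the file layout.
    """
    counts = Counter()
    # DictReader reads one row at a time, so a multi-gigabyte file
    # never has to fit in memory.
    for record in csv.DictReader(rows):
        counts[record[column]] += 1
    return counts


# Tiny in-memory sample standing in for the real, much larger file.
sample = io.StringIO(
    "ID,Platform,Name\n"
    "1,NPM,left-pad\n"
    "2,Rubygems,rails\n"
    "3,NPM,express\n"
)
counts = count_by_column(sample)
print(counts.most_common())
```

For a real file, pass an open file handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points at wherever you extracted the download.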

For further documentation, check out our dedicated open data page. Also check out the article Ben wrote for opensource.com to get more ideas of things you can do with the data.

This data is published under a Creative Commons BY-SA-4.0 licence. It’s an open and free licence that requires anyone who redistributes work built on the data to credit the source and share it under the same terms. And don’t forget, Libraries.io itself is open source, so if you’d like to get involved we can help you get started: check out the Contributors Handbook.

Finally, if you’d like regular updates from Tidelift on news like this, sign up here.