As part of our ongoing work on Libraries.io, we are glad to announce the availability of an updated data set. The new data set captures the state of open source metadata and the graph of dependencies as of the end of 2018. This data is available today as a set of CSV files that you can analyze.
Today’s data set includes information on over 16 million versions of 3.3 million open source packages. These packages are tracked across 37 different package managers, and the data set also includes information about their repositories on GitHub, GitLab, and Bitbucket.
Analyzing the data with an analytics tool such as Google’s BigQuery lets you uncover findings such as:
- There are almost twice as many packages released on any given weekday compared to any given weekend day.
- Despite the default license for npm modules created with `npm init` being ISC, there are more than twice as many MIT licensed npm modules as ISC.
- Only 2.1% of all dependencies used by npm packages are pinned to the most recent release.
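If you would rather start with the raw CSV files than with BigQuery, a license comparison like the one above can be sketched in a few lines of Python. This is only a minimal illustration: the column names `Platform` and `Licenses`, the comma-separated license format, and the tiny in-memory sample are assumptions made for the example, so check them against the documentation for the release before running it on the real files.

```python
import csv
import io
from collections import Counter


def count_licenses(csv_file, platform,
                   platform_col="Platform", license_col="Licenses"):
    """Count license values for one package manager in a projects CSV.

    The column names are assumptions for this sketch; pass the real
    ones if the release uses different headers.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Skip rows that belong to other package managers.
        if (row.get(platform_col) or "").lower() != platform.lower():
            continue
        # Assumption: a package with several licenses lists them
        # comma-separated in a single field.
        for lic in (row.get(license_col) or "").split(","):
            lic = lic.strip()
            if lic:
                counts[lic] += 1
    return counts


# Made-up sample rows standing in for the real projects file.
sample = io.StringIO(
    "Platform,Name,Licenses\n"
    "NPM,left,MIT\n"
    "NPM,right,ISC\n"
    'NPM,both,"MIT,Apache-2.0"\n'
    "Rubygems,gem,MIT\n"
    "NPM,nolicense,\n"
    "NPM,third,MIT\n"
)

licenses = count_licenses(sample, "npm")
print(licenses.most_common())
```

To run it against the downloaded data, replace the sample with `open(path, newline="", encoding="utf-8")`, where `path` points at the projects CSV from the release. Because the function streams rows one at a time, it should cope with large files without loading them into memory.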
More documentation on the structure of the data can be found on the release page. Note that the data is available under a Creative Commons BY-SA-4.0 license. We would love to hear about any interesting things that you find in the data. Let us know by tagging @librariesio on Twitter.