But what impact does this have on your software development?
The spread of granular packages
As we can see in the boxplot graph below, many prominent package managers and open source communities are becoming more granular: the distribution of their repository sizes is largely clustered below 1 MB, with the median repository size often close to 100 KB.
A quick refresher on boxplots: the colored boxes represent the middle 50% of repository sizes (from the first to the third quartile, also known as the interquartile range), with the interior line showing the median repo size. The horizontal lines at the ends of the vertical whiskers mark the largest and smallest repo sizes that are not treated as outliers, and the dots above or below those lines show the outliers in the data.
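To make the refresher concrete, here is a minimal Python sketch of how those boxplot values can be computed. It assumes the common convention that any point more than 1.5 times the interquartile range beyond the box counts as an outlier; the graph in this post may use a different rule. The repository sizes are made up for illustration, and `boxplot_summary` is a hypothetical helper, not part of any package manager's tooling.

```python
import statistics

def boxplot_summary(sizes_kb):
    """Compute the values a boxplot draws, assuming the 1.5 * IQR outlier rule."""
    data = sorted(sizes_kb)
    # Quartiles: q1 and q3 bound the box, the median is the interior line.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inliers = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inliers[0],
        "whisker_high": inliers[-1],
        "outliers": outliers,
    }

# Illustrative (made-up) repository sizes in KB: mostly small, one very large.
sizes = [20, 40, 60, 80, 100, 120, 140, 160, 5000]
summary = boxplot_summary(sizes)
mean_size = statistics.mean(sizes)

print(summary)
print("mean:", round(mean_size, 1))
```

In this toy data set a single large repository pulls the mean far above the median, which is one reason boxplots draw the median: it better reflects the size of a typical package.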
This graphic leads us to a number of interesting conclusions. For most package managers, individual packages do tend to be small. However, we see huge diversity in the range of repository sizes: RubyGems repositories, for example, are tightly concentrated between 100 KB and 1 MB, but with a huge number of outliers.
We can also see that some of the older package managers (such as Maven and NuGet) tend to have the largest repository sizes. Why is this? It's hard to say definitively. Do these repositories simply contain more code? Or do they tackle more problems within a single repo, as opposed to the more task-specific granular projects we see today?
The scope of open source
All of this is to say that despite the “long tail” of open source, the rise of granular software packaging has resulted in a world where there are hundreds of thousands of packages that are actively used by professional software development teams. Managing this breadth of software can present challenges.
What does this mean for your software?
Releasing open source software in small packages has many consequences, both intended and unintended. For example, small packages tend to be more specialized to a specific task, updated more often, and generally less complex.
They also introduce more potential points of failure into a build: you require more packages to build your application, and those require more dependencies of their own, introducing a complex dependency tree that could cause trouble for you as the end developer.
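A small sketch can show how quickly a dependency tree grows beyond what you declare yourself. The Python below walks a dependency graph and collects every package a build pulls in, directly or indirectly. The package names and the graph are hypothetical and stand in for whatever your package manager would resolve; `transitive_dependencies` is an illustrative helper, not a real tool's API.

```python
def transitive_dependencies(package, direct_deps):
    """Return every package reachable from `package` through its dependencies."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in direct_deps.get(current, []):
            if dep not in seen:  # visit each package once, even if shared
                seen.add(dep)
                stack.append(dep)
    seen.discard(package)  # guard against a cycle leading back to the root
    return seen

# Hypothetical dependency graph: each key lists its direct dependencies.
graph = {
    "my-app": ["web-framework", "string-padder"],
    "web-framework": ["router", "logger"],
    "router": ["path-parser"],
    "logger": ["string-padder"],
}

direct = graph["my-app"]
everything = transitive_dependencies("my-app", graph)

print("direct dependencies:", len(direct))
print("total dependencies:", len(everything))
```

Here the application declares two dependencies but relies on five packages in total, and each of those five is a place where a build can break.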
And should you decide to use a package that is a part of the long tail of open source, there is a potentially greater risk that the package becomes unmaintained, leaving you, as the user, in limbo.
A first step in caring for your team's software is being aware of the potential unintended side effects of modular open source packaging. After that, there are a number of other directions we can take to help our open source code and community.
If you are interested in learning more, consider signing up for updates or following us on Twitter.