A brief history of package management

by Jeremy Katz
on December 19, 2017

Application developers today are used to relying on and pulling in a number of open source libraries to help them focus on the functionality that’s important to their business. Rather than requiring you to find and download each of those libraries individually, though, most programming language ecosystems have a standard (or de facto standard) package manager that helps you install and manage those libraries.

For Ruby, that’s RubyGems. For Python, there’s PyPI. In Java, it’s Maven. JavaScript? npm. The list goes on. Libraries.io provides a great way to see this information for any of 36 package managers.

But if we step back in history to a time before any of these modern package managers existed, there was an earlier form of package management for Linux. In fact, Linux distributions have provided package managers for nearly 25 years now.

Linux and the need for a package manager

As the Linux kernel began to see broader usage, people wanted and needed to have more than just an operating system kernel. At first, it was the basics such as a shell; utilities like cut, sed, and awk; and an editor such as vi or emacs. Though you could start with the source code for these components and build them yourself (presuming, of course, you had somehow gotten a compiler), this added to the difficulty of users getting started with Linux.

Thus, in 1993, the earliest examples of what you might call a package manager began to appear. Amazingly, some of these early package managers live on today. Debian still uses dpkg and Red Hat still uses rpm (which was a successor to pms).

These tools were simple, but they allowed you to download a pre-built binary package that could be installed, upgraded, and removed. And much like today, these early package managers also introduced the concept of recording the other software each package required as dependency metadata.

At the time, those packages existed in isolation. You could download one (or transfer it on a floppy disk!) but it wasn't easy to install sets of them. Only with the release of Debian’s apt-get in 1998 and Red Hat’s up2date in 1999 could you begin to easily download and install a package and all of its dependencies without explicitly specifying them all.

This is where the pattern of a downloadable file describing a universe of packages was born. These tools also included a dependency resolver, so that you could easily know that, for example, the dependency libpng.so.0 was provided by the libpng0 package. This was a huge step forward in usability, but it also added complexity, as norms around how to package large amounts of software were created and encoded in documents like the Debian Developer’s Reference.
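To make the idea concrete, here is a minimal sketch of a package index and a resolver in Python. It is not how apt-get or up2date were implemented; the index layout, the `imageviewer` and `zlib1` packages, and the `libz.so.1` capability are all invented for illustration. Only the libpng.so.0 to libpng0 lookup comes from the example above.

```python
from collections import deque

# Hypothetical package index: each entry lists what a package provides
# and which capabilities it requires. Everything except the
# libpng0 / libpng.so.0 pairing is invented for this sketch.
INDEX = {
    "imageviewer": {"provides": ["imageviewer"], "requires": ["libpng.so.0"]},
    "libpng0": {"provides": ["libpng.so.0"], "requires": ["libz.so.1"]},
    "zlib1": {"provides": ["libz.so.1"], "requires": []},
}


def build_provider_map(index):
    """Map each capability to the name of a package that provides it."""
    providers = {}
    for name, meta in index.items():
        for capability in meta["provides"]:
            # Keep the first provider found; real resolvers choose more carefully.
            providers.setdefault(capability, name)
    return providers


def resolve(package, index):
    """Return the package plus every package needed to satisfy its requirements."""
    providers = build_provider_map(index)
    to_install = []
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        to_install.append(current)
        for capability in index[current]["requires"]:
            if capability not in providers:
                raise LookupError(f"nothing provides {capability}")
            queue.append(providers[capability])
    return to_install


print(resolve("imageviewer", INDEX))
# → ['imageviewer', 'libpng0', 'zlib1']
```

Asking for one package yields the full install set, which is the step that the earlier one-package-at-a-time tools left to the user.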

Early on, these system package managers were also used to provide packages for dependencies of various language ecosystems; you could (and still can) use rpm and yum to install things like the Python requests package or Rails. But it meant that everything on your system had to use the same version of a library, including any software shipped in the Linux distribution that depended on it. This was easy for packages that weren’t changing quickly. But it became much more difficult as applications increased both in complexity and pace of development.
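The single-version constraint can be sketched in a few lines of Python. The application names and version numbers below are hypothetical, and real package managers compare version ranges rather than fixed sets; the point is only that a system-wide install must pick one version that every consumer accepts.

```python
# Hypothetical data: application name -> versions of one shared library it accepts.
REQUIREMENTS = {
    "distro-tool": {"2.0", "2.1"},
    "legacy-app": {"2.0"},
    "new-app": {"2.1", "3.0"},
}


def shared_versions(requirements):
    """Return the versions every application accepts (empty set if there are none)."""
    acceptable = None
    for versions in requirements.values():
        if acceptable is None:
            acceptable = set(versions)
        else:
            acceptable &= versions
    return acceptable or set()


print(sorted(shared_versions(REQUIREMENTS)))
# → []

# Without the fast-moving application, one system-wide version still works.
slower = {name: v for name, v in REQUIREMENTS.items() if name != "new-app"}
print(sorted(shared_versions(slower)))
# → ['2.0']
```

As soon as one application moves ahead of the others, no single installed version satisfies everyone.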

It was against this backdrop that many of the application package managers we use today were born. One of the earliest was CPAN for Perl, followed by Maven for Java and many others.

The role of the modern package manager

Today, these application package managers sit beside your system package manager and frequently allow you to have different versions of the library for different applications you’re working on. This decoupling makes development easier and reduces some of the compatibility burdens of the past.

And yet, it also adds new challenges. Now, if there is a security issue in one library, you may have many places where it needs updating. The decoupling has also led to an explosion of small packages. For example, the median size of a Python package on PyPI is just 16 KB.
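As a rough illustration of the "many places" problem, the Python sketch below scans the pinned dependencies of several applications for a flagged library version. The application names, pins, and the flagged version are all made up for the example and do not refer to any real advisory.

```python
# Hypothetical pinned dependencies: application -> {library: version}.
APPLICATIONS = {
    "billing": {"requests": "2.18.0", "flask": "0.12"},
    "reports": {"requests": "2.18.4"},
    "website": {"requests": "2.18.0", "jinja2": "2.10"},
}


def affected_applications(applications, library, bad_versions):
    """Return sorted names of applications that pin a flagged version of the library."""
    return sorted(
        name
        for name, pins in applications.items()
        if pins.get(library) in bad_versions
    )


# Assume, purely for illustration, that version 2.18.0 has been flagged.
print(affected_applications(APPLICATIONS, "requests", {"2.18.0"}))
# → ['billing', 'website']
```

With a single system-wide copy there would be one place to patch; with per-application copies, each affected application has to be found and updated separately.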

With all of these dependencies, there’s a critical need for tools to help you understand more about all of the libraries your application uses. This challenge — building the tooling to help both users and creators of open source — is one of the things I find most compelling about what we are working on here at Tidelift.

We’ll continue to share more of our thinking over the coming months. If you are interested in following along, consider signing up for our mailing list or following us on Twitter.

And if you want to hear from personalities involved in many of today’s package managers, check out The Manifest, a podcast all about package management that is co-hosted by our colleague Andrew Nesbitt!