
How to choose open source packages well

by Jeremy Katz
on September 27, 2018

We often don’t stop to think about it, but selecting the right open source package is not always a straightforward process. And for developers who started with .NET or Java, it can be even trickier.

The tremendous growth of scripting languages like Python, Ruby, and JavaScript makes package research and selection a major part of a professional developer’s job. Today, a JavaScript developer using the popular Node.js runtime on the backend, for instance, gets very little functionality out of the box. This can be great for learning the core of the environment, but it also means choosing the best JavaScript framework is hard when you have so many package choices to make.

So how do you choose well?

In this blog post, we will provide an overview of the key steps to making the best possible choices when considering open source packages.

Few large decisions, many small decisions

It’s important to first distinguish between the relatively few large technology decisions a team makes and the large number of smaller decisions.

Infrastructure decisions

This category includes things like selecting a cloud provider, a database, or a language. These are big decisions that involve many people and have company-wide impact. At a previous company, for example, we decided to switch cloud providers, and the project took most of a year. Similarly, changing your database often requires rewriting the entire data model. Switching from SQL Server and .NET to MySQL and Java in a previous role was also a project that took about a year to complete. Because they are so costly to revisit, these decisions usually get a lot more oversight and attention from the senior members of a technical team.

Package decisions

Far more common, and the focus of the rest of this post, are package decisions. These are the daily and weekly decisions developers make in order to leverage open source and get their app to market quickly and cost-effectively.

This includes things like choosing a package to ease the implementation of navigation breadcrumbs for your frontend, selecting a database connector, or picking among many libraries to talk to a third-party API. While the impact of these decisions is typically constrained to the development team, they are far from inconsequential, and their cumulative impact over time can be substantial. Getting them wrong can result in significant productivity drains, mounting technical debt, or even legal or security exposure.

While rarely taught in computer science classes, researching and selecting packages has become a significant part of a professional developer’s job.

Without further ado, let’s get into some best practices.

Step 1: Build your shortlist

Coming up with a shortlist of candidate packages to explore more deeply is the first step. The shortlisting process will vary by language. Python and Ruby, for example, tend to have more built in than, say, JavaScript.

The Django Python web framework, for instance, comes with a built-in Object-Relational Mapper (ORM) to connect your app to a database. Other ORMs exist, but using them with Django isn’t easy, so you will typically use the built-in tool unless you have a really specific and important requirement.

Whether you are a Java, Ruby, Python, or JavaScript shop, your language has at least one package manager whose registry contains tens to hundreds of thousands of packages. These registries are a great place to start looking for candidate packages to meet your needs.

Past experience with packages and modules will play into your shortlisting. If you’ve used something before and liked it, it would obviously make your shortlist. Equally, we all tend to rule out packages with which we’ve had a negative experience.

You will also want to consider the user experience impact of package choices. Something might meet all the functional requirements but require additional effort to feel smoothly integrated into your product, or it might just be slow. And let’s face it, “a little funky” is not how most developers want to describe the application they’re building.

Asking other developers about their experience with a package is also important. For all the talk about social coding these days, the truth is that coding has always been social, and developers have always turned to one another for advice when making tech decisions. For decades, we’ve used IRC, and many developers, myself included, still do.

Also popular today are sites like Stack Overflow, Reddit, bespoke Slack channels, email lists, and Twitter. And while Google is not itself a quality indicator, it points to things that are, like blogs and threads on the aforementioned platforms.

When evaluating a package, it’s important to also be sure that you consider various non-functional requirements, such as the license. It’s surprising how frequently developers skip this step. Using a package with no license or with anything other than Apache, MIT, or maybe LGPL can become a problem.
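
One way to make the license check hard to skip is to script it. The sketch below is only an illustration, not a legal tool: the allowlist mirrors the licenses named above using SPDX-style identifiers, and the function name `classify_license` and its result labels are assumptions made for this example.

```python
from typing import Optional

# Assumed allowlist: SPDX-style identifiers for the licenses named above.
# Adjust it to match your own organization's policy.
ALLOWED = {"MIT", "Apache-2.0"}
MAYBE_PREFIXES = ("LGPL-",)  # the "maybe LGPL" case


def classify_license(license_id: Optional[str]) -> str:
    """Return 'missing', 'ok', 'maybe', or 'problem' for a license identifier."""
    if license_id is None or not license_id.strip():
        return "missing"  # no license at all is a red flag
    normalized = license_id.strip()
    if normalized in ALLOWED:
        return "ok"
    if normalized.startswith(MAYBE_PREFIXES):
        return "maybe"
    return "problem"
```

Running every shortlisted package's declared license through a check like this turns "we forgot to look" into an explicit label someone has to act on.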

At the end of your shortlisting process, you may have anywhere from three to five candidate packages on the high end, and one or two on the low end.

Keep in mind that for any nontrivial application you’re developing, you’ll be going through this shortlisting process on a regular and ongoing basis as you continue to extend the functionality.

Step 2: Assess the quality of shortlisted candidates

Now comes the fun part—figuring out which of the shortlisted candidates is the “best.” Often, developers go into this step with a clear favorite. In this case, the activities described below can be thought of as sort of a pre-flight checklist. Only a major alarm would prevent moving forward with the preferred package.

The first thing to do is go find where the package is developed, which in most cases is GitHub. Then I like to determine if the package is hosted on an individual’s GitHub account or if it’s set up as its own organization. Having it hosted by an organization is usually a signal that there is more than one maintainer or major contributor. This helps to avoid problems if someone no longer has the time or interest in a project.
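
If you are checking many candidates, the individual-versus-organization question can be answered from repository metadata. The sketch below assumes `repo` is a dict shaped like a GitHub REST API repository response, in which `owner.type` is either "User" or "Organization"; the function name is hypothetical, and fetching the JSON is left out.

```python
def hosted_by_organization(repo: dict) -> bool:
    """Return True if the repository's owner is an organization account.

    `repo` is assumed to look like a GitHub REST API repository response,
    where repo["owner"]["type"] is "User" or "Organization".
    """
    owner = repo.get("owner") or {}
    return owner.get("type") == "Organization"
```

Treat the result as a hint rather than a verdict: an organization can still have a single active maintainer.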

Next you want to look at activity. When were the most recent changes? What does the release process look like? On GitHub, it’s easy to see when the last commit was made, but a recent commit doesn’t necessarily mean the project is well maintained. Merging a PR isn’t the same as making a release.

You really don’t want to depend on a package with a lot of unreleased changes, because it makes it harder to get fixes into your application if you run into a problem.
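
To make the difference between commit activity and release activity concrete, here is a minimal sketch. It assumes you have already looked up three dates by hand (last release, last commit, and today); the function names and the 90-day threshold are arbitrary choices for illustration, not a standard.

```python
from datetime import date


def release_lag(last_release: date, last_commit: date, today: date) -> dict:
    """Summarize how current a package's releases are."""
    return {
        # How long since users last received a release.
        "days_since_release": (today - last_release).days,
        # Gap between the last release and the newest commit;
        # 0 if nothing is newer than the release.
        "days_unreleased": max((last_commit - last_release).days, 0),
    }


def looks_stale(lag: dict, max_unreleased_days: int = 90) -> bool:
    # The 90-day default is an assumption; pick what suits your project.
    return lag["days_unreleased"] > max_unreleased_days
```

For example, a package last released on January 1, 2018 with commits as recent as June 1, 2018 has roughly five months of work that users cannot install from the registry.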

With the free tiers of Travis CI and CircleCI, more open source packages are adding continuous integration. I usually consider this a bonus, not a must-have.

Next you want to look at the issues. Many people only look at open issues, but I actually think you learn more about a package from the closed ones. Pay attention to how maintainers respond to issues. If their attitude tends to be “you’re wrong, go away”, that should be a big red flag. What you want to see are positive, productive responses. That’s how a maintainer grows contributors, which in turn promotes project health.

In addition to the tone of responses, what’s the cadence? Are they batch responding, plowing through dozens of issues in a day and then letting them stack up for a month? Or do they seem to have a regular, more systematic approach?
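
Cadence can be roughly quantified once you have jotted down when a sample of issues was opened and when each got its first maintainer response. The sketch below assumes that data is a list of `(opened_at, first_response_at)` datetime pairs; both helper names are hypothetical.

```python
from datetime import datetime
from statistics import median


def response_delays_in_days(issues):
    """issues: list of (opened_at, first_response_at) datetime pairs."""
    return [
        (responded - opened).total_seconds() / 86400
        for opened, responded in issues
    ]


def busiest_day_share(issues):
    """Fraction of first responses that landed on the single busiest day.

    A value close to 1.0 suggests batch triage; a low value suggests
    steadier, more regular attention.
    """
    if not issues:
        return 0.0
    per_day = {}
    for _, responded in issues:
        day = responded.date()
        per_day[day] = per_day.get(day, 0) + 1
    return max(per_day.values()) / len(issues)
```

Taking `median(response_delays_in_days(issues))` alongside `busiest_day_share(issues)` gives two simple numbers to compare across candidates: how long reporters typically wait, and how bunched the responses are.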

You’ll also want to look at how many contributors there are. Consider the size and popularity of the package and its importance to your stack. If it’s small, niche or easily swappable, you shouldn’t necessarily rule it out if it has a sole maintainer.
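
A contributor count alone can hide a project that is effectively maintained by one person. As a rough sketch, assuming you have a mapping of contributor names to commit counts (for example, copied from the repository's contributors page), you can compute how concentrated the work is. The function name is hypothetical.

```python
def top_contributor_share(commit_counts):
    """commit_counts: mapping of contributor name -> number of commits.

    Returns the fraction of all commits made by the most active contributor.
    """
    total = sum(commit_counts.values())
    if total == 0:
        return 0.0
    return max(commit_counts.values()) / total
```

A share near 1.0 means a sole maintainer in practice, which, as noted above, may still be acceptable for a small, niche, or easily swappable package.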

You will often come out of this evaluation process with red flags on some or all of your candidates. It then becomes a case of choosing the lesser of the evils. The comment below from Tidelift’s internal Slack provides a perfect example of just this scenario, one that plays out regularly across every professional development team.

[Screenshot: comment from Tidelift’s internal Slack]

In a follow-up blog post, I will address best practices and considerations for selecting from imperfect options.

In the meantime, there are a couple of resources from Tidelift that can help.

  • Libraries.io: Our Libraries.io project indexes data from 3.2 million packages across 36 package managers. It monitors package releases, analyzes each project's code, community, distribution, and documentation, and maps the relationships between packages when they're declared as dependencies.
  • Open source assessment: You can also set up a Tidelift open source assessment to uncover important information about your stack, including:
    • Deprecated and unmaintained packages
    • Missing licenses
    • Security vulnerabilities
    • Direct and transitive dependency trees