Bit Rot: the silent killer

Written by Tidelift | January 30, 2018

Your code is rotting right now.

Every day, your production services, internal tools, and open source libraries decay a little. Each day they get closer to breaking in ways you didn’t expect, even if you haven’t touched them in years, all thanks to bit rot.

What is bit rot?

Bit rot happens to all software when the dependencies and tooling required to build, test, and deploy it change over time. Eventually, when the software needs to be changed or redeployed, it cannot be returned to a functioning state because it conflicts with the ecosystem that has changed around it.

Software doesn’t exist in a vacuum. Applications are built on top of hundreds, even thousands, of components drawn from open source frameworks and libraries. They’re written in a range of programming languages, run on a variety of operating systems, and deployed to a vast array of hardware.

All of those components are updated and patched with varying frequency, and some of those updates break compatibility with other parts of your application stack. Each such update can set off a ripple of follow-up changes across the stack just to keep everything running smoothly.

Sometimes you control when those changes are applied, as with the version of the programming language you use. Sometimes you’re not so lucky. Even small changes can force you to make large, breaking changes across the application just to keep things functional and secure. This affects projects under active development as well as those that have been quietly running on a server for years, seemingly without issue.

Why does bit rot happen?

Bit rot is often a death by a thousand cuts: each piece of software you depend upon is susceptible to any number of changes that can bubble all the way up and break your application in weird and wonderful ways. Some of the most common causes are:

Security releases that disable/change insecure interfaces
Bug fix releases that inadvertently cause API changes
Old versions being end-of-lifed and no longer tested for compatibility
Incompatible breaking changes in major releases
Conflicts within the dependency tree of your application
Unrepeatable installation steps stopping you from reproducing a working environment (also known as onceability)
Third-party or remote APIs changing or becoming unavailable without prior warning
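Conflicts within the dependency tree are easy to picture with a small example. The Python sketch below uses a simplified, hypothetical metadata format, where each package states the half-open version range it accepts for each of its dependencies, and made-up package names; real dependency resolvers handle much richer constraint syntax. It reports every shared dependency for which no single version satisfies all the packages that need it.

```python
from collections import defaultdict

# Versions are tuples of ints with the same number of components, e.g. (2, 0).
Version = tuple[int, ...]
# A half-open range: minimum version (inclusive), maximum version (exclusive).
Range = tuple[Version, Version]


def find_conflicts(requirements: dict[str, dict[str, Range]]) -> dict[str, list[str]]:
    """Return each shared dependency whose accepted ranges do not overlap,
    mapped to the sorted list of packages that declare those ranges."""
    ranges_by_dep: dict[str, list[tuple[str, Range]]] = defaultdict(list)
    for package, deps in requirements.items():
        for dep, rng in deps.items():
            ranges_by_dep[dep].append((package, rng))

    conflicts: dict[str, list[str]] = {}
    for dep, entries in ranges_by_dep.items():
        # Intersecting half-open ranges gives [largest minimum, smallest maximum).
        low = max(rng[0] for _, rng in entries)
        high = min(rng[1] for _, rng in entries)
        if low >= high:  # empty intersection: no version satisfies everyone
            conflicts[dep] = sorted(package for package, _ in entries)
    return conflicts


# Hypothetical dependency tree: two packages need incompatible http-lib versions.
requirements = {
    "web-framework": {"http-lib": ((2, 0), (3, 0))},
    "metrics-client": {"http-lib": ((3, 1), (4, 0))},
    "cli-tool": {"yaml-lib": ((1, 0), (2, 0))},
}
print(find_conflicts(requirements))
# {'http-lib': ['metrics-client', 'web-framework']}
```

The two ranges for `http-lib` (2.0 up to 3.0, and 3.1 up to 4.0) do not overlap, so no single release can satisfy both packages at once.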

As a rule of thumb, the higher up the stack you’re working, the faster bit rot sets in. Breaking a typical software stack into layers, from the hardware at the bottom to the end user client at the top, helps reveal where the usual suspects appear:

Hardware
Operating system
VM/hypervisor
Container
System level dependencies
Programming languages
Application dependencies
Your application
End user client

Containers to the rescue

Programs that have been freeze-dried in a Docker image or Linux container often don’t see the effects of bit rot straight away. Realistically, though, containers only delay the inevitable.

One of the best things that Docker and the containerization movement have brought is repeatability: you can take a snapshot of a machine image and reuse it across thousands of servers, knowing you’re getting exactly the same set of software across the whole stack, every time.

Repeatability can certainly help combat bit rot in the short term, but when security issues are found in any part of the software within the container, the whole image needs to be rebuilt, and that’s where you might run into problems.

The longer it’s been since that image was first built, the more likely it is that your application’s dependencies have seen numerous updates. Unless you’re using reproducible package managers at both the system and application level, you’re likely to pull in those updated and potentially incompatible releases. At that point you are deep in dependency hell with no easy way out.
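To make that mechanism concrete, here is a simplified Python model of how an unpinned requirement drifts between builds. The release lists and the `resolve` helper are hypothetical, and real package managers use far more elaborate resolution rules, but the effect is the same: a loose `>= 2.0.0` requirement resolves to whatever is newest at build time, while a pinned version, like one recorded in a lockfile, stays put.

```python
from typing import Optional

# Hypothetical release history for one dependency, as seen by the package
# index when the image was first built and again a year later.
INDEX_AT_FIRST_BUILD = ["1.4.0", "2.0.0", "2.1.3"]
INDEX_A_YEAR_LATER = ["1.4.0", "2.0.0", "2.1.3", "2.2.0", "3.0.0"]


def parse(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '2.1.3' into (2, 1, 3)."""
    return tuple(int(part) for part in version.split("."))


def resolve(minimum: str, available: list[str], pinned: Optional[str] = None) -> str:
    """Choose a release for a '>= minimum' requirement.

    Without a pin, the newest matching release wins, so the answer depends on
    what the index holds at build time. With a pin, the answer stays the same
    on every rebuild.
    """
    if pinned is not None:
        if pinned not in available:
            raise LookupError(f"pinned version {pinned} is not available")
        return pinned
    candidates = [v for v in available if parse(v) >= parse(minimum)]
    if not candidates:
        raise LookupError(f"no release satisfies >= {minimum}")
    return max(candidates, key=parse)


print(resolve("2.0.0", INDEX_AT_FIRST_BUILD))                # 2.1.3
print(resolve("2.0.0", INDEX_A_YEAR_LATER))                  # 3.0.0
print(resolve("2.0.0", INDEX_A_YEAR_LATER, pinned="2.1.3"))  # 2.1.3
```

Rebuilding the same image a year later silently crosses a major version boundary unless the version is pinned.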

Slowing bit rot

Ultimately, unless you control the full stack, including the hardware, you’re never going to be able to prevent bit rot completely. But here are five steps you can take to slow its progression:

Keep dependencies up to date - Small, regular updates to dependencies help you stay on top of security and bug fixes, and let you absorb breaking changes a few at a time instead of all at once.
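One way to keep updates small and regular is to sort pending upgrades by size, applying patch and minor updates routinely and scheduling major ones deliberately. The sketch below assumes your dependencies follow semantic versioning with plain `MAJOR.MINOR.PATCH` version strings, and that you already have the pinned and latest versions as dictionaries; the package names and versions are hypothetical.

```python
from collections import defaultdict


def parse(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' version string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(current: str, latest: str) -> str:
    """Label the jump from current to latest, assuming semantic versioning."""
    cur, new = parse(current), parse(latest)
    if new <= cur:
        return "up to date"
    if new[0] != cur[0]:
        return "major"  # may contain breaking changes
    if new[1] != cur[1]:
        return "minor"
    return "patch"


def plan_updates(pinned: dict[str, str], latest: dict[str, str]) -> dict[str, list[str]]:
    """Group outdated packages by the size of the pending update."""
    plan: dict[str, list[str]] = defaultdict(list)
    for name, current in sorted(pinned.items()):
        newest = latest.get(name, current)  # unknown packages count as current
        kind = classify_update(current, newest)
        if kind != "up to date":
            plan[kind].append(f"{name} {current} -> {newest}")
    return dict(plan)


# Hypothetical pinned versions and the latest releases on the index.
pinned = {"http-lib": "2.1.3", "log-lib": "0.9.2", "yaml-lib": "1.0.0"}
latest = {"http-lib": "2.1.5", "log-lib": "0.9.2", "yaml-lib": "2.0.0"}
print(plan_updates(pinned, latest))
# {'patch': ['http-lib 2.1.3 -> 2.1.5'], 'major': ['yaml-lib 1.0.0 -> 2.0.0']}
```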

Use lockfiles and container images to improve reproducibility - Lockfiles and container images let you record and reproduce the exact versions of every dependency in your app, so you decide when new versions are introduced rather than picking them up whenever you next build or deploy.
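For a Python application, the core of a lockfile can be sketched with the standard library’s `importlib.metadata` module: list every installed distribution with its exact version, in the same `name==version` style that `pip freeze` prints. This is a minimal illustration that covers the application level only, not system packages or container images, and it only lowercases names; dedicated lockfile tools typically record more, such as where each package came from and a hash of the downloaded artifact.

```python
from importlib.metadata import distributions


def collect_installed() -> dict[str, str]:
    """Map each installed distribution's (lowercased) name to its version."""
    installed: dict[str, str] = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip broken installs that lack a name
            installed[name.lower()] = dist.version
    return installed


def render_lockfile(installed: dict[str, str]) -> str:
    """Write one 'name==version' line per package, sorted for stable diffs."""
    return "".join(f"{name}=={version}\n" for name, version in sorted(installed.items()))


def parse_lockfile(text: str) -> dict[str, str]:
    """Read 'name==version' lines back, ignoring blank lines and comments."""
    locked: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        locked[name] = version
    return locked


print(render_lockfile(collect_installed()))
```

Committing that output and installing from it, rather than from loose version ranges, is what puts you in control of when new versions arrive.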

Minimize excess dependency usage - The more dependencies your application has, the more chances there are that one of them causes issues. Trim your dependency trees by regularly checking for packages your app declares but no longer uses.
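A rough first pass at finding unused dependencies in a Python app is to parse its source with the standard `ast` module and compare the top-level modules it imports against the packages it declares. The sketch assumes a package’s import name matches its package name unless told otherwise (`pyyaml`, for example, is imported as `yaml`). Dynamic imports, plugins, and tools used only by build scripts will not show up, so treat the output as a list of candidates to review rather than packages that are safe to delete.

```python
import ast
from typing import Optional


def imported_modules(source: str) -> set[str]:
    """Collect the top-level module names a piece of Python source imports."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 is a relative import of the app's own code, so skip it
            modules.add(node.module.split(".")[0])
    return modules


def unused_dependencies(declared: list[str], sources: list[str],
                        import_names: Optional[dict[str, str]] = None) -> list[str]:
    """Return declared packages whose import name never appears in sources.

    import_names maps a package to its import name when the two differ;
    otherwise they are assumed to be identical.
    """
    import_names = import_names or {}
    used: set[str] = set()
    for source in sources:
        used |= imported_modules(source)
    return sorted(pkg for pkg in declared if import_names.get(pkg, pkg) not in used)


app_sources = [
    "import requests\nfrom yaml import safe_load\n",
    "from . import models\nimport os.path\n",
]
print(unused_dependencies(["requests", "pyyaml", "colorama"], app_sources,
                          import_names={"pyyaml": "yaml"}))
# ['colorama']
```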

Write integration and unit tests - A good suite of unit and integration tests gives you much more confidence that, when things do change, the software still works as expected before it is deployed.

Run tests regularly (even if code hasn’t changed) - A good test suite isn’t much use if you don’t run it. Schedule your tests to run automatically every week to get early warning of system and OS level dependencies that have been silently updated.
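Alongside the scheduled test run, a scheduled job can also check whether the environment has drifted from what was recorded. The sketch below assumes the lockfile and the current environment have already been read into `{name: version}` dictionaries; the scheduling itself (cron, a CI schedule, and so on) is left out, and the package names are hypothetical.

```python
def diff_environment(locked: dict[str, str], installed: dict[str, str]) -> dict[str, list[str]]:
    """Compare a lockfile's pins with what is actually installed right now."""
    report: dict[str, list[str]] = {"added": [], "removed": [], "changed": []}
    for name in sorted(installed.keys() - locked.keys()):
        report["added"].append(f"{name}=={installed[name]}")
    for name in sorted(locked.keys() - installed.keys()):
        report["removed"].append(f"{name}=={locked[name]}")
    for name in sorted(locked.keys() & installed.keys()):
        if locked[name] != installed[name]:
            report["changed"].append(f"{name}: {locked[name]} -> {installed[name]}")
    return report


def has_drift(report: dict[str, list[str]]) -> bool:
    """True if anything was added, removed, or changed."""
    return any(report.values())


# Hypothetical snapshot: a rebuild silently picked up a newer http-lib,
# lost yaml-lib and pulled in a new transitive tls-lib.
locked = {"http-lib": "2.1.3", "yaml-lib": "1.0.0"}
installed = {"http-lib": "2.2.0", "tls-lib": "0.4.1"}
report = diff_environment(locked, installed)
print(report)
# {'added': ['tls-lib==0.4.1'], 'removed': ['yaml-lib==1.0.0'], 'changed': ['http-lib: 2.1.3 -> 2.2.0']}
print("drift detected" if has_drift(report) else "environment matches lockfile")
```

In a real scheduled job you might exit with a non-zero status when drift is found, so the failure shows up next to any failing tests.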

Overall, performing regular maintenance on your app and keeping its dependencies up to date will go a long way toward fighting bit rot. Plan long-term maintenance for all your production apps, because the longer you wait, the more likely it is that dependency hell will rear its ugly head.

