Congratulations, you’ve just inherited a legacy codebase!
That’s right, you’ve joined a new company, or taken on an existing project, and are navigating the quagmire of inconsistent coding styles, unused code, duplicated functionality, and the code smells that come with the years. Or maybe you’ve decided to refactor an application you yourself started many moons ago. In either case, you’d be forgiven for regretting some life decisions.
But fear not, we’ve got your back! Let’s take a journey through a case study of someone who’s been through this before, and then on to some actionable tips and tools to help make your life easier, starting today.
How it starts
Let’s face it - we’ve all come across some code, comments, or documentation that’s caused us to stop and wonder… “what were they thinking?” And our fictional Byron below meant no harm, I’m sure. But rather than offer to make himself available for eternity to help with this function, I’m sure we’d all prefer to see something more sustainable.
And so it falls to us, the inheritors of such technical debt, to make the code better. Not for the computer, as computers don’t care whether code is clean or dirty. No, my friends, we take on this challenge for the next developer who has to contend with the codebase, knowing full well that this could be our future selves.
Case Studies
Case studies from the past can provide valuable insights into how legacy codebases have been successfully managed. One such case study is the refactoring project for Cooleaf, a US-based company that provides a tool for companies to manage their employee engagement and recognition initiatives. (Note that we have no association with them - theirs was just one of the first relevant articles I found while searching the internet.)
The refactor was undertaken by the original developers, a Polish software development company, and in writing up their learnings, they noted that every software application is going to turn into legacy code. The mere act of making changes to add features or fix bugs means that the current version of the application is impacted by choices made in the past.
Further, given that teams change over time, those choices might have been made by people who are no longer around, impacting institutional knowledge about the codebase. This makes the codebase harder to work on, while the application itself becomes less modern as it’s based on technology and practices that are becoming outdated or obsolete. This, in turn, can lead to a tendency for the application to run less efficiently than it could, and as it evolves, it continues to become even bigger and scarier to work on.
You might argue that it becomes a vicious and inevitable cycle of software quality degradation.
To Refactor, or Rewrite from Scratch?
The truth is, you’ll probably find value in doing a little of both, but there are some factors you’ll need to consider in any event.
Refactoring legacy code is time-consuming, but perhaps not as time-consuming as rewriting from scratch. You see, if you start a rewrite, you have to get feature parity before you can release, and that could take months or years, by which time your customers will have expected new features and bug fixes. This creates a cat-and-mouse game of working on both the legacy and new app at the same time, the latter always playing catch-up.
So perhaps, because of this, you choose the refactor path. But going through all the code and converting it to new standards and conventions could be just as large a task, especially when you remember you’ll have to deal with dependency hell as you navigate the mammoth task of upgrading packages and finding alternatives for those that have long since become deprecated.
A Middle Ground
The key to maximising success in a refactor is to have the highest possible test coverage, so that you can verify each change immediately. That is a challenge for untested legacy projects with monolithic files. Rather than seeing this as a deadlock, treat it as an opportunity to build your test coverage gradually. Start by adding tests for the most critical functionality before embarking on any changes.
By doing this, any change that accidentally alters the behaviour of the newly tested code will show up as a failing test, and you can work with confidence knowing that you’re not creating more work for yourself. This strategy will pave the way for a Clean as You Code approach. Here at Sonar, we make it our mission to make it as easy and as unobtrusive as possible for you to write clean code, and we don’t believe you have to do it all at once to call it a success.
In order to help explain why, let’s take a look at the age of the code in SonarQube between the years of 2010 and 2022. You can quickly see that by mid-2022, more than 50% of the codebase consisted of code written in 2018 and beyond and that there were few to no lines of code from before 2014.
You can also see in the following graph that we made quite some changes in 2015. Imagine if, before doing so, we’d refactored all of the code first. We’d have touched approximately 200 thousand lines of code, one-third of the codebase. Right before we threw them out completely.
And our codebase isn’t unique in this. In 2016, Erik Bernhardsson devised a tool (which was used to generate the graphs in this post) to measure the half-life of code. Did you know that Erik discovered the half-life of a “somewhat randomly selected” (his words) group of open source projects to be 3.33 years?
That means on average, one of those projects will lose half its code every 40 months! And depending on the project, it can be even more drastic. I highly recommend reading his write-up on this if you’re after more details, like the fact that, at the time, the Angular project had a half-life of just 17 weeks!
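To make the arithmetic concrete, here is a minimal sketch that assumes code survival follows simple exponential decay. That model and the function name `surviving_fraction` are assumptions made for illustration, not taken from Erik’s tool.

```python
def surviving_fraction(years, half_life_years=3.33):
    """Fraction of today's lines expected to still exist after `years`.

    Assumes simple exponential decay: every `half_life_years`,
    half of the remaining lines are rewritten or deleted.
    """
    return 0.5 ** (years / half_life_years)


# After one half-life, half of the code is left.
print(surviving_fraction(3.33))  # 0.5

# After two half-lives (6.66 years), about a quarter is left.
print(surviving_fraction(6.66))

# A project with a 17-week half-life, expressed in years.
print(surviving_fraction(1, half_life_years=17 / 52))
```

Under this assumption, a large share of whatever you polish today is likely to be replaced within a few years anyway, which is the argument for cleaning only the code you are already touching.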
Here’s the graph of those randomly selected projects, which include Angular, Kubernetes, React, Rails, Git, and Linux, with the red line indicating the average:
Tie it Together with DevOps
Now that you’ve heard the case for why and how Clean as You Code works, you’re on track to iteratively improve your codebase. But rather than make this a manual process, you can leverage the principles of DevOps to make your life so much easier.
Start with automated testing. This one’s probably obvious, but I’d take a small wager that most legacy applications don’t have the thorough test suites you’d likely see in a modern application. Make sure you add tests to your existing code so that you can detect issues early. This might seem easier said than done, so consider starting with approval tests, which verify that a piece of code still produces output that has previously been reviewed and approved. They are particularly valuable in refactoring scenarios where the goal is to change the structure of the code without altering its functionality. They provide a safety net to ensure that changes do not unintentionally affect the output.
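An approval test can be as small as the sketch below. It uses only the Python standard library rather than a dedicated approval-testing package, and both `legacy_invoice_total` and `verify` are hypothetical names invented for this example. The idea is to capture what the legacy code returns today, have a human review and approve that output once, and then fail any later run that produces something different.

```python
import json
from pathlib import Path


def legacy_invoice_total(items, tax_rate):
    """Hypothetical stand-in for an untested legacy function."""
    total = 0
    for item in items:
        total = total + item["price"] * item["qty"]
    return round(total + total * tax_rate, 2)


def verify(name, received, approved_dir):
    """Return True if `received` matches the approved snapshot on disk.

    On a mismatch, or when nothing has been approved yet, the output is
    written to `<name>.received.json` so a human can review it and, if it
    is correct, rename it to `<name>.approved.json`.
    """
    approved_dir = Path(approved_dir)
    approved_path = approved_dir / f"{name}.approved.json"
    received_path = approved_dir / f"{name}.received.json"
    received_text = json.dumps(received, indent=2, sort_keys=True)

    if approved_path.exists() and approved_path.read_text() == received_text:
        return True

    received_path.write_text(received_text)
    return False
```

In a test suite, you would call `verify("invoice", legacy_invoice_total(items, 0.2), "tests/approved")` and assert that it returns `True`. The first run fails by design, because approving the initial output is a deliberate, human step. After that, you can restructure `legacy_invoice_total` freely, and the test only fails if the output changes.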
With these tests in place, you can start the process of continual improvement of the quality of your code. Install SonarLint in your IDE for starters, and then try out SonarQube or get started with SonarCloud and integrate them into your continuous integration and continuous deployment (CI/CD) process.
Even though SonarLint, our clean code linter, isn’t technically a DevOps process component, when you set it up in “connected mode”, it will communicate with your SonarQube or SonarCloud instance and make sure you’re made aware of any issues as soon as possible. It’s invaluable in shortening that continuous improvement feedback loop and making sure you can be aware of problems while the code’s still fresh in your head.
Tackling legacy code effectively hinges on integrating DevOps practices - refactoring judiciously, utilising automation for testing and integration, and embracing incremental changes. These steps streamline code management and set a foundation for continuous improvement.