Thursday, July 25, 2013

Maintaining Software Sucks – and what we can do about it

If you ask most developers, they will tell you that working in maintenance sucks. Understanding and fixing somebody else’s lousy code is hard. It’s tedious. And it’s frustrating – because you know you would do a better job if you were given the chance to do it over and do it right.

I enjoy maintaining code I've built. It’s my personal responsibility to my users. On the other hand, maintaining code I didn’t build – and living with the consequences of decisions I had no hand in making – is very painful for me.
Jeff Atwood, Programming Horror

Many software developers see maintenance as a prison sentence – or at least a probationary term. They went to school to become a software engineer, and instead they ended up working as a janitor. For “real developers”, maintenance is boring and “spirit-crushing”.

I have done software maintenance in the past and it is honestly the biggest pile of shit I have ever undertaken.
Comment by hotdog, on Jeff Atwood’s blog post The Noble Art of Maintenance Programming, October 2008

Maintenance is hard work

Robert Glass, in his essay Software Maintenance is a Solution, not a Problem, explains why maintenance is so difficult and unpopular. Maintenance is:

  • Intellectually complex – it requires innovation while placing severe constraints on the innovator. Maintainers have to work within the existing design as well as within regulatory and governance restrictions, they have to be careful to maintain compatibility with other systems, and they can’t afford to make mistakes that could impact customers who are already relying on the system (which is why more than half of maintenance work is testing).
  • Technically difficult – the maintainer must be able to work with a concept and a design and its code all at the same time, and pick it all up quickly.
  • Unfair – the maintainer never gets all the things the maintainer needs. Like good documentation, or good tools, or the time to get work done properly.
  • No-win – the maintainer only deals with users who have problems, instead of hobnobbing with executive sponsors on high-profile strategic projects.
  • Dirty work – the maintainer has to work at the grubby level of detailed coding and debugging, instead of architecture and creative design, worrying about how to build and deploy the code and making sure that it is running correctly.
  • Living in the past – maintainers are always living with somebody’s design and technology choices and mistakes, instead of solving new problems and learning new tools (which means there’s no chance to pad their resumes).
  • Conservative – the going motto for maintenance is "if it ain’t broke, don't fix it”, so maintainers have to put up with shortcomings and shortcuts and compromises, and do the least amount possible to get the job done.

But even though this is hard work, people doing maintenance work are under appreciated – and usually under paid too. The high-profile “software architect” positions with the high salaries go to the flashy stars working on strategic development projects. So why should you – or anyone – care if maintenance work is offshored to India or Romania or somewhere else?

Maintenance is too important to suck

But the truth is that maintenance is too important to be ignored – or offshored. It’s too important to organizations and too important to software developers. Maintenance – not new development – is where most of the money is spent on software,and like it or not, it is where most of us will spend most of our careers, whether we are working on enterprise systems, or building online communities or working at a Cloud vendor or a shrink wrap software publisher, or writing mobile apps or embedded software.

In the US today, 60% of software professionals (1.5 million of an estimated total population of 2.5 million) are working on maintaining software systems (Capers Jones, Software Engineering Best Practices, 2010), and the number of people working on maintenance worldwide will continue to climb because we keep writing more software, and this software all needs to be supported and maintained. We can look forward to a future where almost all of us will spend almost all of our time in maintenance, rather than building something new.

So we have to change the way that people think about software maintenance, how it is done, and especially how it is managed.

It’s not Maintenance that Sucks. What Sucks is how we Manage Maintenance

Attempts to manage maintenance using formal governance models based on ITIL service management or structured software lifecycle standards like IEE-STD-1219 (ISO/IEC 14764) or applying the CMMi maturity model to software maintenance are hopelessly expensive, bureaucratic and impractical.

In most organizations, maintenance is handled reactively, as an unavoidably necessary part of business-as-usual operational support. What management cares most about is minimizing cost and hassle. What is the least that we have to do to keep users from complaining, and how can we get this done in the cheapest and fastest way possible? If we can’t avoid doing it, can we outsource it, maybe save some money, and make it somebody else’s problem, so we can focus on other “more important” things?

This short-term thinking is continued year after year, until the investment that the organization made in building the system, and in the people who built it, has been completely exhausted. The smart people who once cared about doing a good job left years ago. The technology has fallen behind to the point that it has become another “legacy system” that will need to be replaced. Only a handful of people understand the code and the technology, and each change takes longer and costs more than it’s worth. Technical risks and business risks keep piling up. Your customers are tired of putting up with bugs and waiting for things that they need. At some point there’s no choice but to throw it out and start over again. What a sad, sick waste of time and money.

We have to correct the misunderstanding that software maintenance is a drain on an organization, that it takes away from innovation: that innovation can only happen in expensive greenfield development projects, and that money spent working on software that you already have written is money wasted. That it isn’t important to pay attention to what your business is doing today and that you can’t build on this for the future.

Online, software-driven businesses like Facebook and Twitter and Netflix keep growing and innovating by building on what they already have, by investing in their people and in the software base that they have already written. A lot of the work that people are doing at these companies today is maintenance: adding another feature to code that is already there, fixing bugs, making the system more secure, scaling to handle increased load, analyzing feedback data, integrating with new partners, maybe rewriting a mobile app to be faster and simpler to use.

This is the same kind of maintenance work that other people are doing at other companies, and at the heart of it, the technology isn’t that much more exciting. What’s different is how their work is done, how it is managed, and how people approach their jobs. For them, it isn't “maintenance” – it’s engineering. Developers are valued and trusted and respected, whether they are building something new or supporting what’s already there. They are given the responsibility and the opportunity to solve interesting problems, to experiment and try out new ideas with customers and to see if their ideas make a difference. They work in small teams following lightweight incremental development methods and Continuous Integration and Continuous Delivery and Continuous Deployment, and other ideas and self-service tools coming out of Devops to get things done quickly and efficiently, with minimal paperwork and other bullshit.

These organizations show how software development and maintenance should be managed: hire the best people you can find; grant them agency and the opportunity to work with customers and make a meaningful difference; provide them with constant feedback so that they can keep learning and making the product better; give them the tools they need and encourage them to keep improving the software and how they build it; and hold them accountable for the system’s (and the company’s) success.

This works. What other organizations are doing doesn't. It’s not maintenance that sucks. It’s how we think about it and how we manage it that sucks. And this is something that we can fix.

Thursday, July 18, 2013

Agile Development leads to Alzheimer's

Iterative development and design helps you to reach your way towards understanding what the customer really needs, to try out new ideas, evaluate designs, experiment, respond to feedback and react to changing circumstances. Everything gets better as you learn more about the domain and about the customer and about the language and technologies that you are using. This is important early in development, and just as important later as the product matures and in maintenance where you are constantly tuning and fixing things and dealing with exceptions.

But there are downsides as well. Iterative development erodes code structure and quality. Michael Feathers, who has been studying different code bases over time, has found that changes made iteratively to code tend to bias more towards the existing structure of the code, that developers make more compromises working this way. Code modules that are changed often get bigger, fatter and harder to understand.

Working iteratively you will end up going over the same ground, constantly revisiting earlier decisions and designs, changing the same code over and over. You’re making progress – if change is progress – but it’s not linear and it’s not clean. You’re not proceeding in a clear direction to a “right answer” because there isn’t always a right answer. Sometimes you go backwards or in circles, trying out variants of old ideas and then rejecting them again, or just wandering around in a problem space trying stuff until something sticks. And then somebody new comes in who doesn’t understand or doesn’t like the design, tries something else, and leaves it for the next guy to pick up. Changes in design, false starts, dead ends and flip flops leave behind traces in the code. Even with constant and disciplined refactoring, the design won’t be as clean or as simple as it would be if you “got it right the first time”.

It doesn’t just wear down the code, it wears down the team too

Iterative development also has an erosive effect on an organization’s memory – on everyone’s understanding of the design and how the system works. For people who have been through too many changes in direction, shifts in priorities and back tracking, it’s difficult to remember what changed and when, what was decided and why, what design options where considered and why they were rejected before, what exceptions and edge cases came up that needed to be solved later, and what you need to know when you’re trying to troubleshoot a problem, fixing a bug or making another design change.

Over the course of the last 6 or more years we've changed some ideas and some parts of the code a dozen times, or even dozens of times, sometimes in small, subtle but important ways, and sometimes in fundamental ways. Names stay the same, but they don’t mean what they used to.

The accumulation of all of these decisions and changes in design and direction muddies things. Most people can keep track of the main stories, the well-used main paths through the system. But it’s easy for smart people who know the design and code well, to lose track of details, the exception paths and dependencies, the not-always-logical things that were done for one important customer just because 25 or 50 or 110 releases ago. It gets even more confusing when changes are rolled out incrementally, or turned on and off in A/B testing, so that the system behaves differently for different customers at different times.

People forget or misremember things, make wrong assumptions. It’s hard to troubleshoot the system, to understand when a problem was introduced and why, especially when you need to go back and recreate a problem that happened in the past. Or when you’re doing trend analysis and trying to understand why user behaviour changed over time – how exactly did the system work then? Testers miss bugs because they aren't clear about the impact of a change, and people report bugs – and sometimes even fix bugs – that aren't bugs, they've just forgotten what is supposed to happen in a specific case.

When making changes iteratively and incrementally, people focus mostly on the change that they are working on now, and they forget or don’t bother to consider the changes that have already been made. A developer thinks they know how things work because they’ve worked on this code before, but they forget or don’t know about an exception that was added in the past. A tester understands what needs to be tested based on what has just been changed, but can’t keep track of all of the compatibility and regression details that also need to be checked.

You end up depending a lot on your regression test suite to capture the correct understanding of how the system really works including the edge cases, and to catch oversights and regression mistakes when somebody makes a fix or a change. But this means that you have to depend on the people who wrote and maintained the tests and their understanding and their memory of how things work and what impact each change has had.

Iterative development comes with costs

It’s not just the constant pace, the feeling of being always-on, always facing a deadline that wears people down over time. It’s also the speed of change, the constant accumulation of small decisions, and reversing or altering those decisions over and over that wears down people’s understanding, that wears down the mental model that everyone holds of how the system works and how the details tie together. All of this affects people’s accuracy and efficiency, and their confidence.

I am not sure that there is a way to avoid this. Systems, teams, people all age, and like in real life, it’s natural that people will forget things. The more changes that you make, the more chances there are for you to forget something.

Writing things down isn't much of a help here. The details can all be found somewhere if you look: in revision history, in documentation, in the test suite and in the code. The problem is more with how people think the system works than it is with how the system actually works; with how much change people can keep up with, can hold in their heads, and how this affects the way they think and the way they work.

When you see people losing track of things, getting confused or making mistakes, you need to slow down, review and reset. Make sure that before people try to fix or change something they have a solid understanding of the larger design – that they are not just focusing on the specific problem they are trying to solve. Two heads are better than one in these cases. Pair people up: especially developers and testers together, to make sure that they have a consistent understanding of what a change involves. Design and code reviews too, to make sure that you’re not relying too much on one person’s memory and understanding. Just like in real life, as we get older, we need to lean on each other a bit more.

Site Meter