test-driven development: refactoring, difficulties

liw recently wrote about test-driven development:

This all sounds like a lot of bureaucratic nonsense, but what I get out of this is this: once all tests pass, I have a strong confidence that the software works. As soon as I've added all the features I want to have for a release, I can push a button to push it out

To my mind, this is only one of the benefits. He doesn't describe another major benefit, which is the confidence with which you can take on re-factoring in projects with well-developed test infrastructure.

If your project has no test infrastructure at all, and you make a deep and/or potentially invasive change, you might well produce something that is heinously broken for users who have a different pattern of using the tool than you do.

But if you have a well-developed, reasonable test suite with fairly wide coverage, you can make a deep or invasive change, and be confident that -- if the tests all pass -- the stuff you release isn't going to be too horrific. And if you do break something with a change which the test suite didn't cover, that's an indication that the test suite is lacking. Hopefully, you can factor out a problem report into its own test, so that future changes will ensure that the behavior doesn't regress too.

The upshot of more-confident re-factoring is that your development can be much bolder, you can roll out new features more quickly, and you can spend less time agonizing about whether you've got the various abstraction layers exactly right the first time through. These are all good things (though i do think the agony of abstraction perfectionism is well-warranted in some contexts, like API definitions, and wouldn't want test-driven development to make people give up on that necessary task).

when testing can't cover everything

Some things are just hard to test well. for example, if you have 20 different boolean options to a command-line tool, you can't realistically test them in all combinations and permutations; that would be over a million tests.

User experience is also notoriously difficult to test, as are tools that rely heavily on network interaction, which can be flakey and unpredictable.

other downsides

Test suites themselves require maintenance, as the components they rely on can change over time. More than once, i've had a test suite failure that was really a failure of the test infrastructure, not the code itself. But in those same projects, that's usually followed or preceded by a test suite failure that picks out a particularly nasty or subtle bug in the tested code that might have persisted for quite a while unnoticed. So the test suite does in some sense create more work all around.

more and better testing is good for Debian

Even when we know that coverage isn't perfect, and even with its additional overhead, well-integrated tests (at the unit level and more generally) are worth the tradeoffs. It's worth doing because of the robustness and guarantees that we can give each other with regularly-tested code. It's a more stable foundation which surprisingly also gives us more flexibility going forward. This is good for free software in general. It also helps us find our bugs before our users do, so it's better for our users. So more and better testing directly supports the two main priorities outlined in the Debian Social Contract.