Documentation-Driven Development

David Jackson 2022-06-23

A quick note: Definitive statements in this article are my opinions, and I've omitted writing "in my opinion/experience" in almost every sentence to save you and me the tedium.

...we have come to value: ...[w]orking software over comprehensive documentation

—Manifesto for Agile Software Development

This quote is responsible for some of the worst problems with modern software development. The fault lies less with the quote itself than with how it has been interpreted, in particular the interpretation that documentation is, at best, a "necessary evil." This has led to poorly written documentation at best and completely missing documentation at worst. That wouldn't necessarily be a problem if the software in question were itself "intuitive" or even "good," but most of the time bad documentation correlates with bad software.

Part of the reason that developers don't view documentation as being particularly important is that code itself is documentation. In fact, it is the most detailed possible documentation in that it explains precisely and unambiguously what the software does and how it does it. Other documentation is just for the users. The problem with this line of thinking, apart from it being condescending, is that it misses the forest for the trees on a number of levels.

Code is documentation. The "users" at whom this documentation is aimed are the programmers who have to read it. You might think that code is "for" the machine to execute, but it's not: the machine needs the program translated first into machine code and then into voltages before it can "understand" it. Code is for people, not machines.

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

—Martin Fowler, Refactoring (1999), page 15

The fact that code is written for people cannot be overemphasized. When readability competes with anything else, readability should be prioritized until it creates measurable problems, e.g. with performance or appropriate modularity. Writing code for people brings the same problem one has when writing prose: not everyone thinks the same way, and the author may be misunderstood. Code is unlike prose in that it has an objective "meaning," if we take its effect when run to be its "meaning." That objective meaning is often at odds with what programmers think the code "means." Bugs are what happen when a programmer fails to understand the "true" meaning of what they have written. In that way, code is very much like prose, because it contains layers of meaning. What we programmers call "abstraction," a literature professor might call "metaphor."
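
As a small illustration of the gap between what code "means" to its author and what it actually does when run, consider the hypothetical Python function below (the names are invented for this example). Its author presumably intends each call to start from an empty list. Python, however, evaluates a default argument once, when the function is defined, so every call shares the same list.

```python
# Hypothetical example: the author "means" for each call to start with an empty list.
def add_tag(tag, tags=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `tags` appends to the same shared list.
    tags.append(tag)
    return tags


first = add_tag("draft")
second = add_tag("urgent")
print(first)            # ['draft', 'urgent'], not ['draft']
print(second is first)  # True: both names refer to the same list


# One way to make the code's objective meaning match the author's intent.
def add_tag_fixed(tag, tags=None):
    # A fresh list is created on every call that omits `tags`.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

The fixed version follows a common Python idiom: use None as the default and create the list inside the function body.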

Bryan Cantrill: Software has often been compared with civil engineering, but I'm really sick of people describing software as being like a bridge. What do you think the analog for software is?
Arthur Whitney: Poetry.

—ACM Queue, A Conversation with Arthur Whitney

Metaphor is all over computing. We manipulate "files" in "directories," we "filter" result sets, programs "communicate" with each other, and so on. We humans are terrible at conceptualizing our programs at the level of voltages or even ones and zeros, but we're pretty good at conceptualizing them at the level of metaphor. At this "high level," though, we're vulnerable to misunderstanding. To prevent misunderstanding, and to make sure that what we wrote the program to do and what we meant for it to do are the same thing, we have to test the program. Testing, though, is boring if you have to do it repetitively by hand, so we figured out how to automate our testing.
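
An automated test is simply code that checks other code. The sketch below uses Python's standard unittest module. Here filter_results is a hypothetical function invented for the example, echoing the "filter" metaphor above.

```python
import unittest


def filter_results(rows, min_score):
    """Hypothetical example: keep only rows whose score meets a threshold."""
    return [row for row in rows if row["score"] >= min_score]


class FilterResultsTest(unittest.TestCase):
    # Each test pins down one thing we *meant* the program to do.
    def test_keeps_rows_at_or_above_threshold(self):
        rows = [{"score": 3}, {"score": 5}, {"score": 7}]
        self.assertEqual(filter_results(rows, 5), [{"score": 5}, {"score": 7}])

    def test_empty_input_gives_empty_output(self):
        self.assertEqual(filter_results([], 5), [])
```

Save the file as, say, test_filter.py, and you can run it with `python -m unittest test_filter.py` as often as you like. That is the point: the machine repeats the checks instead of a person doing it by hand.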

Automated Testing and TDD

Everyone tells you it's good to test your code. That's uncontroversial. What is still controversial is writing your tests before you write your code, which is called Test-Driven Development ("TDD"). How TDD works as a design technique is widely misunderstood, or at least misinterpreted. This is because TDD is often explained merely by describing its innermost loop: "red-green-refactor," or "write a test that fails, make that test pass, refactor the code." This is the Le Répertoire de la Cuisine way of explaining TDD.

Parisienne.—Mélanger aux œufs dés de truffes et de pommes de terre sautées au beurre.

Mix with eggs diced truffles and potatoes sauteed in butter.

—Le Répertoire de la Cuisine, "recipe" for a Parisian omelette

To understand this recipe properly, you already have to be a chef trained to make this dish. This is not a "recipe" per se so much as a "reference to a recipe" (ideally residing in your head). Likewise, red-green-refactor is only a good reference for how TDD works if you have already been trained in what TDD is and how it works.

In Kent Beck's 2003 book Test-Driven Development: By Example, TDD is demonstrated via the example of a multi-currency "Money" object. Beck begins the book by writing down a list of "requirements" or "tests" in English before writing any code. The initial list is as follows:

$5 + 10 CHF = $10 if rate is 2:1

$5 * 2 = $10

—Kent Beck, Test-Driven Development: By Example, page 3

As he starts writing tests, he realizes more things that he needs to do and adds them to his list. As he implements them, he crosses them off. Building up an explicit list allows you to clarify what you need to build before you build it. Keeping this list updated as you go lets you know where to go next. The todo list serves as a map to your "working software."
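
Beck's book works in Java. What follows is only a rough Python sketch of the idea, not his code. It shows how the two items on that initial list could become executable checks. The Money and Bank classes, their methods, and the convention that a rate means "units of the source currency per unit of the target" are all assumptions made for this sketch. Swiss francs use the ISO code CHF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    # Hypothetical value object: an integer amount plus a currency code.
    amount: int
    currency: str

    def times(self, multiplier):
        return Money(self.amount * multiplier, self.currency)


class Bank:
    # Hypothetical exchange: rates[(source, target)] is how many units of
    # `source` buy one unit of `target` (so 2 CHF buy 1 USD at 2:1).
    def __init__(self):
        self.rates = {}

    def add_rate(self, source, target, rate):
        self.rates[(source, target)] = rate

    def convert(self, money, currency):
        if money.currency == currency:
            return money
        rate = self.rates[(money.currency, currency)]
        # Integer division keeps the sketch short; real money code would not truncate.
        return Money(money.amount // rate, currency)

    def add(self, a, b, currency):
        total = self.convert(a, currency).amount + self.convert(b, currency).amount
        return Money(total, currency)


# The to-do list, rewritten as executable checks.
def test_multiplication():
    # Five dollars times two is ten dollars.
    assert Money(5, "USD").times(2) == Money(10, "USD")


def test_mixed_addition():
    # Five dollars plus ten francs is ten dollars at a 2:1 rate.
    bank = Bank()
    bank.add_rate("CHF", "USD", 2)
    assert bank.add(Money(5, "USD"), Money(10, "CHF"), "USD") == Money(10, "USD")
```

A runner such as pytest will pick up the test_ functions automatically. Each passing test corresponds to an item crossed off the list, and new items discovered along the way become new test functions.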

Another thing often missing from casual descriptions of TDD is the importance of using the tests to design the "interface" of your code. Because you're writing a test before any of the code exists, you are free to design that code however you please. You can try out different ways of doing things very easily, because none of the code you're writing in the test has to actually work yet. This, combined with using the tests to verify that your code functions as you think it does, is how your tests can "drive" your design.
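
To make the interface-design point concrete, here is a hypothetical Python example; parse_duration is invented for illustration. The test comes first, and writing it forces decisions about the function's name, its input format, and its return type. The implementation is written afterward, to satisfy the test.

```python
import re
from datetime import timedelta


# Step 1, written first: the interface is designed here. The name
# parse_duration, the input format "1h30m", and the choice to return a
# datetime.timedelta (rather than, say, an int of seconds) are all
# decisions made in this test before any implementation exists.
def test_parse_duration():
    assert parse_duration("1h30m") == timedelta(hours=1, minutes=30)
    assert parse_duration("45s") == timedelta(seconds=45)


# Step 2, written afterward: an implementation that satisfies the test.
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    # Accept one or more <number><unit> pairs and nothing else.
    if re.fullmatch(r"(?:\d+[hms])+", text) is None:
        raise ValueError(f"not a duration: {text!r}")
    parts = {}
    for number, unit in re.findall(r"(\d+)([hms])", text):
        key = _UNITS[unit]
        parts[key] = parts.get(key, 0) + int(number)
    return timedelta(**parts)
```

Suppose the test had asserted a number of seconds instead. The implementation would then look different, yet at this stage reversing that decision costs only a line of test code.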

Documentation-First Development

Once you've finished writing your code with TDD, you have a set of tests that represent a kind of "executable specification." The tests can show you what your code is meant to do, if you wrote them with this documentary use in mind. You still have no documentation, though, for how to use your software. Unless you've just written the most intuitive software ever created, you're going to have to write some user-facing documentation. Herein lies a problem: figuring out how to describe the software from the perspective of a user rather than a programmer. At this point, you risk writing bad documentation or, worse, no documentation at all, leaving your software incomprehensible to anyone who can't read your code. For open-source software, that's still most people; for proprietary software, it's everyone who doesn't work with you.

The understandability of software is an enormous issue and has spawned numerous methods for dealing with it. You can follow standard UI conventions, such as Apple's or Google's. If you're writing a command-line tool rather than a graphical one, you could follow some Command Line Interface Guidelines. These are good ideas: they help you follow the principle of least surprise, which helps users understand your software by analogy to software they already know how to use. These methods, though, are relatively "low-level," in that they help you design components but are less helpful when you need to think holistically. For that, you may want to write your documentation first.

The suggestion that you write your user-facing documentation before writing your software is not new; it has been made at least once before, in the form of Readme-Driven Development. The underlying idea is that writing the documentation before the software helps you design that software, whether the documentation is a README, a quick reference, or even the packaging for a physical product that runs your software. This allows you to decide what you want to build for your users and stick to that. It helps you make your software comprehensible, and it can motivate you as you imagine people actually reading your document, getting excited, and deciding to buy or use your code. Deciding on your basic functionality relatively early in the process can also help you avoid feature creep. Like TDD, writing your user-facing documentation first aids your design.

When writing your documentation first, it's important to know when to stop. Readme-Driven Development advocates writing just a single-file README. A press release or a box design is similarly short, self-contained, and relatively easy to change. The documentation that you write first should constrain the possibility space of your work, but it shouldn't tie your hands behind your back.

Documentation-Driven Development

The common theme between Test-Driven Development and Readme-Driven Development is this: develop documentation-first. Start by documenting how your users will interface with your software. This allows you to design by writing words and drawing diagrams, which are infinitely more flexible than code when you're first starting out. Once you've written your user documentation, you've got a basic idea of what you're going to build, what it's going to do, and, if you're lucky, a rough idea of how it's going to actually work. Now you can start writing your automated tests. These tests document the interface of your code at the level of other code, in the same way that your user-facing documentation documents it at the level of the user.
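
Python's standard doctest module is one concrete tool for letting user-facing examples double as automated tests. It is one option among many, and documentation-driven development doesn't require it. In this sketch, the docstring of a hypothetical word_count function is written first, as documentation. doctest then checks that every example in it still holds.

```python
def word_count(text):
    """Count the words in a piece of text.

    Words are separated by any run of whitespace:

    >>> word_count("Documentation is software")
    3
    >>> word_count("  leading and   trailing  ")
    3
    >>> word_count("")
    0
    """
    # Implementation written after the documentation above.
    return len(text.split())


if __name__ == "__main__":
    import doctest

    # Runs every example in this module's docstrings; silent when all pass.
    doctest.testmod()
```

If the behavior ever drifts away from what the documentation promises, the doctest fails. That is one way of keeping documentation and code in step.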

Code, at its core, is writing. It is writing in the same way that mathematics is writing: a means of communicating to other people a rigorous method of achieving a goal. Code has the added benefit that it can easily be translated into a language a computer can understand, which makes this rigorous definition of a problem executable. In programming, the blueprint is the building. We programmers are in a unique position where our writing becomes real, like the D'ni authors of Myst's "linking books." If we treat our code and our technical writing as two sides of the same communication, we can build better things both for our users and for our fellow developers.

Addressing Objections: "No plan survives contact with the enemy"

There is a resistance in much of the agile literature to "big up-front design." I think the criticism of this resistance often misunderstands it. For example, in Bertrand Meyer's book Agile! The Good, the Hype and the Ugly, he says "the agile school rejects the idea of upfront requirements" (page 32) and then quotes from Kent Beck's Extreme Programming Explained (2nd Edition): ". . . requirements gathering isn't a phase that produces a static document; but an activity producing detail, just before it is needed, throughout development". He then goes on to criticize the agile community for rejecting requirements documents altogether. This is a valid criticism of some people and groups claiming to "do Agile™," but it is not a valid criticism of what was actually advocated. Beck and others argue against static requirements documents, not against documenting requirements in general.

Meyer goes on to say that ". . . the charge against requirements, however, is largely hitting at a strawman. No serious software engineering text advocates freezing requirements at the beginning." (Meyer, 2014, page 35). This, however, misses the point. The problem isn't literature that seriously advocates up-front design for software engineers; we, as a community, have known for a very long time that freezing requirements or design does not work. The problem is not the engineering literature but the reality of software engineering in the workplace: managers and others who have no interest in reading that literature. Meyer and Beck are actually advocating for the same thing, just in different words.

In Meyer's defense, I have seen a number of people actually advocate a lack of documentation. You will occasionally see people invoke the Helmuth von Moltke quote: "No battle plan survives contact with the enemy," when suggesting that documentation is not useful. Interestingly enough, Meyer has a good rejoinder to this exact criticism.

The requirements document is just one of the artifacts of software development, along with code modules and regression tests . . ., documentation, architecture descriptions development plans, test plans and schedules. In other words, requirements are software. Like other components of the software, requirements should be treated as an asset; and like all of them, they can change . . .

— Bertrand Meyer, Agile! The Good, the Hype, and the Ugly (page 35)

I would extend the statement above: documentation is software as well. In the OpenBSD project, for example, "[n]othing is allowed into the . . . system that is user-visible unless it is accompanied by documentation." Issues (bugs) in documentation can lead people to use software incorrectly or even dangerously, no less than bugs in the code can.
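
If documentation is software, we can check it the way we check code. The sketch below is a hypothetical Python check and not anything the OpenBSD project uses. It lists the public functions in a module that lack a docstring, so a test suite could fail whenever user-visible code arrives undocumented. A docstring is only a crude proxy for real documentation, and the helper's name is invented for this example.

```python
import types


def undocumented_public_functions(module):
    """Return the names of public functions defined in `module` that lack a docstring."""
    missing = []
    for name, obj in sorted(vars(module).items()):
        # Only plain functions count here; classes, constants, etc. are skipped.
        if not isinstance(obj, types.FunctionType):
            continue
        # Skip private helpers and functions imported from other modules.
        if name.startswith("_") or obj.__module__ != module.__name__:
            continue
        if not (obj.__doc__ or "").strip():
            missing.append(name)
    return missing


# Build a tiny throwaway module to demonstrate the check.
demo = types.ModuleType("demo")
exec(
    "def documented():\n"
    "    '''Do a documented thing.'''\n"
    "\n"
    "def undocumented():\n"
    "    pass\n",
    demo.__dict__,
)

print(undocumented_public_functions(demo))  # ['undocumented']
```

In a real project, a test might assert that `undocumented_public_functions(mymodule) == []` for each public module, where mymodule stands in for a hypothetical module of your own.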

Meyer goes on to cite the same von Moltke quote and says: "Military strategists like to quote Marshal Helmuth von Moltke . . . and then they make plans!" (ibid. page 35). The fact that plans will change does not negate their worth. The potential for road closures and detours does not render a map or a GPS useless. You want a map most of all when there is a detour, because you need to navigate your way back on track rather than become hopelessly lost.

Conclusions

We should not treat documentation as a "necessary evil" or some secondary concern. Software is documentation and documentation is software. We should start by figuring out where we mean to go, and that necessitates some form of documentation-driven development. We need to refactor that documentation and keep it up to date, keeping it a "living" document the same way we do with our software. We must value working documentation as much as we value working software.