I recently posted what I thought would be my least-read post ever. It's about the subject that has by far the greatest spread between "something tech people say is important" and "something tech people avoid thinking about or doing."
There are many reasons why this is the case. High on the list is the fact that documentation occupies "below zero" status on the list of things programmers and their managers actually care about. I'm also guilty of this. Look at this detailed post I wrote about the hierarchy of status in programming and you'll see that documentation is nowhere mentioned!
Here's a metaphor to help you understand the importance of documentation. Suppose an important election is on the horizon and for various reasons an unprecedented number of people are likely to vote by mail instead of in person. Suppose that there's a large region in which USPS mailboxes have been installed in scattered places. Suppose that a security failure has been discovered in the design of the mailboxes that makes it quick and easy for a trouble-maker to open one and remove all the mail, including of course any ballots. Word of this vulnerability seems likely to leak out, tempting activists to raid mailboxes where ballots cast for the party they want to lose are likely to be found. It's a bug! It has to be fixed immediately!
In the real world, there are probably lists of the locations of all such public mailboxes. Even if there aren't, there are USPS employees who visit the boxes regularly and know where they are. Failing everything else, the public can be asked to register the location of any boxes they know about. No problem.
In the wonderful world of software, things are entirely different. The mailboxes and everything around them are invisible to normal people. Members of the public don't "go there." USPS employees don't go there. The contractors who installed the boxes at various times may have been required to document their work, but that documentation was incomplete and full of errors, and was never updated for subsequent additions and changes. What's worse, the boxes are darned hard to find. Practically no one has the right kind of eyes and training to even be able to correctly recognize a box when driving slowly down a road looking for one.
If you're a modern, with-it programmer you might be thinking to yourself at this point "hah! That wouldn't happen with my code -- I use modern micro-services, so I'd just have to go to the right service." Uh huh. What if the bug had to do with some data that was defined incorrectly? Do you have a DBMS? Do you have a UI? Does every piece of data really appear and get used exactly once in exactly one place? Is it really so easy to find not only each instance of the data but also all the downstream uses and consequences of the data that are impacted by the error?
Regardless of how your code is organized, finding and fixing a bug can take a depressing amount of time, and part of the time is often the result of not having comprehensive, accurate low-level documentation.
In the relatively simple and visible-to-everyone real world of paper ballots and mailboxes, we know there are errors and faults, some of them extensive. Bad things can and do still happen. Because of the simplicity and visibility of the normal world the faults are often noticed quickly, like when bad ballots are sent. In the incredibly complex and visible-to-few world of software, the faults can go for long periods without even being noticed, and when they finally surface it can take the tiny number of super-specialists with the right training, vision and persistence a long time to find and fix the bugs lurking in the vast spaces of the largely invisible, undocumented oceans of software we all depend on.
Documentation is unlikely to improve in the world of software because practically no one, despite the sounds that may come from their mouths, really cares. The good news is that better approaches to building software than are fashionable today go a long way to minimizing the trouble. The more we move towards Occamality, the more we'll get there. The reason is simple: if everything in a program is in exactly one place instead of scattered redundantly all over the place, at least you'll know that there's just one fierce monster bug out there and not an army of clones.
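To make the "exactly one place" idea concrete, here is a minimal Python sketch. The field, its length limit, and the helper functions are all hypothetical, invented purely for illustration. The point is only that the database column, the UI input, and the validation rule are all derived from a single definition, so if that definition is wrong there is one thing to fix rather than several copies to hunt down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Single source of truth for one piece of data (hypothetical example)."""
    name: str
    label: str
    max_length: int


# The one and only place where this field is defined.
ZIP_CODE = FieldDef(name="zip_code", label="ZIP code", max_length=10)


def column_ddl(field: FieldDef) -> str:
    """Derive the database column definition from the field definition."""
    return f"{field.name} VARCHAR({field.max_length})"


def html_input(field: FieldDef) -> str:
    """Derive the UI input element from the same field definition."""
    return (
        f"<label>{field.label} "
        f'<input name="{field.name}" maxlength="{field.max_length}"></label>'
    )


def is_valid(field: FieldDef, value: str) -> bool:
    """Derive the validation rule from the same field definition."""
    return 0 < len(value) <= field.max_length


if __name__ == "__main__":
    print(column_ddl(ZIP_CODE))
    print(html_input(ZIP_CODE))
    print(is_valid(ZIP_CODE, "12345-6789"))
```

If the limit of 10 turns out to be wrong, changing it in `ZIP_CODE` changes the schema, the form, and the validation together. In the scattered version, each of those would carry its own copy of the number, and each copy would be a separate mailbox to go find.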