Everyone who’s even vaguely in touch with the headlines knows there was a problem getting results from the Iowa caucuses – a problem blamed on a software app built to automate reporting voting results. No one, including the maker of the app, disputes there was a serious software problem.
Most of the commentary has focused on steps that should have been taken to ensure that the app worked better than it did. The Wall Street Journal quoted various “experts” who stated – authoritatively, no doubt – that “testing could have prevented” the problems.
What’s not being said is the emperor-has-no-clothes “secret” – our society is irrationally obsessed with software, and acts convinced that getting new software to perform some function that is now getting done with older software or without software will make things better, be worth the trouble and have no downside. Otherwise sensible and experienced people continue to ignore the ongoing, rolling disaster of software building and deployment, and blissfully welcome new software nightmares, as though there’s no chance that anything could go wrong. The Iowa Democrat caucus face-plant is just the latest of a decades-long parade of software disasters. To make it explicit: I'm using this highly visible event as an occasion to illustrate a general, widespread problem; it has nothing to do with Democrats or politics.
The 2016 Caucus Software
Both Democrats and Republicans hold Caucuses in Iowa. Each party sets its own rules and changes those rules as it sees fit. They even do the voting a bit differently. Nonetheless, the parties cooperated to have a single piece of software that would service all their needs. They jointly announced this in the Jan 31, 2016 Des Moines Register.
They also had phone backup in case of software problems. The net result is that the Democrat Party chairman announced final results.
The software wasn’t perfect, of course – software never is. But it got the job done.
The 2020 Caucus Software
In the summer of 2019, the Iowa Democrats decided that, instead of upgrading the software that worked for them last time, they would pay for brand-new software to be built by a tiny, brand-new firm run by a non-programming History major who’d had a role in the tech of the 2016 campaign for Hillary Clinton. New firm. New software. Has to work out of the box, with over 1,700 people using it for the first time. No problem, right? That’s what the chair of the DNC proclaimed just hours before the caucuses opened.
Many people at the caucuses had trouble using the software, and in the end, it just didn’t work – it couldn’t load into the DNC reporting database the results it had captured from the people who finally managed to use it. An analysis shows the kind of amateur-hour flaws you'd expect from the kind of group that built it.
Of course they forgot or ignored, as most people do, the train wreck of so many high-visibility new software unveilings, illustrated well by the ACA software releases of 2013, such as the one in Oregon, which I have discussed elsewhere.
The Curse of Technology
Technology can be a wonderful thing. I’m grateful for the good things it delivers. But even relatively simple, physical technology can have massive problems. Did you know that 3,613,732 people died in motor vehicle accidents in the US alone between 1899 and 2013? In 2016, 37,461 people died, as the carnage continues unabated. Just to put the 3.6 million total in context, the total number of US military deaths in all wars starting with the Revolution is well under half that number.
Does this surprise you? Here’s an example: during the entire period of the Vietnam war, the US had a total of 47,424 military deaths. During EACH year in the period 1966 to 1973, there were more US traffic deaths than the total for the war, with a high of 53,543 traffic deaths in 1969. You know which category of numbers people paid attention to.
The problems with computer and software technology dwarf those of physical technology.
Physical bank and stagecoach robbers make for good visuals and press. One of the most famous was Willie Sutton, who spent decades in jail for having gotten about $2 million from all his hold-ups over many years. Let’s introduce computers and software to make things better! Sure. Banks lose over $10 billion a year due to computer-driven credit card fraud alone! Banks vs. banking software? No contest.
Loads of people in the industry are promoting the latest software fad, for crypto-currency, like Bitcoin. It’s supposed to be super-secure. Sure. It turns out to be even worse, as the mounting software failures and losses demonstrate.
What all of this means is that once a piece of software does what it’s supposed to do, in production and at scale, no sensible person even thinks about replacing it with a whole new application. The reality is that most of our bedrock software systems took years to build, are often decades old – and continue to work solidly and reliably.
An excellent example of this is the body of COBOL code produced by the small software company Paysys. When I was CTO of the company, in the late 1990s, it ran over 150 million credit cards, including for Citibank, GE Capital and other name brands. The company was bought by the world’s largest credit card processor, First Data, and an incrementally evolved version of the same code now runs over 600 million cards, more than any other piece of software. The code is nearly 30 years old! It’s written in “obsolete” COBOL! There have been multiple major attempts to re-create it using “modern” technology, sometimes with efforts taking years and costing tens of millions of dollars. Failures, every one.
The reality is that software is hard to build and get right, even today, after decades of time to figure it out. If we were as good at building bridges as we are at software, no one would dare drive over one. The big tech companies, for all their glittering reputations, are lousy software builders. When they try to build something new, they usually take years and end up failing – which is one of the reasons they acquire so many software companies, companies that in many cases have somehow managed to build software that they couldn’t. Here are details about Facebook’s ineptitude. If it's so hard for tech giants, chock full of super-bright nerds, why should anyone expect brand-new software written by a tiny group of amateurs to perform well the first time, with no testing or training, and in a national spotlight?
Building Software Is Different from Building Other Technologies
People make decisions about software that, if it were any other technology, they would never make. In fact, they would consider those decisions to be beyond stupid.
Suppose you needed to get 1,700 people to an important public event from scattered places. While they didn’t all have to arrive at exactly the same time, they had to be there within a window of a couple of hours. Without fail. What would you do? You might consider asking a couple of established limo companies with fleets of reliable cars and experienced drivers to get it done. If a couple of people insisted on driving themselves, you’d make sure their cars were OK and they had a backup plan. You might arrange for some car-pooling, or bring some people to a central spot and arrange for an experienced bus service with professional drivers. You might do any number of things.
Let’s further suppose you’d done something like this once, but when you did it again, you wanted to do some important things differently. Suppose you needed to make two trips on the same day. You would probably make some changes to what you did last time, while adjusting for whatever didn’t work perfectly before, right?
There are lots of things you wouldn’t even think of doing. Would you do any of the following? Design, build and distribute brand-new vehicles for doing the transporting? Contract with a tiny, new company with no real track record to design the vehicles from scratch, not even modifying an existing model that works? Distribute the vehicles with no real test-rides? Give them to each of the 1,700 people to drive themselves? Given that the vehicles are new, not built according to standards, expect the new drivers to drive themselves with no training and no help? When the people have trouble driving, have no help available? When the vehicles break down, have no back-up plan to get the job done? When the disaster drags on for days, still be unable to fix it?
No, I don’t think any person would consider making these kinds of decisions with physical-world technology. The very thought would be ludicrous. But this is exactly what happened with the 2020 Iowa Caucus software! Endorsed and supported from the top of the DNC, and not questioned by any of the Iowa leaders!
Conclusion
There is a problem with how we think about and build software. A big problem. It’s nearly universal, and shared by normal people and supposed software experts. What happened with the 2020 Iowa Caucus technology is exceptional only in that it was so highly visible, which is why I use it as an example. The insane decision-making process that took place there takes place every day, from the way normal people and organizations work with software to the way tech giants and so-called software experts work.
The small groups of people who build software effectively and well are largely ignored by experienced managers and organizations. But these are the ones who are most likely to create amazing new bodies of software that change industries, before their software is absorbed and subsumed into the larger organizations that try to avoid breaking it too badly.
The mainstream thinking about software, both by managers and by software experts, is badly flawed. Things won't get better until fundamental things are changed.