Since when is “data entry” (entering data into a computer) a pivotal, innovative technology? When the difference between doing it the normal way and doing it with advanced technologies is a …ten-to-one productivity difference … that’s when.
I’ve described how the Operations Research algorithm of Linear Programming is fifty years into an agonizingly slow roll-out through different applications, from scheduling oil refineries in the 1960’s to scheduling retail sales in the 1990’s, and now scheduling medical infusion centers and operating rooms in the late 2010’s. In each case, laborious and error-prone human scheduling was replaced by the algorithm, with improvements ranging from no less than 10% to over 50%. This is major! Why did such an innovation wait for decades to be applied, and why, for many applications, is it still waiting!!?? This is the mystery of how what’s called “innovation” works in reality, and why it should be called “in-old-vation” instead.
You may think that part of the cause is that LP is an exotic algorithm – even though it’s a standard part of the Operations Research engineering curriculum, most so-called normal people haven’t heard about it. While it appears that even the hosts of people who wax eloquent about AI and ML are clueless about LP, it’s not exactly a secret. So let’s see if obscurity is the reason why LP remains a “future innovation” in many potential applications, by examining the super-plain, ordinary, completely-understandable-by-normal-people case of data entry.
Data Entry
Just as process optimization is done by hand or stupid methods for decades until some genius comes up with the brilliant idea of applying tried-and-tested LP to the problem and dramatically improves it, so is Data Entry widely performed by primitive methods until some “innovator” comes along and applies “Heads-Down Data Entry” (HDDE) methods to the process – and typically gets improvements of 3 to 10X! The only difference is that while LP is taught in Engineering departments and studied by math nerds, Heads-Down-Data-Entry is “just” a collection of common-sense techniques that require no math and no professors to understand or implement. It’s so “common” that it doesn’t even have a generally accepted name – though it’s been implemented in many places and been thoroughly proven in practice. It’s far too humble to merit an academic department – and yet, when applied, has delivered truly massive gains, far higher proportionately than exotic Linear Programming has!
The methods of HDDE were first implemented in places that had huge volumes of repetitive data to be entered into computers. Banks were early users for check entry, and so were the credit card companies who, at the start, had huge volumes of paper charge slips to process. Simple ideas like minimizing keystrokes and eye movement were implemented first. Then came taking advantage of the eye-to-fingers pipeline: people noticed that showing clerks the next item to be entered before the current one was complete led to a big jump in speed. Other methods like double-blind techniques were invented, so that entry clerks just entered – whether their work was original or used to check someone else’s work was entirely handled by the Data Entry system.
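To make the double-blind idea concrete, here is a minimal sketch in Python. It is not taken from any real HDDE product; the class `FieldTask` and its methods are hypothetical names invented for illustration. It assumes a value is accepted once two clerks have independently keyed the same thing. The point is that the clerk only ever types, while the system decides whether an entry is an original or a check.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FieldTask:
    """One item to be keyed. All names here are hypothetical."""
    task_id: str
    entries: Dict[str, str] = field(default_factory=dict)  # clerk_id -> keyed value

    def record(self, clerk_id: str, value: str) -> None:
        # The clerk just enters; the system works out what the entry is used for.
        self.entries[clerk_id] = value.strip()

    def resolved_value(self) -> Optional[str]:
        """Return the value once two independent entries agree, else None."""
        values = list(self.entries.values())
        for i, value in enumerate(values):
            if value in values[i + 1:]:
                return value
        return None  # fewer than two entries, or no agreement yet

    def needs_another_pass(self) -> bool:
        # True means the system should quietly queue the item for one more clerk.
        return self.resolved_value() is None


task = FieldTask("check-0001-amount")
task.record("clerk_a", "125.00")
task.record("clerk_b", "126.00")   # disagreement: neither clerk is told
task.record("clerk_c", "125.00")   # a third pass settles it
print(task.resolved_value())
```

A real system would also track error rates per clerk and decide how many items need a second pass at all, but the division of labor is the same: typing on one side, bookkeeping on the other.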
As soon as scanning and image display became practical, HDDE adopted them. That led to another jump in productivity, enabling large, complex forms to be broken up into pieces, so that a clerk would see the image on a screen of the same piece of data from a whole set of forms instead of entering a whole form from start to finish. No HDDE shop would even consider having the entry clerk think about anything on the form, stuff like “if this field is missing, do this instead of that,” because it would just slow them down.
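The form-splitting step can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any particular product: the function name `batch_by_field`, the field names, and the idea that each scanned form has already been cut into named image snippets are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def batch_by_field(forms: Dict[str, Dict[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Regroup snippets so one clerk keys the same field across many forms.

    `forms` maps form_id -> {field_name: snippet_image_reference}.
    The result maps field_name -> [(form_id, snippet_image_reference), ...].
    """
    batches = defaultdict(list)
    for form_id, fields in forms.items():
        for field_name, snippet in fields.items():
            batches[field_name].append((form_id, snippet))
    return dict(batches)


forms = {
    "form-001": {"last_name": "img-001-a", "loan_amount": "img-001-b"},
    "form-002": {"last_name": "img-002-a", "loan_amount": "img-002-b"},
}
for field_name, queue in batch_by_field(forms).items():
    print(field_name, queue)
```

Each queue then goes to a clerk who sees nothing but that one kind of field, over and over, with no decisions to make about the rest of the form.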
Finally, there’s ICR (Intelligent Character Recognition), which is having the computer “read” the image instead of a human. This technology has existed for many decades. Once you’ve got HDDE in place, phasing in ICR is a natural, so that the proportion of entry done by humans gradually decreases as the effectiveness of ICR increases.
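One common way to phase ICR in, sketched here under stated assumptions, is to let the recognition engine report a confidence score with each reading and send only the low-confidence fields to a clerk. The function names, the 0.98 threshold, and the sample readings below are all hypothetical; real engines expose confidence in their own ways.

```python
from typing import List, Optional, Tuple


def route_field(icr_value: str, icr_confidence: float,
                threshold: float = 0.98) -> Tuple[str, Optional[str]]:
    """Accept the ICR reading if it is confident enough, else queue it for a clerk."""
    if icr_confidence >= threshold:
        return ("auto", icr_value)
    return ("human", None)


def human_share(readings: List[Tuple[str, float]], threshold: float = 0.98) -> float:
    """Fraction of fields that still need human keystrokes."""
    if not readings:
        return 0.0
    routes = [route_field(value, conf, threshold)[0] for value, conf in readings]
    return routes.count("human") / len(routes)


# Hypothetical readings: (recognized text, engine confidence).
readings = [("125.00", 0.99), ("SMITH", 0.91), ("10023", 0.995), ("J0NES", 0.62)]
print(human_share(readings))  # 0.5: two of the four fields go to a clerk
```

As the engine improves, more readings clear the bar and the human share shrinks, without changing anything about how the clerks work.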
Remember, applying LP to scheduling might result in a 30% improvement, which, in most cases, is major to the point of being revolutionary. What about HDDE? Entering data from a paper form into a computer using primitive methods might get between 1,000 and 1,500 KPH (keystrokes per hour). There are lots of stages of improvement, including things I’ve mentioned like eliminating the thinking and breaking up the form, but levels of 10,000 to 15,000 KPH in a professional environment are widely achieved – with superior quality. That’s a minimum of 5X! Typically much more. Of course, as you incorporate ICR into the process, it gets even better, gradually reducing the human factor, so that most fields are entered with no human involvement. At this point, the technology is probably best called ICR+HDDE, though there is no generally accepted term.
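The arithmetic behind that claim is easy to check. The sketch below uses only the KPH ranges quoted above; the function name `speedup_range` is made up for illustration. The worst case compares the fastest primitive shop with the slowest professional one, and the best case does the reverse.

```python
from typing import Tuple


def speedup_range(slow_kph: Tuple[int, int], fast_kph: Tuple[int, int]) -> Tuple[float, float]:
    """Return (lowest, highest) speedup given (min, max) KPH for each method."""
    lowest = fast_kph[0] / slow_kph[1]   # slowest HDDE vs. fastest primitive
    highest = fast_kph[1] / slow_kph[0]  # fastest HDDE vs. slowest primitive
    return lowest, highest


primitive = (1_000, 1_500)    # keystrokes per hour, primitive methods
hdde = (10_000, 15_000)       # keystrokes per hour, professional HDDE

low, high = speedup_range(primitive, hdde)
print(f"{low:.1f}X to {high:.1f}X")  # 6.7X to 15.0X
```

So even the most pessimistic pairing of those figures comes out above 5X, before any ICR is added.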
Given all this, it would be insane to handle computer data entry by anything other than HDDE methods, right? Welcome to the software industry, where insanity of this kind is the accepted state of affairs. And where almost no one practices the most simple and basic of computer fundamentals, such as counting.
How HDDE gets implemented
I described how Linear Programming went from problem domain to domain, each time acting like an innovation, as indeed it was in that area of application. Once it gets established, it tends to stay. The case of HDDE is different, I think because it’s not a recognized “thing” in the halls of academia, or among the pooh-bahs of big business. It’s the kind of thing that no self-respecting Professor of Computer Science would stoop to consider, assuming he ever encountered such a low-status thing – you know, the kind of thing that “merely” makes common sense and, well, works.
HDDE has appeared in competitive, high-volume service businesses, where it has a major role to play in delivering results for the customers of the business. There have been software products that directly support HDDE, so that all you have to do is buy and implement them. It’s neither obscure nor hidden. But it’s never been talked about at conferences as the “coming thing.”
Case Study: HDDE rejected
In the early 1990’s, when document imaging and workflow technology were hot and something people talked about the way they talk about AI/ML and innovation today, the government-backed student loan organization, Sallie Mae, decided to apply the technology to improve the operations at the handful of processing centers they had at the time, employing many thousands of people and processing millions of documents a year. The popular thing to do at the time was to scan documents on receipt, and then send them to the same places the paper was sent, so that workers could process the images of the documents displayed on new big screens instead of paper. The job was basically to type the data into the right places of the software application they used.
Everyone at the time said that converting the documents was important ONLY because it enabled wonderful workflow, the elimination of inboxes and outboxes for paper. And the bits of other stuff you could do, like having a group of people taking from a common inbox instead of each having their own. The common “wisdom” was that you could gain 30% productivity improvement by implementing this marvelous new technology.
I got involved, since I was a recognized expert on document imaging technology at the time, and had personally coded one of the early workflow systems. I figured out and showed in detail that by canning the workflow and implementing HDDE techniques, they could gain a minimum of 5X productivity improvement. No one disputed my thoughts or detailed plan. They just ignored it, and proceeded to implement the standard stuff. I strongly suspect that after considerable time and expense, there were walk-throughs of the Sallie Mae sites showing visitors the big screens and absence of paper – what a big success the project was!
Case Study: ICR-HDDE applied with success
There are two current cases I know of where ICR-HDDE is being applied and winning. Each is a classic, narrow service business where converting forms to data is the key value of the business, and where the companies buying the service just want fast, accurate data delivery; they don’t care how it’s done. Disclosure: each of these is an Oak HC/FT investment.
At Groundspeed, insurance forms and reports are captured and the relevant data is extracted from them by the most effective relevant means, often involving forms of ICR-HDDE. There is lots of forms recognition from documents and images that are often computer output, with the relevant data appearing at varying places on a page. Nonetheless, Groundspeed is able to deliver the data stream the customer needs, quickly and accurately. The results are so powerful that new levels of analytics are enabled by the newly available stream of structured data.
At Ocrolus, financial documents of all kinds including bank statements and pay stubs are converted to data in a standard format to enable fast and effective operations like giving loans for business and personal use, along with a growing list of other operations that also need good data. An effective combination of ICR-HDDE techniques is applied to get results for companies that need accurate data to make fast decisions.
Conclusion
HDDE is a collection of methods that have been proven in practice for many decades. The technology that is behind it continues to deepen and reduce human effort even more, with the addition of ICR. But it remains a niche technology, ignored even more than LP is by the numerous places that could benefit from it.
The big difference between LP and HDDE is that LP is a formal piece of magic that’s in academia. HDDE is nowhere. In fact, it’s really just an example of classic industrial engineering applied to computer software and the people who use it. Which makes it all the more mysterious that it’s largely ignored.
In-old-vation is real. Most “innovations” are minor variations on things long-since proven and demonstrated in practice, but are unimplemented in the many situations that would benefit from them until some mysterious combination of circumstances arises to let them explode into practical reality.