The invention and widespread acceptance of the modern database management system (DBMS) has had a dramatic impact on the evolution and use of programming languages. It's part of the landscape today. People just accept it and no one seems to talk about the years of disruption, huge costs and dramatic changes it has caused.
The DBMS Blasts on to the Scene
In the 1980’s the modern relational database management system, DBMS, blasted onto the scene. Started by an IBM research scientist, E.F. Codd and popularized by his collaborator Chris Date, The Structured Query Language System, SQL, changed the landscape of programming. Completely apart from normal procedural programming, you had a system in which data could be defined using a Data Definition Language, DDL, and then created, updated and read using SQL The data definitions were stored in a nice, clean format called a schema. Best of all, the new system gave hope to all the people who wanted access to data who couldn’t get through the log jam of getting custom report programs written in a language the analysts didn’t want to be bothered to learn. SQL hid most of the ugly details because of its declarative approach.
SQL was more than just giving access to data. There was a command to insert data into the DBMS, INSERT, then make changes to data, UPDATE, and even to send data to the great bit-bucket in the sky, DELETE. The system even came with transaction processing controls, so that you could perform a deduction from one user's account and an addition to another user's account and assure that either they both happened or neither did. Best of all the system did comprehensive logging, making a permanent record of who did what changes to which data and when. Complete and self-contained!
This impressive functionality led to a problem. With users demonstrating outside the offices of the programming department, things were getting rowdy. The chants would go something like this:
Leader: What do we want?
Shouting crowd: OUR DATA!
Leader: When do we want it?
Shouting crowd: NOW!
Everyone wanted a DBMS. They wanted access to their data without having to go through the agony of coming on bended knee to the programming department to get reports written at some point in the distant future.
The response of languages: what could have happened
It wouldn't have been difficult for languages to give the DBMS demonstrators what they wanted with little disruption. One possibility was changing a language so that it produced a stream of data changes for the DBMS in addition to its existing data changes. A second possibility was changing a language so that its existing data statements would be applied directly to a DBMS. Either of these alternatives would have supplied a non-disruptive entry of DBMS technology into the computing world. But that's not what happened.
I personally implemented one of these non-disruptive approaches to DBMS integration in the mid-1980's and it worked. Here's the story:
I was hired by EnMasse Computer, one of several upstart companies trying to build computers based on the powerful new class of microprocessors that were then emerging. EnMasse focused on the commercial market which was at the time dominated by minicomputers made by companies like DEC, Data General and Prime. Having a DBMS was considered essential by new buyers in this market, but most existing applications were written in languages without DBMS support. One of my jobs was to figure out how to address the need. I was told to focus on COBOL.
This was a big problem because the way data was represented in COBOL didn't map well into relational database structures. What I did was get a copy of the source code of our chosen DBMS, Informix, and modify it so it could directly accept COBOL data structures and data types. I then went into the runtime system and modified it to send native COBOL read, write, update and delete commands directly to the data store, bypassing all the heavy-weight DBMS overhead. This was tested and proven with existing COBOL programs. It worked and the COBOL programs ran with undiminished speed. The net result was that unmodified COBOL used a DBMS for all its data, enabling business users full access to all the data without programming.
I did all the work to make this happen personally, with some help from an assistant.
I thought this was an obvious solution to the problem that everyone would take. It turns out that EnMasse failed as a business and that no one else took the simple approach that was best for everyone.
The response of languages: what did happen
What actually happened in real life was a huge investment was made with widespread disruption. Instead of burying the conversion, massive efforts were taken to modify -- practically re-write -- programs written in COBOL and other languages so that instead of using their native I/O commands they used SQL commands instead, with the added trouble of mapping and converting all the incompatible data structures. More effort went into modifying a single program for this purpose than I put into making the changes at the systems level to make the issue go away. What's worse is that, because of the massive overhead imposed by DBMS's for data manipulation using their commands instead of native methods performance was degraded by large factors.
While the ever-increasing speed of computers mitigated the impact of the performance penalty, in many cases it was still too much. In the late 1990's after massive increases in computer power and speed, program creators were using the stored procedure languages supplied by database vendors to enable business logic to run entirely inside the DBMS instead of bouncing back and forth between the DBMS and the application. While this addressed the performance issue of using DBMS for storage, it introduced having the logic of a business application written in two entirely different languages running on two different machines, usually with different ways of representing the data. Nightmare.
The rise of OLAP and ETL
One of the many ironies of the developments I've described is that people eventually noticed that the way data was organized for sensible computer program use was VERY different from the best ways to organize it for reporting and analysis. The terms that emerged were OLTP, On-line Transaction Processing, and OLAP, On-Line Analytical Processing. In OLTP, it's best to have data organized in what's called normalized form, in which each piece of data is stored exactly once in one place. This makes it so that a program doesn't have to do lots of work when, for example, a person changes their phone number; the program just goes to the one and only place phone number is stored and makes the change. OLAP is a different story because there's no need to update data that's already been created -- just add new data.
There were also practical details, like the fact that data was stored and manipulated by multiple programs, many of which had overlapping data contents -- for example a bank that has a program to handle checking accounts and a separate program for CD's, even though a single customer could have both. This led to the rise of a special use of DBMS technology called a Data Warehouse, which was supposed to hold a copy of all a system's data. A technology called ETL, Extract Transform and Load, emerged to grab the data from wherever it was first created, convert it as needed and stored it into a centralized place for analysis and reporting.
Given the fact that you really don't want people sending SQL statements to active transaction systems that could easily drag down performance and all the factors above, it turns out that the push to make normal programs run on top of DBMS systems was a monstrous waste of time. One that continues to this day!
Conclusion
Nearly all programmers today assume that production programs should be written using a DBMS. While alternatives like noSQL and key-value stores have emerged, they don't have widespread use. Since the data structures used by programs are often very different than those used by the DBMS, a variety of work-arounds have been devised such as ORM's (Object-Relational Mappers), each of which has a variety of performance and labor-intensive issues. The invention and near-universal use of relational DBMS in software programming is a rarely recognized disaster with ongoing consequences.
Comments