There have been two gigantic advances in programming languages, each contributing huge advances in programmer productivity. Since then, thousands of times more attention has been given to essentially trivial changes than has been given to the two giant advances. I described these 50 years of non-advances here. In this post I’ll go back in history to describe the two giant advances that were made in the early days of computing, back in the 50’s, and I’ll tack on the one important corrective that’s been made since then.
First, the pre-history of computer languages
The famous early computer was the ENIAC.
https://en.wikipedia.org/wiki/ENIAC
It cost the equivalent of about $7 million to build and took about 3 years. It was big, occupying about 300 sq ft. It was roughly 1,000 times faster than electro-mechanical machines, so you can see why people were excited about it. Its purpose was to perform complex calculations on scientific/engineering data.
While the excitement was justified, the method of programming was gruesome.
“ENIAC was just a large collection of arithmetic machines, which originally had programs set up into the machine[33] by a combination of plugboard wiring and three portable function tables (containing 1,200 ten-way switches each).[34] The task of taking a problem and mapping it onto the machine was complex, and usually took weeks. Due to the complexity of mapping programs onto the machine, programs were only changed after huge numbers of tests of the current program.[35] After the program was figured out on paper, the process of getting the program into ENIAC by manipulating its switches and cables could take days. This was followed by a period of verification and debugging, aided by the ability to execute the program step by step”
Engineers (all men) would figure out what they needed the machine to calculate. It then fell to the “programmers,” all women, to perform the excruciatingly detailed work with the knobs and switches.
From this it is extremely clear that while the machine was an astounding achievement in speed, the speed of programming the calculation steps was the huge bottleneck. How can we make this faster?
Answer: instead of physical plugs and switches, let's invent a language for the machine -- a "machine language!" That's what happened with the invention of the stored program computer, which took place in 1948. Instead of moving plugs and flipping switches, programming was done by writing it out in machine language on paper, loading it into the computer via punched paper cards, and then having the computer execute it. Figuring out the detailed steps was still hugely difficult, but at least getting it into the computer was faster.
From machine language to assembler
Every machine has one and only one “machine language.” This is a stream of binary data that the machine’s processor “understands” as instructions. Each instruction causes the machine to perform a single action on a single piece of data.
I learned to read and write machine language for a couple of different machines. It’s not easy. From early on, machine language was usually shown as hexadecimal instead of raw binary, so that instead of looking at “0100101000101111” you see the hexadecimal equivalent: 4A2F. That is a big plus, but you’re still looking at a pile of hexadecimal data. A nice editor will at least break it up into lines of code, but you still have to read the hex for the instruction, registers and storage locations.
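The grouping behind that notation is mechanical: every four bits become one hexadecimal digit. A minimal Python sketch, where bits_to_hex is a hypothetical helper name chosen for illustration:

```python
def bits_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to uppercase hex, four bits per digit."""
    if len(bits) % 4 != 0:
        raise ValueError("expected a multiple of 4 bits")
    # Each group of four bits maps to exactly one hexadecimal digit.
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


word = "0100101000101111"
print(bits_to_hex(word))
```

Hex is shorter to read than binary, but it says nothing about which digits are the instruction and which are registers or addresses.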
Moving from machine language to assembler language was HUGE. Night and day in terms of readability and programmer productivity. Assembler language is a body of text that closely corresponds to the machine language. Each line of text typically corresponds to exactly one machine language instruction. Assembler language transformed programming. If you understand any machine language, you can easily write and read the corresponding assembly language. For example, here's the binary of a simple instruction:
Here's the equivalent in hexadecimal:
And here it is in assembler language, including a comment:
Just as important as making instructions readable was using text labels for locations of data and addresses of other instructions. In machine language, an address is expressed as a “real” address, for example as a hexadecimal number. If you inserted a command after a jump command and before its destination, you would have to change the address in the jump. You can see that with lots of jumps this would quickly become a nightmare. By labeling program lines and letting the assembler generate the “real” addresses, the problem disappears.
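Here is a rough sketch of how labels remove the address bookkeeping. It is a toy two-pass assembler in Python for a made-up machine; the instruction names (DEC, JUMPZ, JUMP, NOP, HALT) and the one-instruction-per-address layout are assumptions for illustration, not any real instruction set.

```python
def assemble(lines):
    """Two-pass assembly for a made-up machine: one instruction per address."""
    labels = {}
    instructions = []
    # Pass 1: record the address each label points at; labels take no space.
    for line in lines:
        text = line.split(";")[0].strip()  # drop comments and whitespace
        if not text:
            continue
        if text.endswith(":"):
            labels[text[:-1]] = len(instructions)
        else:
            instructions.append(text.split())
    # Pass 2: replace label operands with the numeric addresses from pass 1.
    program = []
    for op, *args in instructions:
        program.append((op, *[labels.get(arg, arg) for arg in args]))
    return program


source = [
    "loop:",
    "    DEC",
    "    JUMPZ done   ; leave the loop when the counter hits zero",
    "    JUMP loop",
    "done:",
    "    HALT",
]
print(assemble(source))
# [('DEC',), ('JUMPZ', 3), ('JUMP', 0), ('HALT',)]

# Insert one instruction before "done:" and the jump target moves by itself.
longer = source[:4] + ["    NOP"] + source[4:]
print(assemble(longer)[1])
# ('JUMPZ', 4)
```

In raw machine language, adding that one NOP would have meant finding and editing every jump that crosses it by hand.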
From assembler to high-level languages
Assembler made writing code in the language of the machine vastly easier. It was a huge advance. But people quickly noticed two big problems. The solution to both problems was the second giant advance in software programming, high-level languages.
The first problem was that writing in assembler language, while worlds easier than machine language, still required lots of busy-work. For example, adding B and C together, dividing by 2 and storing the result in A should be pretty easy, right? In nearly any high-level language it looks something like this:
A = (B+C)/2
Putting aside messy details like data type, this would be the following in pseudo-assembler language:
Load B into Register 1
Load C into Register 2
Add Register 2 to Register 1
Load 2 into Register 2
Divide Register 1 by Register 2
Store Register 1 into A
Yes, you can quickly become proficient in reading and writing assembler, but wouldn’t the 1 line version be easier to write and understand than the 6 line version?
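To see that the two versions really do the same work, here is a small Python sketch that simulates the six pseudo-assembler steps on a toy two-register machine. The opcode names and the tuple layout are assumptions made up for this illustration.

```python
def run(program, memory):
    """Execute pseudo-assembler steps on a toy machine with two registers."""
    reg = {1: 0, 2: 0}
    for op, x, y in program:
        if op == "LOAD":       # LOAD name, r: copy memory[name] into register r
            reg[y] = memory[x]
        elif op == "LOADI":    # LOADI n, r: put the constant n into register r
            reg[y] = x
        elif op == "ADD":      # ADD s, d: add register s to register d
            reg[y] += reg[x]
        elif op == "DIV":      # DIV d, s: divide register d by register s
            reg[x] /= reg[y]
        elif op == "STORE":    # STORE r, name: copy register r into memory[name]
            memory[y] = reg[x]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


steps = [
    ("LOAD", "B", 1),    # Load B into Register 1
    ("LOAD", "C", 2),    # Load C into Register 2
    ("ADD", 2, 1),       # Add Register 2 to Register 1
    ("LOADI", 2, 2),     # Load 2 into Register 2
    ("DIV", 1, 2),       # Divide Register 1 by Register 2
    ("STORE", 1, "A"),   # Store Register 1 into A
]

B, C = 6, 4
memory = run(steps, {"A": 0, "B": B, "C": C})
print(memory["A"], (B + C) / 2)
# 5.0 5.0
```

Six steps, each of which has to name the right register in the right order, against one line that reads like the arithmetic it performs.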
Expressions were the core advance of high level languages. Expressions turned multi-line statement blocks that were tedious and error-prone into simple, common-sense things. Even better, the use of expressions didn’t just help calculations – it helped many things. Assembler language has the equivalent of conditional branches, but many conditionals also involve expressions. For example, the following easy-to-read IF statement with an expression:
IF ((B+C)/2 > D) THEN … ELSE …
would turn into many lines of assembler, tedious to write and taking effort to read.
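To make "many lines" concrete, here is a minimal Python sketch of the translation a compiler performs on the condition. It leans on Python's standard ast module to parse the expression and emits stack-style pseudo-instructions, a simplification swapped in for the register-style steps shown earlier. The instruction names (PUSH, PUSHI, ADD, DIV, CMPGT) are invented for illustration.

```python
import ast

# Map Python arithmetic operators to made-up pseudo-assembler mnemonics.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def compile_expr(source):
    """Translate an arithmetic comparison into stack-style pseudo-assembler."""
    out = []

    def emit(node):
        if isinstance(node, ast.Name):
            out.append(f"PUSH {node.id}")        # push a variable's value
        elif isinstance(node, ast.Constant):
            out.append(f"PUSHI {node.value}")    # push a literal constant
        elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
            emit(node.left)                      # operands first,
            emit(node.right)
            out.append(OPS[type(node.op)])       # then the operation
        elif (isinstance(node, ast.Compare) and len(node.ops) == 1
              and isinstance(node.ops[0], ast.Gt)):
            emit(node.left)
            emit(node.comparators[0])
            out.append("CMPGT")                  # compare the top two values
        else:
            raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    emit(ast.parse(source, mode="eval").body)
    return out


for line in compile_expr("(B+C)/2 > D"):
    print(line)
# PUSH B
# PUSH C
# ADD
# PUSHI 2
# DIV
# PUSH D
# CMPGT
```

That is seven instructions for the condition alone, before any branch to the THEN or ELSE part has been written.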
The second problem emerged as more kinds of computers were produced, each with its own machine and assembler language. A guy might spend months writing something in assembler and then want to share with a friend at another place using a different type of computer. But the friend’s computer was different – it had a different assembler/machine language! The whole program would have to be re-written. Total bummer!
Wouldn’t it be nice to write a program in some language that was like the 1 line version above, and that could be translated to run on any machine??
Easier to write by a good 5X, easy to read and can run on any machine, present or future. Hmmmm… There must be something wrong, this sounds too good to be true.
The concept of high level languages was good and wasn’t too good to be true. Everyone agreed, and it wasn’t long before FORTRAN and similar compute-oriented languages arose. Even performance-obsessed data nerds wrote programs in FORTRAN rather than assembler because FORTRAN compiled into machine language was just as fast, and sometimes even faster because of optimizations those clever compiler guys started inventing! And the programs could run anywhere that had a FORTRAN compiler!
The accounting and business people looked at what was happening over in heavy-compute land and grew more and more jealous. They tried FORTRAN but it just wasn’t suitable for handling records and transactions, not to mention having nice support for financial data. So business processing languages got invented, and COBOL emerged as the best of the bunch.
If you look at a compute statement in FORTRAN and COBOL, they’re nearly identical. But in the early days, FORTRAN had no support for the money data type and no good way of representing the blocks of financial data that are core to anything involving accounting.
Once they got going, people just kept on inventing new languages for all sorts of reasons. The obvious deficiencies of FORTRAN and COBOL were fixed. Languages could handle intense calculations and business data processing about equally well. But back around 1970, 50 years ago, there remained a gap: an important category of programs could not be written in any of the high-level languages. Essential programs like operating systems, device drivers, compilers and other “systems” software continued to be written in the assembler language of the machine for which they were intended. There were attempts to fill this gap, but they failed. Then a couple of super-nerds at Bell Labs, building on an earlier good shot at solving the problem, invented the language C.
They rewrote the UNIX kernel in C, and the rest is history – C became the high-level language of choice for writing systems software and still holds that place.
Conclusion
Two giant advances were made in programming languages. Each of these happened early in computing history, in the 1950’s. The first giant advance was the biggest: assembler language made machine-level code reasonable to write, read and change. The second giant advance was the category of high level languages. Two minor variants on the concept, COBOL and FORTRAN, established the value of having a language that enabled expressions and could be compiled to run efficiently on any machine. In spite of the on-going tumult of language invention that has taken place since, the fact that huge bodies of working code written in those two languages continue to perform important jobs makes it clear that they embodied the giant advance that was high level languages. The only substantial omission in those languages – the ability to directly reference computer memory – was filled by the invention of the language C.
Most of what’s happened since in programming languages is little but sound and fury. For a review of those non-advances see this.
I do appreciate some of the subtle changes that have been made in language design. Some modern languages really are better – but mostly not because of the details of the language! I’ll treat this issue in a future post. Meanwhile, it’s important to understand history and appreciate the giant steps forward that high level languages as a category represent.