The fact that what computers operate on (data) and the instructions for how to operate on it (software) are the same kind of stuff (i.e., data) is obvious, simple and profound, yet not well understood, and it has huge implications. The relatively small number of programmers who take advantage of the fact that software is data are levels beyond "normal" programmers. The fact that software is data deserves to be counted among the fundamental concepts of computing, along with counting, closed loop and a few other things.
Software is Data
All software is data; some data is software.
Everyone who sorta knows what programming is knows this, the same way that everyone knows that air is lighter than water. But who thinks about it? Who cares? What difference does it make?
First let me spell out how software is data. You've got a bunch of files on your computer. Some of them may be text files or spreadsheets (data). Some of them may be the kind of text that makes sense when opened with a programmer's editor (still data); you edit it and save it back (data all the way). Then you run the compiler. It takes as input the text representation of the program (data) and writes out a new file, an "executable" file, which is still data. Then you "run" the program: a program (the loader) reads the executable file's data into memory and then, by one means or another, gets the machine's instruction address pointer to point at the program's first instruction in memory. At that point, the program is "executed."
The file was data when it was on disk, regardless of its format. It was data when it was loaded into memory, not much different from a text file loaded by a text editor. It was data when the processor started executing instructions, and it was data once the program ceased being executed. It was data before, during and after execution. It just happened to be (hopefully) data in the format of sensible instructions the machine knows how to execute. Even if it's crap, the machine will do its best to execute it until it somehow loops or crashes out. Then you get your machine back. The point is: the machine doesn't know the difference!
To a computer, everything is data. If we set the instruction address pointer to data that happens to be nicely formatted instructions, good things will happen. But it's still just data.
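The same round trip can be watched from inside a running program. The sketch below uses CPython's built-in `compile` and `exec` and the standard `marshal` module (which serializes code objects to bytes; its format is CPython-specific and version-dependent) as stand-ins for the compiler, the executable file, and the loader described above. It is an analogy, not how native executables are actually produced.

```python
import marshal

# Step 1: the program's source is just text (data).
source = "result = sum(n * n for n in range(1, 11))\n"

# Step 2: the "compiler" turns that text into a code object...
code = compile(source, "<example>", "exec")

# ...which can be serialized to raw bytes, the analogue of an executable file.
executable_bytes = marshal.dumps(code)
print(type(executable_bytes))   # <class 'bytes'>: plain data

# Step 3: a "loader" turns those bytes back into something runnable...
loaded = marshal.loads(executable_bytes)

# Step 4: ...and only when we choose to execute it does it act as software.
namespace = {}
exec(loaded, namespace)
print(namespace["result"])      # 385
```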
Software wasn't always data!
It's an amazing, unprecedented leap forward to make the control function of a machine out of the same stuff, stored in the same way and processed in the same way, as the stuff the machine works on. No other machine is like this! Yet despite this unique facet of computers, most people go on acting the way they usually do, just as they do in the face of the unprecedented speed at which computers evolve.
Any machine you can think of has a control part and a "business" part, where the machine does what it does, according to the control. This is true for a lawn tractor, and it's true for a vehicle. It's true for the calculating machines of the late 1940s such as the IBM 402 (here's the plug board where the "program" was entered). It's true for that famous early computer, the ENIAC; here's the plug board, sadly not replaceable, with a couple of the ladies who programmed it. It's even true for the much-lauded Turing machine, whose famous endless tape contains only data, while the control lives elsewhere, in the machine's fixed table of states.
It was a true leap forward, breaking with the strict separation of control and action that exists everywhere else in human experience, to use some of a computer's data for control purposes and the rest for data that isn't software. This design is called the von Neumann architecture, or a "stored-program" computer.
Because software is data, a program can act on itself; since "itself" is data, a program can modify itself. This characteristic is unique among machines: only stored-program computers can do this. Sound familiar?
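A minimal sketch of both ideas, using a toy machine with a made-up three-instruction set (hypothetical, not modeled on any real architecture). Instructions and data share one flat memory, and the program overwrites the operand of one of its own instructions before executing it. Note that the `run` loop never asks whether a cell holds code or data.

```python
# Hypothetical opcodes for a toy stored-program machine (not a real instruction set).
HALT, LOAD, STORE = 0, 1, 2

def run(memory: list[int]) -> list[int]:
    """Fetch and execute 2-cell instructions (opcode, operand) from the same
    memory that holds the data, until HALT. Mutates and returns `memory`."""
    acc = 0  # accumulator register
    pc = 0   # instruction address pointer
    while True:
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        if opcode == LOAD:
            acc = memory[operand]      # read any cell: data or code
        elif opcode == STORE:
            memory[operand] = acc      # write any cell: data or code
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 2}")

memory = [
    LOAD, 12,   # 0: acc = memory[12], which holds the number 11
    STORE, 5,   # 2: memory[5] = 11, overwriting the operand of the next instruction
    LOAD, 10,   # 4: as written, "load cell 10"; after the patch, "load cell 11"
    STORE, 13,  # 6: memory[13] = acc
    HALT, 0,    # 8: stop
    7, 42,      # 10-11: data
    11,         # 12: data that the program uses as an address to patch itself
    0,          # 13: result slot
]

run(memory)
print(memory[4:6])   # [1, 11]: the instruction now reads LOAD 11
print(memory[13])    # 42, not 7: the program executed the code it rewrote
```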
Interpreters and other levels
The first software was the binary data that the machine recognized as instructions. The next step was a more readable, text version of machine language: assembler language. The key thing about assembler language is that each line of it translates into exactly one machine instruction. Next come compiled languages like FORTRAN and COBOL, which compilers turn into machine language, typically generating multiple machine instructions for each line of source. Then come interpreted languages, which are "executed" by an interpreter program: instead of generating instructions, the interpreter just does what the program intends right away.
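To illustrate the one-line-to-one-instruction property, here is a minimal assembler sketch for a hypothetical machine whose instructions are two bytes each (opcode, operand). The mnemonics and opcode values are invented for the example.

```python
# Hypothetical mnemonic-to-opcode table for a made-up machine.
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}

def assemble(source: str) -> bytes:
    """Translate each assembler line into exactly one 2-byte machine instruction."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and surrounding whitespace
        if not line:
            continue                        # skip blank lines
        mnemonic, *operand = line.split()
        machine_code.append(OPCODES[mnemonic])
        machine_code.append(int(operand[0]) if operand else 0)
    return bytes(machine_code)

program = """
LOAD 8    ; acc = memory[8]
ADD 9     ; acc += memory[9]
STORE 10  ; memory[10] = acc
HALT
"""
print(assemble(program).hex(" "))   # 01 08 01... see below
```

Four lines of assembler produce four instructions (eight bytes); the assembler itself is just a program that reads text and writes bytes.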
From this we gather that programs can take programs (in some format and language) as input, and either execute them or generate other programs as output, whether machine instructions that are literally executed or code in some other language.
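A minimal sketch of both paths for a tiny expression language (the nested-tuple program format is a hypothetical choice for illustration): `interpret` executes the program directly, while `compile_expr` generates a new program as Python source text and turns that text into a callable function. To keep it short, the compiled version assumes the only variable is named `x`.

```python
from typing import Callable

# Programs in a tiny expression language, written as nested tuples:
#   ("num", 3)  ("var", "x")  ("add", a, b)  ("mul", a, b)

def interpret(expr: tuple, env: dict[str, int]) -> int:
    """Interpreter: walk the program and do what it says, right away."""
    kind = expr[0]
    if kind == "num":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return interpret(expr[1], env) + interpret(expr[2], env)
    if kind == "mul":
        return interpret(expr[1], env) * interpret(expr[2], env)
    raise ValueError(f"unknown node {kind!r}")

def to_python(expr: tuple) -> str:
    """Code generator: translate the program into Python source text."""
    kind = expr[0]
    if kind == "num":
        return repr(expr[1])
    if kind == "var":
        return expr[1]
    op = {"add": "+", "mul": "*"}[kind]
    return f"({to_python(expr[1])} {op} {to_python(expr[2])})"

def compile_expr(expr: tuple) -> Callable[[int], int]:
    """Compiler: emit a new program as text, then turn that text into a function.
    Assumes the only variable in the program is named x."""
    return eval(f"lambda x: {to_python(expr)}")

program = ("add", ("mul", ("var", "x"), ("var", "x")), ("num", 1))  # x*x + 1

print(interpret(program, {"x": 6}))  # 37
print(to_python(program))            # ((x * x) + 1)
print(compile_expr(program)(6))      # 37
```

The interpreter pays the cost of walking the tree on every run; the compiled version pays it once, up front, and afterwards runs as ordinary Python.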
Consequences of software being data
This is a BIG subject. For starters, though, doesn't it make sense that when a machine is "self-operating," as computers are and no other machine in our experience is, effectively using the self-referential power of the machine would lead to interesting things? It certainly has for humans!
Let's take the classic issue of customization. Once a programming environment has been chosen, the tendency is to model your problem in terms of that environment and code away. For example, in the whole field of object-oriented analysis and design, you're supposed to use this style of thinking to express everything in terms of objects; then you build your classes and away you go. Life is great. But now something has to change. That requires you to examine the entire body of code, make the changes, test, migrate data as needed, and so on. And again and again.
Eventually, you might realize that certain classes of changes are often required. If you really get that software is data, you will realize that you could have modeled the entire application in the simplest possible terms and built an interpreter for that model. Changes then fall into one of two categories: the new thing is a variant on the kinds of things the interpreter already does, in which case you just change the model; or the new thing is a new kind of thing, in which case you extend the interpreter and use its new capability in the extended model.
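A minimal sketch of that approach, using input validation as a stand-in application (the model format, rule names, and functions here are all hypothetical). The model is plain data and `validate` is the interpreter; the two kinds of change map to editing `MODEL` versus adding an entry to `RULES`.

```python
from typing import Any, Callable

# The application's behavior, described as data (a hypothetical model format).
MODEL = [
    {"field": "name",  "rules": [("required",), ("max_length", 20)]},
    {"field": "email", "rules": [("required",)]},
]

# The interpreter's vocabulary: one small function per *kind* of rule.
RULES: dict[str, Callable[..., bool]] = {
    "required":   lambda value: value not in (None, ""),
    "max_length": lambda value, limit: value is None or len(value) <= limit,
}

def validate(record: dict[str, Any], model: list[dict]) -> list[str]:
    """Interpret the model against one record; return a list of error messages."""
    errors = []
    for spec in model:
        value = record.get(spec["field"])
        for kind, *args in spec["rules"]:
            if not RULES[kind](value, *args):
                errors.append(f"{spec['field']} failed {kind}")
    return errors

print(validate({"name": "Ada", "email": ""}, MODEL))  # ['email failed required']

# Change of the first kind: a variant of what the interpreter already does.
# Only the model changes; no interpreter code is touched.
MODEL.append({"field": "zip", "rules": [("max_length", 5)]})

# Change of the second kind: a new kind of rule. Extend the interpreter once,
# then use the new capability in the model.
RULES["one_of"] = lambda value, choices: value in choices
MODEL.append({"field": "plan", "rules": [("one_of", {"free", "pro"})]})

print(validate({"name": "Ada", "email": "ada@example.com",
                "zip": "123456", "plan": "gold"}, MODEL))
# ['zip failed max_length', 'plan failed one_of']
```

The design choice is where the variability lives: anything expressible with existing rule kinds is a data edit, and only genuinely new kinds of behavior touch code.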
This is just one example. You can have mixed models, you can generate code, you can mix in classic parameters, etc.
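Code generation is another option: rather than interpreting the model on every call, a program can read the model once and write a specialized program. A minimal sketch, assuming a hypothetical model that maps field names to maximum lengths; the generated text is compiled with Python's built-in `exec`, so the model must come from a trusted source.

```python
# A hypothetical model: field names and their maximum lengths.
MODEL = {"name": 20, "zip": 5}

def generate_validator(model: dict[str, int]) -> str:
    """Emit Python source for a validator specialized to this model."""
    lines = ["def validate(record):", "    errors = []"]
    for field, limit in model.items():
        lines.append(f"    if len(record.get({field!r}, '')) > {limit}:")
        lines.append(f"        errors.append({field + ' too long'!r})")
    lines.append("    return errors")
    return "\n".join(lines)

source = generate_validator(MODEL)
print(source)              # the generated program, as plain text

namespace: dict = {}
exec(source, namespace)    # turn the generated text into a live function
validate = namespace["validate"]
print(validate({"name": "Ada", "zip": "123456"}))   # ['zip too long']
```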
The point is that, if you are a software-is-data-aware person, you aren't "stuck" in any programming model or environment. Moreover, you are more likely to come up with effective and efficient approaches, which can easily be orders of magnitude better than naive, single-level ones.
Conclusion
"Software is data" should be one of the most obvious statements you can make in the computer world. It's like going to the New York Yankees and declaring that one of the most important things about baseball is that bats are involved. Everyone would avert their eyes and you would be quietly shown the door. Yet I find that the norm in software groups is to demonstrate no awareness through their actions that software is data. As far as you can tell by their actions, software is a whole separate thing. To them, it's as though software were like every other kind of machine control panel everyone encounters in their normal lives. They don't seem to even consider programming paths that involve software that modifies or interprets software. As a result, they are vulnerable to massive embarassment and humiliating defeat by groups that take advantage of this fundamental concept of computing.