The relationship between data and instructions is one of those bedrock concepts in software that is somehow never explicitly stated or discussed. While every computer program has instructions (organized as routines or subroutines) and data, the details of how the data is identified, named and accessed vary among programming languages and software architectures. Those differences of detail have major consequences. That’s why understanding the underlying principles is so important.
Programmers argue passionately about the supposed virtues or defects of various software architectural approaches and languages, but do so largely without reference to the underlying concepts. Only by understanding the basic concepts of data/instruction relationships can you understand the consequences of the differences.
The professional kitchen
One way to understand the relation between instructions (actions) and data is to compare it to something we can all visualize. An appropriate comparison with a program is a professional chef’s kitchen with working cooks. But in this kitchen, the cooks are a bit odd -- only one of them is active at a time. When someone asks them to do something they get active, each performing its own specialty. A cook may ask another cook for help, giving the cook things or directing them to places, and getting the results back. The cooks are like subroutines in that they get called on to do things, often with specific instructions like “medium rare.” The cook processes the “call” by moving around the kitchen to fetch ingredients and tools, brings them to a work area to process the ingredients (data) with the tools, and then delivers the results for further work or to the server who put in the order. The ingredients (data) can be in long-term storage, working storage available to multiple chefs or in a workspace undergoing prep or cooking. In addition, food (data) is passed to chefs and returned by them.
The action starts when a server gives an order to the kitchen for processing.
- This is like calling a program or a subroutine. Subroutines take data as calling parameters, which are like the items from the menu written on the order to the kitchen.
The person who receives the order breaks the work into pieces, giving the pieces to different specialists, each of whom does his work and returns the results. Unlike in a kitchen, only one cook is active at a time. One order might go to the meat chef and another to whoever handles vegetables.
- This is like the first subroutine calling other subroutines, giving each one the specifics of the data it is supposed to process. The meat subroutine would be told the kind and cut of meat, the finish, etc.
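As a rough sketch of the order being split up, here is what that might look like in Python. The names handle_order, cook_meat and cook_vegetables are made up for illustration; the point is only that the first routine passes each specialist the specifics as parameters and collects what comes back.

```python
def cook_meat(cut, finish):
    # 'cut' and 'finish' are parameters: the specifics written on the order.
    return f"{cut} cooked {finish}"


def cook_vegetables(kind):
    return f"steamed {kind}"


def handle_order(order):
    # The first subroutine splits the work and calls one specialist at a time.
    meat = cook_meat(order["cut"], order["finish"])
    side = cook_vegetables(order["side"])
    return [meat, side]


plate = handle_order({"cut": "ribeye", "finish": "medium rare", "side": "carrots"})
print(plate)
```

Only one routine runs at any moment: handle_order waits while cook_meat works, just like the odd cooks in this kitchen.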
In a professional kitchen there is lots of pre-processing done before any order is taken. Chefs go to storage areas and bring back ingredients to their work areas. They may prepare sauces or dough so that everything is mixed in and prepped so that it can be finished in a short amount of time. They put the results of their work in nearby shelves or buckets for easy access later in the shift.
- This is like getting data from storage, processing it and putting the results in what is called static or working storage, which is accessible by many different subroutines.
There is a storage area and refrigerator that stores meat and another that stores vegetables. The vegetable area might have shelves and bins. The cook goes to the storage area and brings the required ingredients back to the cook’s work space. Depending on the recipe, the cook may also fetch some of the partly prepared things like sauces, often prepared by others, to include.
- This is like getting data from long-term storage and from working storage and bringing it to automatic or local variables just for this piece of work.
The storage area could be nearby. It could be a closet with shelves containing big boxes that have jars and containers in them. A cook is in charge of keeping the pantry full. They go off and get needed ingredients and put them in the appropriate storage area as needed. They could also deliver them as requested right to a chef.
- This is like having long-term storage and access to it completely integrated with the language, or having it be a separate service that needs to be called in a special way.
The chef does the work on the ingredients to prepare the result.
- This is like performing manipulations on the data that is in local variables until the desired result has been produced. In the course of this, a chef may need to reach out and grab some ingredient from a nearby shelf.
The chef may need extra space for a large project. He grabs some empty shelves from the storage area and uses them to store things that are in progress, like dough that needs time to rise. Later, a chef might call out “grab me the next piece of dough” or “I need the dough on the right end of the third shelf.”
- This is like taking empty space and using it. Pointers are sometimes used to reference the data, or object IDs in O-O systems.
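Here is a small, hedged sketch of the shelf idea. Python has no raw pointers, so in this illustration a (shelf, slot) pair stands in for one; the names shelves, rising and store_dough are invented for the example.

```python
from collections import deque

# Space taken on demand while the program runs: three shelves, four slots each.
shelves = [[None] * 4 for _ in range(3)]
# A queue of (shelf, slot) references, standing in for pointers.
rising = deque()


def store_dough(shelf, slot, label):
    shelves[shelf][slot] = label
    rising.append((shelf, slot))


store_dough(2, 3, "sourdough")
store_dough(0, 0, "brioche")

# "Grab me the next piece of dough": follow the oldest stored reference.
shelf, slot = rising.popleft()
next_dough = shelves[shelf][slot]

# "The dough on the right end of the third shelf": address it by position.
third_shelf_right = shelves[2][-1]
```

Both requests happen to land on the same dough here; one gets there by following a saved reference, the other by naming a location.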
The cook delivers the result for plating and delivery.
- This is like producing a return variable. It may also involve writing data to long-term storage or working storage.
I’m not a cooking professional, but I gather that the work in professional kitchens and how they’re organized has evolved towards producing the best results in the least amount of elapsed time and total effort. As much prep work as possible is done before orders are received, to minimize the time and work needed to deliver orders quickly and well. The chefs have organized work and storage spaces to handle original ingredients (meat, spices, flour, etc.) and partly done results (for example, a restaurant can’t wait the 45 minutes it might take to cook brown rice from scratch).
In the next section, this is all described again somewhat more technically. If you’re interested in technology or have a programming background, by all means read it. The main points of this post and the ones that follow can be understood without it.
Instructions and data in a computer program
The essence of a computer program is instructions that the computer executes. Most of the instructions reference data in some way – getting data, manipulating it, storing results.
In math, from algebra on up, variables simply appear in equations. In computer software, every variable that appears in a statement must be defined as part of the program. For example, a simple statement like
X = Y+1
means “read the value stored in the location whose name is Y, add the number 1 to it, and store the result in the location whose name is X.” Given this meaning, X and Y need to be defined. How and where does this happen? There are several main options:
- Parameters. These form part of the definition of a subroutine. When calling a subroutine, you include the variables you want the subroutine to process. These are each named.
- Return value. In many languages, a called routine can return a value, which is defined as part of the subroutine.
- Automatic or local variables. These are normally defined at the start of a subroutine definition. They are created when the subroutine starts, used by statements of the subroutine and discarded when the subroutine exits.
- Static or working storage variables. These are normally defined separately (outside of) subroutines. They are assigned storage at the start of the whole program (which may have many subroutines), and discarded at the end.
- Allocated variables. Memory for these is allocated by a subroutine call in the course of executing a program. Many instances of such allocated variables may be created, each distinguished by an ID or memory pointer.
- File, database or persisting variables. These are variables that exist independent of any program. They are typically stored in a file system or DBMS. Some software languages support these definitions being included as part of a program, while others do not.
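To make the list concrete, here is a minimal Python sketch that touches each kind of variable once. Everything in it (TAX_RATE, Order, total_with_tax, the orders.json file name) is hypothetical, and Python blurs some of these distinctions compared with FORTRAN, COBOL or C, so treat it as an illustration rather than a definition.

```python
import json
import os
import tempfile

TAX_RATE = 0.08  # module-level: plays the role of static / working storage


class Order:
    # Each Order(...) call allocates a new instance while the program runs.
    def __init__(self, item, price):
        self.item = item
        self.price = price


def total_with_tax(price):   # 'price' is a parameter
    tax = price * TAX_RATE   # 'tax' is a local (automatic) variable
    return price + tax       # the return value


first = Order("soup", 10.0)  # allocated variables, told apart by reference
second = Order("steak", 30.0)

# Persisting data: written to a file so it can outlive this run of the program.
path = os.path.join(tempfile.mkdtemp(), "orders.json")
with open(path, "w") as f:
    json.dump([vars(first), vars(second)], f)
with open(path) as f:
    saved = json.load(f)
```

Note that the file access here goes through a separate library call (json) rather than being built into the language, which is one of the two arrangements the last bullet describes.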
There are a couple of concepts that apply to many of the places and ways variables can be defined.
- Grouping. Groups of variables can be in an ordered list, sometimes with a nesting hierarchy. This is like the classic Hollerith card: you would have a high-level definition for the whole card and then a list of the variables that would appear on the card.
  - There might be subgroups; for example, start-date could be the name of a group consisting of the variables day, month, year.
  - Referring to such a variable might look like “year IN start-date IN cust-record” in COBOL, while in many other languages the order is reversed, outermost group first, as in cust_record.start_date.year.
- Multiples. Any variable or group can be declared to be an array. For example, the variable DAY could be made an array of 365 elements, so there’s one value per day of a year.
- Types or templates. Many languages let you define a template or type for an individual variable or group. When you define a new variable with a new name like Y, you could say it’s a variable of type X, which then uses the attributes of X to define Y.
- Definition scope. Parameters, return values and local variables are always tied to the subroutine of which they are a part. They are “invisible” outside the subroutine. The other variables, depending on the language, may be made “visible” to some or all of a program’s subroutines. Exactly how widely visible data definitions are is the subject of huge dispute, and is at the core of things like components, services and layers.
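Grouping, multiples and types can be sketched together in a few lines of Python. The names Date, CustRecord and daily_total are assumptions made up for this example, and dataclasses are just one convenient way to define a template for a group.

```python
from dataclasses import dataclass, field


@dataclass
class Date:  # a type (template) for a subgroup of three variables
    day: int
    month: int
    year: int


@dataclass
class CustRecord:  # a group that nests another group
    name: str
    start_date: Date
    # "Multiples": one value per day of a year.
    daily_total: list = field(default_factory=lambda: [0.0] * 365)


cust_record = CustRecord("Acme", Date(day=1, month=3, year=2020))
cust_record.daily_total[0] = 125.50

# The qualified name is written from the outermost group inward.
year = cust_record.start_date.year
```

Any number of variables can be declared with the Date template, and each gets its own day, month and year.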
When you look at a statement like X = Y+1, exactly how and where X and Y are defined isn’t mentioned. X could be a parameter, a local variable or defined outside of the subroutine in which the statement appears. Part of the job of the programmer is to name and organize the data definitions in a clean and sensible way.
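A quick sketch shows the same statement picking up Y from three different places. The function names are invented for the example; the lookup behavior is standard Python scoping.

```python
Y = 10  # defined outside any subroutine (module level)


def uses_outer():
    X = Y + 1  # Y resolves to the module-level definition
    return X


def uses_parameter(Y):
    X = Y + 1  # Y is the parameter, which hides the outer Y
    return X


def uses_local():
    Y = 100    # Y is a local variable, created and discarded with each call
    X = Y + 1
    return X
```

The line X = Y + 1 is identical in all three routines, yet uses_outer() returns 11, uses_parameter(5) returns 6 and uses_local() returns 101. The statement alone doesn't tell you which; you have to know where Y is defined.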
The variety of data-instruction organization and relationships
Most of the possibilities for defining variables listed above were provided for by the early languages FORTRAN, COBOL and C, each of which remains in widespread use. Not long after these languages were established, variations were introduced. Software languages and architectures were created that selected and arranged the way instructions related to data definition. Programmers and academics decided that some ways of referencing and organizing data were error-prone and introduced restrictions that were intended to reduce the number of errors that programmers made when creating programs. In software architecture, the idea arose that all of a program's subroutines should be organized into separate groups, usually called "components" or "services," each with its own collection of data definitions. The different components call on each other or send messages to ask for help and get results, but can only directly operate on data that is defined as part of the component.
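The component idea might look something like this in Python. Pantry and Bakery are hypothetical components, and the leading underscore is only a naming convention for "keep out"; Python does not actually enforce it the way some languages and architectures do.

```python
class Pantry:
    """Hypothetical component that owns the stock data."""

    def __init__(self):
        self._stock = {"flour": 3, "eggs": 12}

    def take(self, item, amount):
        if self._stock.get(item, 0) < amount:
            raise ValueError(f"not enough {item}")
        self._stock[item] -= amount
        return amount

    def remaining(self, item):
        return self._stock.get(item, 0)


class Bakery:
    """Hypothetical component with its own data; it asks Pantry for ingredients."""

    def __init__(self, pantry):
        self._pantry = pantry
        self.loaves_baked = 0

    def bake_loaf(self):
        # Bakery never edits the stock itself; it calls on the owner.
        self._pantry.take("flour", 1)
        self.loaves_baked += 1


pantry = Pantry()
bakery = Bakery(pantry)
bakery.bake_loaf()
```

The bakery gets its flour, but the only code that changes the stock numbers lives inside Pantry.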
The most extreme variation of instruction/data relationship is a complete reversal of point of view. The view I've described here is "procedural," which means everything is centered around the actor, the chef who does things. The reversal of that point of view is "object-oriented," called O-O, which organizes everything around the data, the acted upon, the ingredients and workspaces in a kitchen. Instead of following the chef around as he gets and operates on ingredients (data), we look at the data, called objects, each of which has little actors assigned to it, mini-chefs, that can send messages for help, but can only work on their own little part of the world. It's hard to imagine!
The basic idea is simple: instead of having a master chef or ones with broad specialties like desserts, there are a host of mini-chefs called "methods," each of which can only work on a specific small group of ingredients. A master chef has to know so much -- he might make a mistake! By having a mini-chef who is 100% dedicated to dough, and never letting anyone else create the dough, we can protect against bad chefs (programmers) and make sure the dough is always perfect! Hooray! Or at least that's the theory...
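Here is the same small job written both ways, as a hedged illustration. The names knead, dough_record and Dough are made up, and Python's underscore prefix marks data as private by convention only.

```python
# Procedural: the function (chef) is the center; the data is handed to it.
def knead(dough, minutes):
    dough["kneaded_minutes"] += minutes
    return dough


dough_record = {"flour_g": 500, "kneaded_minutes": 0}
knead(dough_record, 5)


# Object-oriented: the data is the center; only its own methods touch it.
class Dough:
    def __init__(self, flour_g):
        self._flour_g = flour_g
        self._kneaded_minutes = 0

    def knead(self, minutes):  # a "mini-chef" dedicated to this data
        self._kneaded_minutes += minutes

    def kneaded_minutes(self):
        return self._kneaded_minutes


dough_object = Dough(500)
dough_object.knead(5)
```

Both versions end up with five minutes of kneading recorded. The difference is who is allowed to change the number: any code holding dough_record in the first case, only Dough's own methods (by convention) in the second.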
Conclusion
Computers take data in, process it and write data out. Inside the computer there are instructions and data. Software languages have evolved to make it easier for programmers to define the data that is read and created and to make it easier to write the lines of code that refer to the data and manipulate it. As bodies of software have grown, people have created ways to organize the data that a computer works on, for example putting definitions for the in-process data of a subroutine inside the subroutine itself or collecting a group of related subroutines into a self-contained group or component with data that only it can work on.
Understanding the basic concepts of instruction/data relationships and how those relationships can be organized and controlled is the key to understanding the plethora of approaches to language and architecture that have been created, and to making informed decisions about which language and architecture is best for a given problem. The overall trend is clear: self-declared programming elites decide that this or that restriction should be placed on which variables can be accessed by which instructions in which way, with the goal of reducing the errors made by normal programming riff-raff. Nearly all such restrictions make things worse!