Fairly early in the history of software, some programs grew large, and people found their size unwieldy. They looked for ways to organize the software and break it up into pieces to make it more manageable. This effort applied to the lines of code in the program, to the definitions of the data referred to by the code, and to how the two were related. How do you handle blocks of data that many routines work with? How about ones whose use is very limited? The background of this issue is explained here.
These questions led to a variety of efforts that continue to this day to break a big body of software code and data definitions into pieces. The obvious approach, which continues to be used today, was simply to use files in directories. But for many people, this simple, practical approach wasn't enough. They wanted to make the walls between subsets of the code and data higher, broader and more rigid, creating what were generally called components. The various efforts to create components vary greatly. They usually create a terminology of their own, terms like “services” and “objects.” Such specialized, formulaic approaches to components are claimed to make software easier to organize, write, modify and debug. In this post I’ll take a stab at explaining the general concept of software components.
Components and kitchens
When you’re young and in your first apartment, you’ve got a kitchen that starts out empty. You’re not flush with cash, so you buy the minimum amount of cooking tools (pots, knives, etc.) and food that you need to get by. You may learn to make dishes that need additional tools, and you may move into a house with a larger kitchen that starts getting filled with the tools and ingredients you need to make your growing repertoire of dishes. You may have been haphazard about your things at the start, but as your collection grows you probably start to organize things. You may put the flour, sugar and rice on the same shelf, and probably put all the pots together. You may put ingredients that are frequently used together in the same place. You probably store all the spices and herbs together. It makes sense – if everything were scattered, you’d have a tough time remembering where each item was. The same logic applies to a home work bench and to clothes.
The same logic applies for the same reason to software! It’s a natural tendency to want to organize things in a way that makes sense – for remembering where they are, and for convenience of access. This natural urge was recognized in the early days of software programming. Given that there aren’t shelves or drawers in that invisible world, most people settled on the term “component” as the word for a related body of software. The idea was always that instead of one giant disorganized block of code, the thing would be divided into a set of components, each of which had software and data that was related.
A software program consists of a set of procedures (the actions) and a set of data definitions (what the procedures act on). Breaking up a large amount of code into related blocks was helped by the early emergence of the subroutine – a block of code that is called on to do something and then returns with a result; kind of like a mixer in a kitchen. The problems that emerged were how to break a large number of subroutines into components (each of which had multiple subroutines), and how to relate the various data definitions to the routine components. This problem resembles the one in the kitchen of organizing tools and ingredients. Do you keep all the tools separate from the ingredients, or do you store ingredients with the tools that are used to process them?
In the world of cooking, this has a simple answer. If all you do is make pancakes, you might store the flour and milk together with the mixing bowls and frying pans you use with them. But no one ever just makes pancakes – duh! And even if you did, you’d better put the milk and butter in the fridge! Ditto with spices. Nearly everyone has a spice drawer or shelf, and spices used for everything from baking to making curries are stored there. Similarly, you store all the pans together, all the knives, etc.
In software it’s a little tougher, but not a lot. One perennially tough question is whether to group routines (and data) by subject/business area or by technology area. The same choice applies to organizing programmers in a department. It makes sense for everyone who deals primarily with user interfaces to work together so they can create uniform results with minimal code. Same thing with back-end or database processing. But over time, the programmers become less responsive to the business and end users; the problem is often solved by having everyone who contributes to a business or kind of user work as a team. Responsiveness skyrockets! Before long, things begin to deteriorate on the tech side, with redundant, inconsistent data put into the database, UI elements that vary between subject areas and confuse users, etc. There’s pressure to go back to tech-centric organization. And so it goes, round and round. Answer: there is no perfect way to organize a group of programmers! Just as painfully, there is no perfect way to organize a group of procedures and their relationship to the data they work on!
The drive towards components
There are two major dimensions of making software into components. One dimension is putting routines into groups that are separate from each other. The second dimension is controlling which routines can access which data definitions.
Separating routines into groups can be done in a light-weight way based on convenience, similar to having different workspaces in a kitchen. In most applications there are routines that are pretty isolated from the rest and others that are more widely accessed. Letting programmers call any routine at any time, with access to all the source code, makes things efficient.
Component-makers decided that programmers couldn't be trusted with such broad powers. They invented restrictions to keep routines strictly separate. While there were earlier versions of this idea, a couple decades ago "services" were created to hold strictly separate groups of routines. An evolved version of that concept is "micro-services." See this for an analysis.
The second major dimension of components is controlling and limiting the relationship between routines and data. The first step was taken very early. It was sensible and remains in near-universal use today: local variables. These are data definitions declared inside a routine for the private use of that routine during its operation. They are like items on a temporary worksheet, discarded when the results are created.
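For anyone who wants to see the worksheet idea in code, here is a minimal sketch in Python. The routine name average and its variables are made up for illustration; the point is only that total and count exist while the routine runs and are gone when it returns.

```python
def average(values):
    # 'total' and 'count' are local variables: a temporary worksheet
    # that exists only while average() is running.
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    # Only the result leaves the routine; the worksheet is discarded.
    return total / count if count else 0.0


result = average([2, 4, 6])
print(result)
```

Nothing outside the routine can see or change total or count, which is why this form of restriction costs nothing and surprises no one.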
Later steps concerning "global" variables were less innocent. The idea was to strictly separate which routines can access which data definitions. Early implementations of this were light-weight and easily changed. Later versions built the separation into the architecture and code details. For example, each micro-service is ideally supposed to have its own private database and schema, inaccessible to the other services. This impacts how the code is designed and written, increasing the total amount of code, overhead and elapsed time.
Languages that are "object oriented" are an extreme version of the component idea. In O-O languages, each data definition ("class") can be accessed only by an associated set of routines ("methods"). This bizarre reversal of the natural relationship between code and data results in a wide variety of problems.
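If you have not seen this arrangement before, the sketch below shows roughly what it looks like. The Pantry class and its method names are invented for this example. Note that Python marks private data only by convention (the leading underscore), while languages such as Java enforce the wall with keywords like private.

```python
class Pantry:
    """Hypothetical class: the stock data is meant to be reached
    only through the methods defined here."""

    def __init__(self):
        # Leading underscore: private by convention in Python.
        self._stock = {}

    def add(self, item, grams):
        self._stock[item] = self._stock.get(item, 0) + grams

    def take(self, item, grams):
        available = self._stock.get(item, 0)
        if grams > available:
            raise ValueError(f"only {available} g of {item} available")
        self._stock[item] = available - grams
        return grams


pantry = Pantry()
pantry.add("flour", 500)
taken = pantry.take("flour", 200)
```

Any routine elsewhere in the program that needs the flour has to go through add and take, or through whatever new methods someone writes for the class.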
These ideas of components can be combined, making the overhead and restrictions even worse. People who build micro-services, for example, are likely to use O-O languages and methods to do the work. Obstacles on top of problems.
Components in the kitchen
All this software terminology can sound abstract, but the meaning and consequences can be understood using the kitchen comparison.
What components amount to is having the kitchen be broken up into separate little kitchenettes, each surrounded by windowless walls. There are two openings in each kitchenette's walls, one for inputs and one for outputs. No chef can see or hear any other chef. The only interaction permitted is by sending and receiving packages. Each package has an address of some kind, along with things the sending chef wants the receiving chef to work on. For example, the sending chef may mix together some dry ingredients and send them to another chef who adds liquid and then mixes or kneads the dough as requested. The mixing chef might then put the dough into another package and return it to the sending chef, who might put together another package and send it to the baking chef who controls the oven.
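To make the package-passing picture concrete, here is a rough sketch that does the same kneading job two ways: once as a direct call, and once through a walled-off kitchenette that only accepts and returns packages. Everything in it is an assumption made for illustration, including the routine names, the package fields ("to", "reply_to", "body") and the choice of JSON text as the packaging.

```python
import json


def knead(dry_mix, liquid_ml):
    """The actual work: combine dry ingredients with liquid."""
    return {"dough": sorted(dry_mix), "liquid_ml": liquid_ml}


# Open kitchen: one chef simply calls the routine.
direct = knead(["flour", "salt"], 300)


def kneading_kitchenette(package_text):
    """Walled kitchenette: unwrap the package, do the work, wrap the reply."""
    package = json.loads(package_text)
    result = knead(package["dry_mix"], package["liquid_ml"])
    return json.dumps({"to": package["reply_to"], "body": result})


# The sending chef has to address and wrap the request first.
outgoing = json.dumps({
    "to": "kneading",
    "reply_to": "prep",
    "dry_mix": ["flour", "salt"],
    "liquid_ml": 300,
})
reply = json.loads(kneading_kitchenette(outgoing))
```

The dough is identical either way; the second route just adds wrapping, addressing and unwrapping around the one line that does the work.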
If the components are built as services, the little kitchenettes are built as free-standing buildings. In one version of components (direct RESTful calls), there is a messenger who stands waiting at each chef's output window. When the messenger receives a package, the messenger reads the address, jumps into his delivery van with the package and drives off to the local depot and drops off the package; another driver grabs the package, loads it into his van and delivers it to the receiver's input window. If the kitchenettes are all in the same location the vans and depot are still used -- the idea is that a genius administrator can change the location of the kitchenettes at any time and things will remain the same.
Another version of components is based around an Enterprise Service Bus (ESB). This is similar to the vans and the central depot except that the packages are all queued up at the central location, which has lots of little storage areas. Instead of a package going right to a recipient, it's sent to one of these little central storage areas. Then, when the chef in a kitchenette is ready for more work, he sends a request to the central office, asking for the next package from a given little storage area. Then a worker grabs the oldest package and gives it to a driver, who puts it in his van and delivers the package to the input window of the requesting chef.
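The storage-area mechanics can be sketched in a few lines. This is a toy stand-in, not how any particular ESB product works: the CentralDepot class, its method names and the area name "baking" are all hypothetical, and a real bus would also involve a network hop in each direction.

```python
from collections import defaultdict, deque


class CentralDepot:
    """Hypothetical stand-in for the central office with named storage areas."""

    def __init__(self):
        # One first-in, first-out queue per storage area.
        self._areas = defaultdict(deque)

    def drop_off(self, area, package):
        self._areas[area].append(package)

    def next_package(self, area):
        # Hand over the oldest package in the area, or None if it is empty.
        queue = self._areas[area]
        return queue.popleft() if queue else None


depot = CentralDepot()
depot.drop_off("baking", "bread dough")
depot.drop_off("baking", "pie crust")

first = depot.next_package("baking")
second = depot.next_package("baking")
nothing_left = depot.next_package("baking")
```

The baking chef gets the bread dough first and the pie crust second, and has to ask again each time to find out whether anything else has arrived.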
If this sounds bizarre and lots of extra work, it's because ... it's bizarre and requires lots of extra work.
The ideal way to organize software
The ideal way to break software into components is actually pretty similar to the way good kitchens are organized. Generally speaking, you start by having the same kinds of things together. You probably store all the pots and pans together, sorted in a way that makes sense depending on how you use them. You probably pick a place that’s near the stove where they’ll probably be used – maybe even hanging on hooks from the ceiling. You probably have a main store of widely used ingredients like salt, but you may periodically put containers of it near where it is most often used. An important principle is that most chefs work at or near their stations – but (it goes without saying) can move anywhere to get anything they need. There aren't walls stopping you, and you don’t bother someone else to get something for you when you can more easily do it yourself.
Exactly the same principle applies whether you are creating an appetizer, an entree, a side dish or a dessert -- you do the same gathering and assembly from the ingredients in the kitchen, but deliver the results on different plates at different times.
In software this means that while routines may be stored in files in different directories for convenience, and your work is usually confined to a single directory, you go wherever you need to in order to do your job. Same thing with data definitions; you can break them up if it seems to make things more organized, but any routine can access any data definition it needs to. When you’re done, you make a build of the system and try it out. That's what champion/challenger QA is for.
Are you deploying on a system that has separate UI, server and storage? No problem! That's what build scripts (which you would have anyway) are for! This approach makes it easier to migrate from the decades-old DBMS-centric systems design and move towards a document-centered database with auto-replication to a DBMS for reporting, which typically improves both performance and simplicity by many whole-number factors.
Are the chefs in your kitchen having trouble handling all the work in a timely way? In the software world you might think of making a "scalable architecture" with separate little kitchenettes, delivery vans, etc. -- which any sensible person knows adds trouble and work and makes every request take longer from receipt to ultimate delivery. In the world of kitchens (and sensible software people) you might add another mixer or two, install a couple more ovens and hire another chef or two, everyone communicating and working in parallel, and churning out more excellent food quickly, with low overhead.
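As one illustration of "install a couple more ovens," here is a sketch using a thread pool from Python's standard library. It assumes the orders are independent of one another, and the names bake, run_kitchen and ovens are invented for the example. Threads are only one of several ways to add workers; the point is that the number of ovens is a single parameter and the routine doing the work does not change.

```python
from concurrent.futures import ThreadPoolExecutor


def bake(order):
    # Stand-in for real work; a real routine would take some time.
    return f"{order} (baked)"


def run_kitchen(orders, ovens):
    # 'ovens' is just the number of workers sharing the same kitchen.
    with ThreadPoolExecutor(max_workers=ovens) as pool:
        # map() returns results in the same order as the orders came in.
        return list(pool.map(bake, orders))


orders = ["bread", "pie", "cake", "rolls"]
one_oven = run_kitchen(orders, ovens=1)
three_ovens = run_kitchen(orders, ovens=3)
```

Both runs produce the same plates in the same order; only the number of workers differs.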
If you think this sounds simple, you’re right. If you think it sounds simplistic, perhaps you should think about this: any of the artificial, elaborate, rigid prescriptions for organizing software necessarily involves lots of overhead for people and computers in designing, writing, deploying, testing and running software. Each restriction is like telling a chef that under no circumstances is he allowed to access this shelf of ingredients or use that tool when the need arises.
Arguments to the contrary are never backed with facts or real-world experiments – just theory and religious fervor. Instead, you should consider the effective methods of optimizing a body of software, which are centered on Occamality (elimination of redundancy) and increasing abstraction (increasingly using metadata instead of imperative code). Both of these things will directly address the evils that elaborate component methods are supposed to cure but don't.
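Here is one small, hedged reading of what "metadata instead of imperative code" can look like in practice. The order fields and routine names are made up for the example. The first routine spells out each rule in code; the second keeps the rules in a table and uses one generic routine to interpret them.

```python
# Imperative style: every rule is written out, one 'if' per field.
def validate_imperative(order):
    errors = []
    if not isinstance(order.get("dish"), str):
        errors.append("dish")
    if not isinstance(order.get("servings"), int):
        errors.append("servings")
    if not isinstance(order.get("table"), int):
        errors.append("table")
    return errors


# Metadata style: the rules are data, stated once.
ORDER_FIELDS = {"dish": str, "servings": int, "table": int}


def validate(record, fields):
    # One generic routine checks any record against any field table.
    return [name for name, kind in fields.items()
            if not isinstance(record.get(name), kind)]


order = {"dish": "curry", "servings": "two", "table": 4}
errors_imperative = validate_imperative(order)
errors_metadata = validate(order, ORDER_FIELDS)
```

Adding a field to the metadata version means adding one entry to ORDER_FIELDS, with no new code to write or test.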