William of Occam (ca. 1285 to 1349) was an English logician and Franciscan friar. He is credited with formulating a principle that has been applied to various aspects of computer systems. In those areas of computing to which it has been applied, it reigns supreme – it supplies optimal solutions to the relevant problems.
There are large areas of computing to which Occam’s razor has not been applied. Worse, it is not even one of the candidates under consideration. As a result, those aspects of computing are fractured, inefficient, unpredictable, and driven by fashion and politics.
Everyone involved knows that the whole process of specifying, designing, building, testing and supporting software is hopelessly inefficient, unpredictable and error-prone. The leading methodology that concentrates on the process of building software, “project management,” is theoretically bankrupt and in any case has an empirical track record of failure. In terms of the content of good software, there are fierce battles among competing approaches, each of which amounts to little more than a collection of unsupported and unfounded assertions, and none of which, in practice, contributes much to building good software.
Occam’s razor leads to the principles on which good software may be built, and supplies a single simple, widely applicable theme that, when applied, cuts away (as a razor should) the inefficiency and, generally, what we don’t like about software. Once the principle is understood and widely applied, it should become the undisputed standard for how software is built, just as it has become in the other areas of computing to which it has been applied.
Here is a shorter attempt to explain the Razor and its application to software. Here is an approach to the same problem from a common-sense layman's point of view.
The razor itself
“Occam’s razor” is a famous principle of thinking from the European Middle Ages. Occam’s razor is applied in situations where there is more than one reasonable explanation for a phenomenon, and basically says that you should pick the simplest explanation consistent with the phenomena you observe. From Wikipedia:
Leonardo da Vinci (1452–1519) lived after Occam's time and has a variant of Occam's razor. His variant short-circuits the need for sophistication by equating it to simplicity.
Simplicity is the ultimate sophistication.
Occam's Razor is now usually stated as follows:
Of two equivalent theories or explanations, all other things being equal, the simpler one is to be preferred.
As this is ambiguous, Isaac Newton's version may be better:
We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances.
In the spirit of Occam's Razor itself, the rule is sometimes stated as:
The simplest explanation is usually the best.
Other applications of Occam’s razor
While it plays little overt role in intellectual debates, Occam’s razor and concepts derived from it are in fact central to scientific thinking. So while you may never have heard of it, or may only vaguely remember hearing of it in the past, you shouldn’t think that it’s an obscure little Medieval tidbit, with no relevance to the present, that this guy is just pulling out for some weird reason. If you go through the Wikipedia reference, you find that:
Occam's Razor has become a basic perspective for those who follow the scientific method. … without the principle of Occam's Razor science does not exist. The primary activity of science, formulating theories and selecting the most promising theory based on analysis of collected evidence, is not possible without some method of selecting between theories which do fit the evidence. This is because, for every set of data, there are an infinite number of theories which are consistent with those data (this is known as the Underdetermination Problem).
You can find Occam’s razor at work in astronomy, physics, biology, and medicine. It even appears explicitly in statistics.
There are various papers in scholarly journals deriving versions of Occam's Razor from probability theory and applying it in statistical inference, as well as deriving various criteria for penalizing complexity in statistical inference. Recent papers have suggested a connection between Occam's Razor and Kolmogorov complexity.
It should be evident that Occam’s razor is an important underpinning of the whole scientific enterprise.
Applications of Occam’s razor in computing
We already understand Occam’s razor clearly in information theory, in which we measure the information content of a transmission. Information theory has provided the theoretical foundation of all communications and data transmission since its invention by Claude Shannon in 1948.
Occam’s razor is particularly applied in minimum message length (MML) theory, which focuses on the least number of bits required to encode a given amount of information.
MML has been in use since 1968. MML coding schemes have been developed for several distributions and many kinds of machine learners, including unsupervised classification, decision trees and graphs, DNA sequences, Bayesian networks, neural networks (one-layer only so far), image compression, image and function segmentation, and so on.
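To make the two-part idea concrete, here is a minimal sketch in Python of the kind of comparison MML formalizes: each hypothesis is charged for stating itself (the model bits) and for encoding the data given that hypothesis, and the shortest total message wins. This is a toy illustration only, not the actual MML estimator; the data, the candidate models, and the 6-bit cost assumed for stating the biased coin’s parameter are all made up for the example.

    import math

    def data_bits(seq, p):
        # Bits an ideal code needs for the sequence under a Bernoulli(p) model.
        return sum(-math.log2(p if x == 1 else 1.0 - p) for x in seq)

    def two_part_length(seq, p, param_bits):
        # Total message length: bits to state the model plus bits for the data given it.
        return param_bits + data_bits(seq, p)

    seq = [1] * 26 + [0] * 6  # 32 coin flips, mostly heads

    # Hypothesis A: a fair coin -- there is nothing about the model to encode.
    fair = two_part_length(seq, 0.5, param_bits=0)

    # Hypothesis B: a biased coin -- spend (say) 6 bits stating p to coarse precision.
    p_hat = sum(seq) / len(seq)
    biased = two_part_length(seq, p_hat, param_bits=6)

    print(f"fair coin:   {fair:.1f} bits")
    print(f"biased coin: {biased:.1f} bits")
    print("preferred:", "biased" if biased < fair else "fair")

With only a handful of flips, the 6 bits spent describing the extra parameter would outweigh the savings and the simpler fair-coin model would win; with enough data, the biased model pays for itself. That trade-off is Occam’s razor stated in bits.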
While the terminology is not normally used, and I am not aware that Occam’s razor was explicitly invoked in formulating it, the principle is clearly expressed in modern relational database theory and in the practice of schema design. I suggest that this is one of the reasons for the success of the DBMS approach to data storage and access.
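As a rough sketch of the same principle in schema design (kept in Python for continuity, with hypothetical table and field names), the following contrasts a redundant flat table with a normalized pair of tables; in the normalized form, each fact is recorded exactly once.

    # Redundant, flattened form: the customer's address is repeated on every
    # order, so changing it means finding and updating every copy.
    orders_flat = [
        {"order_id": 1, "customer": "Acme", "address": "12 Elm St", "total": 250},
        {"order_id": 2, "customer": "Acme", "address": "12 Elm St", "total": 90},
        {"order_id": 3, "customer": "Bolt", "address": "7 Oak Ave", "total": 40},
    ]

    # Normalized form: each fact is stated once; orders refer to customers by key.
    customers = {
        "Acme": {"address": "12 Elm St"},
        "Bolt": {"address": "7 Oak Ave"},
    }
    orders = [
        {"order_id": 1, "customer": "Acme", "total": 250},
        {"order_id": 2, "customer": "Acme", "total": 90},
        {"order_id": 3, "customer": "Bolt", "total": 40},
    ]

    # A change of address now touches exactly one place.
    customers["Acme"]["address"] = "99 Pine Rd"

    # The flat view can be reassembled on demand, the equivalent of a join.
    flat_view = [{**o, "address": customers[o["customer"]]["address"]} for o in orders]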
Shannon’s information theory
Shannon’s application of the concept to information theory provides a good springboard to seeing how it applies to software design. So let’s understand information theory a little better. I don’t think I can do better than Wikipedia. Here is a snapshot of the article on the history of information theory:
Claude E. Shannon (1916–2001) founded information theory with his classic paper "A Mathematical Theory of Communication," published in the Bell System Technical Journal in July and October of 1948. At the beginning of his paper, Shannon asserted that "The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point." His theory for the first time considered communication as a rigorously stated mathematical problem in statistics and gave communications engineers a way to determine the capacity of a communication channel in terms of the common currency of bits. This problem is called the channel coding problem. The transmission part of the theory is not concerned with the meaning (semantics) of the message conveyed.
A second set of ideas in information theory relates to data compression. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data. There are two formulations for the compression problem: in lossless data compression the data must be reconstructed exactly, whereas lossy data compression examines how many bits are needed to reconstruct the data to within a specified fidelity level. This fidelity level is measured by a function called a distortion function. In information theory this is called rate distortion theory. Both lossless and lossy source codes produce bits at the output which can be used as the inputs to the channel codes mentioned above.
This division of information theory into compression and transmission is justified by the information transmission theorems, or source-channel separation theorems that justify the use of bits as the universal currency for information in many contexts. …
Communications channels have a fixed capacity for sending information per unit of time. To use that capacity as efficiently as possible, you distinguish between the message that is presented for sending and the actual information content of that message. What this amounts to is saying that the “information content” of a message is the smallest number of bits required to exactly reproduce the message after transmission. In plain language, the “information content” of a message is the message compressed as much as possible, with all repeats and redundancy removed, except for redundancy purposely introduced for error identification and correction.
Like any application of Occam’s razor, the focus on information content comes out of knowing what you care about; you include everything required to get what you care about and throw out everything else. What I care about in communications is accuracy and efficiency. Efficiency means transmitting as much information as possible through a communications channel of given capacity. This is achieved by using the smallest possible number of bits to encode the message.
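As a small numerical sketch of what “information content” means here, assuming the simplest memoryless-source model, the following Python compares a naive 8-bits-per-character encoding of a short, repetitive message with the Shannon entropy bound that an ideal compressor approaches. The message itself is, of course, made up.

    import math
    from collections import Counter

    def entropy_bits_per_symbol(message):
        # Shannon entropy of the message treated as a memoryless source:
        # the average number of bits per symbol an ideal code needs.
        counts = Counter(message)
        n = len(message)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    message = "AAAAAABBBC"                       # highly repetitive, so little information
    naive = len(message) * 8                     # 8 bits per character, redundancy and all
    ideal = entropy_bits_per_symbol(message) * len(message)

    print(f"naive encoding: {naive} bits")
    print(f"entropy bound:  {ideal:.1f} bits")   # roughly the best any lossless coder can do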
The application of information theory to software
How is all this relevant to building software? Aren’t we supposed to use structured design, use cases, components, or whatever to design software? Most of those techniques focus on the process of producing a design, or on a supposed ideal structure for a program. If we apply the same thinking that we see in information theory to software design, we will focus instead on the results of the process, and once we are satisfied that we know how to judge whether a computer program is optimal or not, we will be able to find ways to build it.
Software specification and design has clearly been an island of technical art, separated from science and technology as a whole, except for those portions that have already come under the sway of “Occamal” thinking, such as information theory and MML theory. As on any relatively isolated island of theory and practice, a wide variety of techniques and practices has arisen to fill the gap left by a completely baseless, ad hoc approach. There are innumerable approaches to designing and building software and no clear way to decide among them.
When designing software, we care most about things that are a very close analogy to what we care about in communications. In communications, we have a channel of given capacity and want to squeeze the most information through it that we can; therefore, we eliminate all redundancy from the original message and transform it to contain only its pure information content. Anything extra would take transmission capacity and add nothing.
In software design, we normally have the capacity to build a representation of the program (source code) that is as long as we like, as far as the computer’s ability to execute the code is concerned. However, every line of code adds a burden to one or more stages of the whole chain of specify, design, code, test, document, train, learn, use, maintain, modify, and repeat. This chain of effort, the entire lifecycle of the program and everyone who touches it, is like the communications channel in information theory.
The idea is that most program specifications are highly redundant, like messages in their original form. We want to define the “information content” of programs just as we define the information content of messages, so that every redundant or repeating group is eliminated and what is left is everything required to make the program operate and absolutely nothing else. We focus on information content in communications because we want to make the best use of fixed communications capacity; we focus on information content in software because we want to make the best use of all the resources that are involved with the software in any way.
We don't just want to make programs short for general reasons. We also want to make them easy to change, with as little effort and error as possible. By eliminating all redundancy from a program, we make it so that there's exactly one place to go to make a change.
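A hypothetical before-and-after sketch of what removing that redundancy looks like in code; the function names and the 8% tax rate are invented for the example.

    # Redundant: the same pricing rule is written out in three places, so a
    # change to the rule (say, a new tax rate) must be found and made three times.
    def invoice_total(items):
        return sum(i["price"] * i["qty"] for i in items) * 1.08

    def quote_total(items):
        return sum(i["price"] * i["qty"] for i in items) * 1.08

    def refund_total(items):
        return sum(i["price"] * i["qty"] for i in items) * 1.08

    # With the redundancy removed, the rule exists in exactly one place, and
    # invoices, quotes, and refunds all call it.
    TAX_RATE = 0.08

    def total_with_tax(items):
        return sum(i["price"] * i["qty"] for i in items) * (1 + TAX_RATE)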
Getting back to Occam (though I admit Shannon and fancy formulas are more impressive and intimidating), if Occam’s razor is:
Entia non sunt multiplicanda praeter necessitatem.
No more things should be presumed to exist than are absolutely necessary.
Occam’s razor applied to software would be:
No more software entities should be created than are absolutely necessary.
Why? Just as sending bits beyond the information content of the message has a cost but doesn’t improve the message that is received, so do additional entities in the specification or expression of the program add to the cost to build and the cost to change without increasing the value of the program.
This may not sound dramatic or exciting. But as someone who has lived with the process of building software for a long time, and has struggled extensively with the question of what makes software “good,” it’s very satisfying to have an objective criterion that enables you to judge the “goodness” of a program, and to say when a program is “optimally” good. It gets exciting when you realize that this abstract notion of optimality has consequences that are extremely practical and down-to-earth. In particular, it shows you a path to building programs more quickly than using any other method; doing less to build the programs than you thought you had to do; and having the resulting programs be as easy and safe to change as it is possible for programs to be.