Getting practical, real-world results with ML and AI isn’t just a matter of hiring some people with the right credentials and throwing them at the project. Most such efforts start with fanfare but then fade into failure, usually quietly. The first part of this series laid out the issues, described the path to success, and concentrated on the indispensable foundation of success, the data. Data that has to be collected, corrected, enhanced and augmented – a time-consuming process that has no “advanced algorithm” glory, but MUST be done, and done well.

In this post, we’ll concentrate on the analytic methods that a successful project uses to generate value from the data so arduously collected and corrected.

**A Little Background**

I say I’m focused here on ML and AI. I just said that because it’s what everyone is talking about. What I’m *really* focused on is algorithms for understanding and getting value out of data. So I lied. Even worse, I’m not sorry – because just *thinking* that what’s important is to use the latest ML and AI techniques is central to the failure of most such efforts to deliver value.

I guess I can get over my programmer-ish prissiness that things are getting new names. What I refuse to get over is that lots of important, really valuable techniques are usually left out of the grab-bag of “ML and AI.” I won’t be comprehensive, but I think a glance at the landscape might help here.

There are a few different ways to understand useful algorithms and how they came to be. Roughly, they are:

- Follow the algorithm, taking a fuzzy lens for the naming and details
- Follow the academic departments that “own” the algorithm
- Follow the problems the algorithm has proven to be good for

These ways overlap, but provide useful angles for understanding the algorithms and where to find them.

Let’s illustrate this with an amazing, powerful algorithm that is usually sadly ignored by people who are into ML and AI. It’s most often called linear programming (LP). Those who are into it think of it as one of a category of algorithms called mathematical programming. More broadly, it’s normally “owned” by academic departments of operations research (OR). OR studies repeating operations, such as responding to appliance repair calls or controlling the output of oil refineries as prices and costs vary, and optimizes the results. It’s been used for decades for this purpose in many industries, and is being rolled out today to schedule infusion centers and operating rooms in hospitals.

This isn’t the place to spell it all out, but knowledge of amazing algorithms like LP is scattered over departments of Engineering, Computer Science, Math, Operations Research, Statistics, AI and others. The point is simple: *the world of useful algorithms and modeling techniques is vastly greater than ML and AI.*

**The Natural Sequence**

There are dozens and dozens of methods that can be used to analyze and extract value from data, which after all is the point of ML and AI (and, by implication, *all the other great algorithms*). As I described in the prior post, there is a natural progression or sequence of methods, which roughly follows their order of discovery and/or widespread use. Success usually comes from using the methods in the rough order of their invention as you climb the mountain of understanding from simple and obvious (in retrospect) results to increasingly non-obvious and subtle results.

I often see the following reaction to this concept, rarely articulated but often acted upon: “Why would I want to waste everyone’s time playing around with obsolete, outdated methods, when I’m an expert in the use of the most modern ML and/or AI techniques? I’m sure that my favorite ML technique … blah, blather, gobbledygook … will yield great results with this problem. Why should I be forced to use an ancient, rusting steam engine when I’m an expert in the latest rocket-powered techniques, ones that will zoom to great answers quickly?!”

The unspoken assumption behind this modern-sounding plea is that analytical techniques, ranging from simple statistics and extending to the latest ML, are like computers or powered vehicles. With those things, the latest versions are usually WAY better than prior versions. You would indeed be wasting everyone’s time and money if you insisted on using a personal computer from the 1980s when modern computers are many thousands of times better and faster.

The trouble with this line of thinking is simple: *the metaphor is inapplicable. It’s wrong!* Analytic techniques are NOT like computers; they are like math, in which algebra does not make simple math obsolete – algebra assumes simple math and is built on it. Calculus does not make algebra obsolete – calculus assumes algebra and is built on it! And so on. Each step in the sequence is a refinement that is built on top of the earlier one. No one says, “Now that I know calculus, I refuse to do algebra because it’s old and obsolete.”

So it **does** make sense to quickly apply simple methods to the data to get simple answers, and at the same time vet your data. No time is wasted doing this. On the other hand, if you jump straight to someone’s favorite ML technique, not only is it likely that inaccurate and incomplete data will render the results useless … *you won’t even know anything is wrong!* Because most ML techniques do nothing to reveal problematic data to the researcher, while simpler methods often do!

**Fundamental Analytical Concepts: Calculate it methods**

The simplest and most useful methods are ones in which you simply calculate the answer. There’s no modeling, no training, no uncertainty. These methods are highly useful for both understanding and correcting the data you’ve got. The basic methods of statistics like regression apply here, and so do the methods of data organization and presentation usually called OLAP, BI and dimensional analysis. The tools associated with a star schema in a DBMS apply here, which are roughly the same as pivot tables in Excel.

Graphing and visualization tools are important companions to these methods; they help you really understand the numbers and see to what extent they make common sense and match reality. For example, you can see to what extent a doctor’s years of experience correlate with ordering tests or issuing prescriptions of a certain kind; or simply identify the doctors whose actions stand out from the rest. There could be a good reason why they stand out; wouldn’t you like to find out why? Maybe the doctor should be emulated by others, or maybe the doctor should be corrected; either way, you should figure it out.
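To make that concrete, here is a minimal sketch in Python of the doctor example. The records, the doctor ids, the two-standard-deviation cutoff and the function names are all made up for illustration. Nothing here is trained; it is just arithmetic.

```python
from statistics import mean, pstdev

# Hypothetical records: (doctor id, years of experience, tests ordered per 100 visits).
records = [
    ("dr_a", 3, 42.0),
    ("dr_b", 8, 38.0),
    ("dr_c", 12, 35.0),
    ("dr_d", 20, 30.0),
    ("dr_e", 25, 27.0),
    ("dr_f", 15, 80.0),  # stands out from the rest
]


def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def standouts(rows, z_cutoff=2.0):
    """Return ids whose test rate is more than z_cutoff standard deviations from the mean."""
    rates = [rate for _, _, rate in rows]
    mu, sigma = mean(rates), pstdev(rates)
    return [doc for doc, _, rate in rows if abs(rate - mu) > z_cutoff * sigma]
```

With these made-up numbers, `standouts(records)` picks out `dr_f`, and the correlation between experience and test rate is far stronger once that one doctor is set aside. That is exactly the kind of thing you want to discover before any modeling starts.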

Until you’ve pursued all lines of thinking based on these simpler methods, it’s premature to move on.

**Fundamental Analytical Concepts: Solve/Optimize it methods**

These are, IMHO, the gold standard of algorithmic improvement. When applicable, they tell you how to reach a provably optimal result! No training. It takes experience and judgment to apply the generic algorithms to a particular problem set, and sometimes the problem needs to be adjusted. But the results are stellar.

First, you create an equation that measures what you’re trying to optimize. Is it fastest time? Lowest cost? Least waste? Some combination? Whatever it is, that’s what you’ll maximize or minimize as the case may be.

Next, you determine the constraints. You only have so many operating rooms? This kind of machine failure requires a repairman with that kind of skill? Then you put in the inputs and solve. While I’m leaving out lots of detail, that’s the basic idea.
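Here is a toy version of those two steps, as a sketch rather than a real solver. The numbers and the scheduling story are invented, and the brute-force approach (check every corner of the feasible region) only works for tiny two-variable problems; production tools use the simplex or interior-point methods instead. It relies on a standard fact about linear programs: when the feasible region is bounded and not empty, a best answer can always be found at one of its corners.

```python
from itertools import combinations

# Each constraint is (a, b, limit), meaning a*x + b*y <= limit.
# Hypothetical example: x and y are daily counts of two kinds of procedure.
constraints = [
    (1.0, 0.0, 4.0),    # x <= 4
    (0.0, 2.0, 12.0),   # 2y <= 12
    (3.0, 2.0, 18.0),   # 3x + 2y <= 18
    (-1.0, 0.0, 0.0),   # x >= 0
    (0.0, -1.0, 0.0),   # y >= 0
]
objective = (3.0, 5.0)  # maximize 3x + 5y


def value(point):
    """The quantity being optimized, evaluated at one point."""
    return objective[0] * point[0] + objective[1] * point[1]


def intersect(c1, c2):
    """Point where two constraint boundaries cross (Cramer's rule), or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def feasible(point, eps=1e-9):
    """True when the point satisfies every constraint."""
    x, y = point
    return all(a * x + b * y <= limit + eps for a, b, limit in constraints)


def solve():
    """Enumerate the corners of the feasible region and return the best one."""
    corners = []
    for c1, c2 in combinations(constraints, 2):
        point = intersect(c1, c2)
        if point is not None and feasible(point):
            corners.append(point)
    return max(corners, key=value)
```

For this example `solve()` returns the corner `(2.0, 6.0)`, where the objective is worth 36, and no point that satisfies the constraints does better.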

These methods, usually of the OR kind, have been applied with great success for decades. In certain fields and industries, they are part of the standard operating procedure – it would be unprofessional to fail to apply them. And you would rapidly lose to the competition.

**Fundamental Analytical Concepts: Train it methods**

The training methods all require sample data sets on which to “train” the model. Selecting and controlling the data set is key, as is avoiding over-training, in which the trained model fits its training data so closely that it can’t generalize beyond what it’s been trained on, and thus loses most of its utility.
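The usual way to catch over-training is to hold some data back and compare scores. The sketch below uses invented data and two deliberately crude models of my own choosing: a “memorizer” that copies the label of the nearest training point, and a single learned cutoff. The true pattern is that the label is 1 when x is at least 10, and two training points are mislabeled on purpose.

```python
# Hypothetical training data: (x, label), with the points at x=4 and x=14 mislabeled.
train = [(0, 0), (2, 0), (4, 1), (6, 0), (8, 0),
         (10, 1), (12, 1), (14, 0), (16, 1), (18, 1)]

# Held-out points the models never see during training, labeled by the true pattern.
holdout = [(x + 0.4, int(x + 0.4 >= 10)) for x, _ in train]


def memorizer(x):
    """Over-trained model: copy the label of the nearest training point, noise included."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)


def fit_threshold(rows):
    """Simple model: pick the cutoff t that makes 'x >= t' most accurate on the rows."""
    best_t = max((x for x, _ in rows),
                 key=lambda t: accuracy(lambda x: int(x >= t), rows))
    return lambda x: int(x >= best_t)


threshold_model = fit_threshold(train)
```

Here the memorizer scores 100% on the data it was trained on but only 80% on the held-out points. The simple cutoff scores 80% on the training data, because it refuses to chase the two mislabeled points, and 100% on the holdout.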

**Fundamental Analytical Concepts: Train it methods: white box**

What characterizes these methods is something incredibly important: what the model does can generally be explained in human-understandable terms, i.e., it’s “white box.” This has huge value, if only to gain acceptance for what the model does – but it may also bring up problems with data that can lead to further improvements.

There are lots of ML algorithms that are in this category. All the decision tree methods are here, among them the very important random forest method, along with methods that arose within the field of statistics such as CART.
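To show what “explainable” means in practice, here is a sketch of the smallest possible decision tree: a single split, often called a stump. It is a stand-in of my own, not CART or random forest themselves, which grow many splits and use more refined scoring. The patient records and feature names are hypothetical.

```python
# Hypothetical training rows: (features, label), where label 1 means "readmitted".
rows = [
    ({"age": 30, "prior_visits": 0}, 0),
    ({"age": 45, "prior_visits": 1}, 0),
    ({"age": 70, "prior_visits": 1}, 0),
    ({"age": 38, "prior_visits": 4}, 1),
    ({"age": 62, "prior_visits": 5}, 1),
    ({"age": 55, "prior_visits": 3}, 1),
]


def fit_stump(rows):
    """Find the single rule 'feature >= threshold' with the fewest training errors."""
    candidates = [(name, feats[name]) for feats, _ in rows for name in feats]

    def errors(rule):
        name, threshold = rule
        return sum(int(feats[name] >= threshold) != label for feats, label in rows)

    return min(candidates, key=errors)


def explain(rule):
    """State the learned rule in plain words."""
    name, threshold = rule
    return f"predict 1 when {name} >= {threshold}, otherwise predict 0"
```

On this data `explain(fit_stump(rows))` gives `predict 1 when prior_visits >= 3, otherwise predict 0`. A domain expert can read that sentence, agree or object, and point to the data that looks wrong.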

**Fundamental Analytical Concepts: Train it methods: black box**

These methods can produce amazing results, and should be used whenever necessary, i.e., whenever earlier methods in the sequence can’t be used. The fact that the model is “black box” means that it’s difficult if not impossible to understand how the model makes its decisions in human terms – even for an expert.

These methods include neural networks in all forms, including all the variations of “deep learning.”

**Fundamental Analytical Concepts: Rules**

Finally, I add the indispensable attribute of success in many practical systems: human-coded rules. These can be inserted at any point in processing, as early as enhancing the data before any methods work on it, and as late as modifying the results of final processing. While not often explicitly discussed, few practitioners with successful track records avoid the use of rules altogether. They may not be pretty or fancy or elegant – but they work, darn it.
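As a rough sketch of where rules can sit, the code below wraps a placeholder model with one rule layer before it and one after it. Every name, field and threshold is hypothetical, and `model_score` simply stands in for whatever method from the sequence a project actually uses.

```python
def pre_rules(record):
    """Rules applied before any model sees the data: repair obviously bad values."""
    cleaned = dict(record)
    age = cleaned.get("age")
    if age is not None and not 0 <= age <= 120:
        age = None  # treat impossible ages as missing
    cleaned["age"] = age
    cleaned["state"] = str(cleaned.get("state", "")).strip().upper()
    return cleaned


def model_score(record):
    """Placeholder for the project's real model; returns a score between 0 and 1."""
    return 0.9 if (record.get("age") or 0) > 65 else 0.2


def post_rules(record, score):
    """Rules applied after the model: hard constraints override the score."""
    if record.get("age") is None:
        return "manual_review"  # never act automatically on incomplete data
    return "flag" if score >= 0.5 else "pass"


def decide(record):
    """Run a record through the pre-rules, the model, and the post-rules."""
    cleaned = pre_rules(record)
    return post_rules(cleaned, model_score(cleaned))
```

For example, `decide({"age": 430, "state": "ca"})` returns `"manual_review"` because the impossible age is blanked out before the model runs and the final rule refuses to trust a score built on missing data.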

More elaborate than sets of rules is the AI technique of expert systems. This is a whole big subject of its own. Generally speaking, if you can get useful results from one of the methods in the sequence, up to and including white box training systems, you should do so. But important categories of problems can only be solved using expert systems, which ideally should be as white box as possible.

**Conclusion**

There is a broad range of analytic techniques that can be applied to a given problem. There is an optimal sequence for understanding the data and the problem. Going from one step in the sequence to the next, when done correctly, isn’t abandoning a method for something better, but first picking the low-hanging fruit and then moving on to catch tougher stuff. Prejudging the best technique to use before really getting your hands dirty is a mistake. Being a specialist in a particular method, e.g. “deep learning,” and confining your activities to that method alone can get you hired, paid and busy, but may lead to no useful results, or results far less useful than they could be.
