We hear quite a bit these days about ML (machine learning) and AI (artificial intelligence), as the drumbeats of Big Data and Analytics fade away. You’ve got to be using these things to get great results and transform your business! You’d better round out the staff of Data Scientists you hired last year, and add appropriate numbers of ML and AI experts to the mix. Otherwise, you’re hopelessly behind the times, and you’ll eat the dust generated by the winners!
Most efforts to apply these technologies fail. Not loudly, of course – no one admits failure. But after enough time passes with exciting results being right around the next corner, people stop talking about catching a glimpse of light from the end of the tunnel, and accept the fact that they’re digging a tunnel deeper and deeper, a tunnel to nowhere.
Is there a way to get amazing value out of these exotic technologies? Yes. Decades of solid results show the way.
Typical Failure Patterns
In real-life cases, what happens all too often is that “data scientists” are called in to apply their magic. They take the available data and apply their favorite techniques. Often the results aren’t promising. If they are promising, there is trouble applying them to the real-life situation. Or if they can somehow be applied, they don’t work or don’t produce the promised results.
The failures are the direct result of taking a naïve approach to applying these kinds of techniques in the real world. There are proven methods for attaining good results, but those methods are rarely discussed, for reasons I don’t understand. If they’re discussed, they’re lightly brushed over – instead of being given center stage as they deserve. Following are the ways to be successful with ML and AI.
Build from the bottom up
No one tries to sell a house based on how solid its foundation is; but a house with a crappy foundation will collapse. No one brags about their arithmetic skills when trying to get a job as an ML expert; but if you can’t add, how can you do ML? There is a clear sequence of learning ML and AI. You start with learning how to count; if you don’t know the numbers, you know nothing. Then you learn basic manipulation of numbers, like addition and multiplication. Then it’s on to algebra and the following stages. You don’t try to learn exotic techniques until you’ve mastered the more basic ones on which they’re based, and on which they depend. These are things that are nearly universally accepted.
What’s not so common is applying the same sequence when analyzing any particular problem. The foundation of the sequence, the first step, is the numbers, the data. All too often, the data is incorrect and incomplete. If the data is bad, the results will be worthless.
I’ve never found anyone who disputes this, once the issue is raised. But I also rarely find people who act on it, and take it seriously. Why? Among other reasons, it's a multi-trillion dollar issue.
I have worked with some of the most advanced people in the field, including someone who’s been the chairman of one of the top academic departments in the field. This person and his methods have been behind a few of the most widely used success stories. Here’s his “secret” for when he dives into a new problem: he loads the data into Excel and looks at it, first line-by-line, and then using functions and visualizations. Yes, I know, Excel is something accountants use and the rest of us try to avoid. But it’s ideal for fast, visual analysis of data sets, and has some of the most advanced algorithms available as add-in libraries. Why would you start programming in Python when a tool lets you move quickly with little or no programming?
The important thing isn’t the tool. The important thing is the activity – look at the data. Seriously look at it! Don’t just scan and move on. Understand it! What you’ll almost always find is that there are errors. Mistakes. Important stuff that’s missing. Graph it and look for basic correlations, and see if it makes sense. Make sure a true subject-matter expert is by your side.
Then the fun begins. You have to fix the data. It’s the foundation of everything else. Without a solid foundation, nothing of value can be built on it.
It’s also important to understand that this isn’t something mechanical, like spell-checking. What you often find is that really crucial data is missing, or that other highly relevant data can be added. This simple-sounding fact can be a project-maker. I have been involved with several projects in which the mundane-sounding effort of adding more relevant data has been the difference between failure and incredible, world-changing success.
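Since fixing data isn’t mechanical, one reasonable approach is to have code find suspicious records and leave the decision to a person. The sketch below assumes hypothetical records with `id`, `units`, and `price` fields and a few made-up checks (missing values, negative numbers, duplicate IDs); it collects issues for a subject-matter expert to review instead of silently “correcting” anything.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    record_id: str
    field: str
    reason: str


def find_issues(records, required, non_negative):
    """Collect suspicious records for an expert to review.
    Nothing is changed here: choosing the right correction is a judgment call."""
    issues = []
    seen_ids = set()
    for rec in records:
        rid = rec["id"]
        if rid in seen_ids:
            issues.append(Issue(rid, "id", "duplicate id"))
        seen_ids.add(rid)
        for field in required:
            if rec.get(field) in (None, ""):
                issues.append(Issue(rid, field, "missing value"))
        for field in non_negative:
            value = rec.get(field)
            if value not in (None, "") and float(value) < 0:
                issues.append(Issue(rid, field, "negative value"))
    return issues


# Hypothetical records, invented for illustration.
records = [
    {"id": "1", "units": "10", "price": "2.50"},
    {"id": "2", "units": "", "price": "2.75"},
    {"id": "3", "units": "11", "price": "-1.00"},
    {"id": "3", "units": "12", "price": "2.60"},
]
for issue in find_issues(records, required=["units", "price"], non_negative=["units", "price"]):
    print(issue)
```

Here the sketch reports a missing `units` value on record 2, a negative `price` on record 3, and a second record with ID 3. Whether record 3’s price is a typo, a refund, or something else entirely is the question the expert answers.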
OK, your data’s pretty good. Time to dive into ML? No way!! Way too soon! We’re going to go in sequence here, applying the very most basic techniques to the data first.
The point of all this is simple: you squeeze all the value you can out of a given “level” of technique before advancing to the next one. There are lots of reasons why this makes sense. The simpler techniques yield results more quickly than fancier ones. Those results tend to be larger and more obvious, which means the impact will be big. People will understand them, and so they are more likely to buy into the changes needed to apply them in real life.
I’m not going to spell out all the techniques you should apply and the proper sequence, but generally speaking, the order is the same as the order in which the techniques were discovered historically, which is roughly the same order in which they are taught in school. So you try simple linear regression before multi-variate, for example. And you always look at the data and use visual methods, because a surprising number of the improvements turn out to be ones that people in the field already know or expect, or that at least “make sense” to them.
Finally, at long last, you get to use the fancy stuff you’ve been itching to use all along. But by then … your data’s in great shape. Your system is already up and running. People are already accustomed to change and the improvements that result from applying math. And interestingly, there may not be that much “juice” to be squeezed out of the system by then. Depending on your scale, that remaining juice may be tasty indeed, but it’s the icing on the cake.
In applying AI, the pattern is the same, except that in addition to applying simpler analytic techniques, you may be writing rules by hand that anyone with common sense can understand. Why not? It gets the job done, it’s simple and direct, and the AI can focus on that yummy icing.
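Here is a minimal sketch of what “hand-written rules first, AI for the rest” might look like in code. The message-routing scenario, the keyword rules, and `stand_in_model` are all hypothetical placeholders; the point is only the structure: readable rules get the first shot, and a model handles whatever they don’t cover. It assumes Python 3.9+ for the built-in generic type hints.

```python
from typing import Callable

# Hypothetical example: routing customer support messages. The keywords and
# labels are invented for illustration; real rules would come from the experts.
RULES: list[tuple[Callable[[str], bool], str]] = [
    (lambda text: "refund" in text.lower(), "billing"),
    (lambda text: "password" in text.lower(), "account_access"),
]


def stand_in_model(text: str) -> str:
    """Placeholder for whatever trained model eventually handles the leftovers."""
    return "general"


def classify(text: str, model: Callable[[str], str]) -> tuple[str, str]:
    """Apply the plain, readable rules first; fall through to the model only when none match.
    Returns (label, source) so you can see how much work the rules are doing."""
    for matches, label in RULES:
        if matches(text):
            return label, "rule"
    return model(text), "model"


for message in ["I want a refund", "Forgot my password", "The app is slow today"]:
    print(message, "->", classify(message, stand_in_model))
```

Returning the source of each answer (“rule” or “model”) makes it easy to see how much of the job the simple rules are already doing.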
Other Issues
We’re not done! There are a couple of other major, overriding issues to be considered in order to get great results from these advanced methods. I’ll cover them in future posts. They are:
- Most people have a favorite method, in which they have expertise and experience. That’s wonderful, except that there is a world of different methods, and many of them are simply inapplicable to certain kinds of problems – but great on others. Picking the right method (or methods) is absolutely key to success.
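- Closed loop. All too many projects run open loop: results are put into practice, but the real-world outcomes are never fed back to check and improve the method. In my descriptions above, I sneakily assumed closed loop – that’s where the best feedback comes from.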
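To make “closed loop” concrete, here is a minimal sketch under loudly stated assumptions: a hypothetical base model (`base_predict`) whose predictions run consistently low, and a simple bias correction that is nudged toward each newly observed error. The class name and the 0.3 learning rate are illustrative. Running open loop would mean calling `base_predict` forever without ever looking at what actually happened.

```python
from typing import Callable


class ClosedLoopForecaster:
    """Wraps a base model with a bias correction learned from real outcomes."""

    def __init__(self, base_predict: Callable[[float], float], learning_rate: float = 0.3):
        self.base_predict = base_predict
        self.learning_rate = learning_rate
        self.bias = 0.0

    def predict(self, x: float) -> float:
        return self.base_predict(x) + self.bias

    def observe(self, x: float, actual: float) -> float:
        """Feed back what actually happened and nudge the correction toward the error."""
        error = actual - self.predict(x)
        self.bias += self.learning_rate * error
        return error


def base_predict(x: float) -> float:
    """Hypothetical model that consistently runs 5 units low."""
    return 2 * x


forecaster = ClosedLoopForecaster(base_predict)
for x in range(20):
    forecaster.observe(x, actual=2 * x + 5)  # actual outcome, observed after the fact

print("open-loop error:", (2 * 20 + 5) - base_predict(20))
print("closed-loop error:", round((2 * 20 + 5) - forecaster.predict(20), 3))
```

After twenty observed outcomes in this made-up example, the learned correction has closed nearly all of the gap that an open-loop system would carry forever.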
Conclusion
Wonderful results can be obtained by applying modern analytic methods to real-world problems. But you have to choose: do you want an academic prize or do you want real-world improvements? Sadly, those goals don’t have loads of overlap. If you want real-world results, you should build your effort on a solid foundation of accurate and complete data, and move from simple techniques to increasingly refined ones as you apply algorithms to it. If you do, you’ll see positive results fairly quickly, and those results will get better and better as you climb up the mountain of sophistication.