The message appears to be: if you're not way into Big Data, you're missing out on important things! For vendors and job seekers, I'm sure this is true, without reservation. For the companies that wish to benefit from the investment? Maybe not.
The Big Data Trend is BIG!
There's one thing for sure about Big Data: it's a Big trend.
We have been assured that Big Data is now the driving force in computing.
If you scan through the books, conferences and other things whose focus is Big Data, it's clearly a major fashion trend.
Whenever something like this catches fire, everyone wants to jump on. Lots of people talk about their own "big data" that, on a closer look, isn't so big after all.
And generally when you look at it more closely, Big Data doesn't look quite so cool.
Is there something wrong here?
How much better is data that's BIG compared to MEDIUM or SMALL?
The killer assumption behind all the Big Data excitement is that Big is better than normal-size data -- lots better. Makes sense, right?
Not so fast.
Let's spend a little time thinking about the core issues of data coverage, integrity/quality, and probability.
Probability
Today, we've got X data. Let's assume that, with Big Data, we've now got 100X data. Are we 100X better off?
Let's start with something simple and universal: flipping coins. Suppose we place ads. We make money when the coin comes up heads, and lose money when it comes up tails. Our data people tell us that the odds of getting heads are 0.5, with an uncertainty of 0.1 -- i.e., the chance of it coming up heads is probably 0.5, but it might be as little as 0.4 or as much as 0.6. Now we have 100X more coin flips to apply to our measurement. Great, now we're really going to start making money!
They come back, sweaty and proud, with the answer: the probability of getting heads is 0.50, with an uncertainty of 0.01 -- i.e., the chance of it coming up heads is probably 0.5, but it might be as little as 0.49 or as much as 0.51. (Precision improves only with the square root of the amount of data, so 100X the flips buys 10X the precision, not 100X.) Wow, we've increased our level of precision tenfold! How much does that increase the money we make? Hmmmmm.
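Here is a rough sketch of the arithmetic, assuming independent flips of a fair coin and the textbook standard error of a proportion. The starting sample size of 25 flips is a hypothetical number, picked only because it gives an uncertainty of 0.1; the function name is mine, not anything official.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an estimated proportion p from n independent flips."""
    return math.sqrt(p * (1 - p) / n)

base_n = 25            # hypothetical "X data": gives an uncertainty of 0.1
big_n = base_n * 100   # "Big Data": 100X the flips

se_small = standard_error(0.5, base_n)
se_big = standard_error(0.5, big_n)

# How much more precise did 100X the data make us?
improvement = se_small / se_big

print(f"uncertainty with {base_n} flips: {se_small:.3f}")
print(f"uncertainty with {big_n} flips: {se_big:.3f}")
print(f"precision gain from 100X data: {improvement:.1f}X")
```

The estimate of the odds itself never moves off 0.5. All the extra flips buy is a narrower band around the number we already had.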
Quality/Integrity
Maybe the problem is that we just got lots more data points about the same thing. It didn't broaden our knowledge. Maybe we need to expand, check out the odds for not just nickels, but also dimes and quarters. Hmmmm. Let's get more ambitious. Let's track users, not just on our website, but also on 100 other websites. Tell the programmers to get going! We're going to be rolling in money from Big Data!
The programmers seem to be having trouble matching people across different websites. Are all these people who claim to be David Black the same person? What about that David B. Black guy? And there appear to be two really different patterns of use coming from the same IP address -- maybe someone else is sharing the computer? And I just discovered that there's a David Black who appears to use the internet from Manhattan, and a David Black who uses it from some place out in New Jersey. We already know there are multiple David Blacks. This could be one person or two. Which is it? This is getting hard.
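To see why this is hard, here is a minimal sketch of naive name matching using Python's standard `difflib`. The records, the field names, and the 0.85 threshold are all hypothetical, made up for illustration; real identity matching would need far more than this.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity ratio between two names, ignoring case (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical records gathered from different websites.
records = [
    {"name": "David Black", "location": "Manhattan"},
    {"name": "David B. Black", "location": "Manhattan"},
    {"name": "David Black", "location": "New Jersey"},
]

def match_candidates(records, threshold=0.85):
    """Return index pairs whose names look alike; the threshold is an assumption."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = name_similarity(records[i]["name"], records[j]["name"])
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs

pairs = match_candidates(records)
for i, j, score in pairs:
    print(records[i], "<->", records[j], "similarity:", score)
```

Every pair gets flagged as a possible match, and the two records with the identical name and different locations score highest of all. The similarity score says the names look alike. It cannot say whether that is one David Black or two.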
Darn. I thought all I had to do was get loads more data and a Hadoop cluster and the money would start pouring in. Getting all that data to match up and make sense is harder than it looks. And then, when I've done it, is all I'm achieving increasing my level of certainty about what I already knew?
Data Coverage and Lift
Alright. I've got my 100X more data. I've FINALLY sorted it out so it's high quality and matches up. Now I've got to make sure it really broadens my knowledge and gives me uplift in my results.
So far, all I've been doing is looking at my customers' actions. I bet if I look at demographics and social media -- that's lots of data, surely it qualifies as "Big" -- I'll get better results. Big Data team -- mush!
Darn, darn, DARN! Yeah, all this big data stuff changes what I offer to whom for how much -- but it's not making a whole lot of difference in my results. And I'm getting hammered with complaints from people who want me to stop making offers to their kids, and old customers who wonder why we don't love them anymore. Yeah, we're getting 5-10% uplift, but we're losing at least that much from our old business, not to mention all the costs we've added.
Who's making money from the Big Data stuff? It must be the consultants, the vendors and the conferences. It's sure not me.
Of course, I could just patch it all up, start going to conferences, bragging about how I'm an expert, and maybe I'll get a great new job. But it would be based on a lie. I'm not that kind of person.
Conclusion
I love data. I love exploring it, analyzing it by all available means, and understanding it. Evidence-based solutions are the only ones I'm comfortable with. Everything else is just baseless faith. If I can use math optimization, machine learning or something else to do a better job than a person could do, I'm all for it. If I can get additional data and that data will help me get better results, bring it on!
But "Big Data" is not in principle better than "enough" data. Too little is not enough. More than you need to get the job done is a waste. Just like Goldilocks, you should want the amount of data that's "just right."