Big Data is huge. Everybody wants it. If you're not doing it, you're hopelessly antiquated. But it has serious flaws, and the high-profile role played by Big Data in the 2016 election provides an excellent example. Calling those efforts a "face-plant" is kind. In addition to illustrating many of the glaring flaws I have previously enumerated, this face-plant clearly and explicitly demonstrates the corrosive effects of bias: the experts weren't seeking the truth -- they were rooting for an outcome. Given the undeniable predictive failure, you'd think a little self-reflection might be in order. The flaws this failure illustrates, and others, are common in Big Data efforts, and are the reason why so many much-heralded efforts result in no substantial benefit.
The Big Data Experts
In recent years, Big Data election experts have attained great visibility. Their pronouncements are more closely followed than those of the candidates themselves. Nate Silver has been the reigning god, but a new one exploded onto the scene this election season. Here's the story as it appeared in Wired Magazine, just days before the election:
The story got serious attention, as you can see from more than 24,000 Facebook shares. How big is this guy and organization? Real big:
Who is this guy? Read on:
Clearly a massive math and science wonk. No one else gets into Caltech, much less gets a Stanford PhD in science.
What did he say about the election? Of course the picture changed as election day drew close, but all the math pointed strongly to a Clinton victory.
The debate as the election drew close was interesting. It wasn't whether Clinton would win -- everyone thought she would -- but since they're math guys and they know this isn't physics, they argued about the probability she would win, and about the margin predicted.
Dr. Wang ratcheted the probability of a Clinton win all the way up to 99%. That's pretty darn certain! Here's his argument for why such certainty was reasonable:
Yup, it was sure a giant surprise, all right!
Here is his description of his calculations and why they're reasonable, if you can stand it. If not, that's OK, just skip ahead:
There's lots more stuff on the site. By all means check it out for a great example of self-delusion by a celebrated Professor Doctor. Here is a sample:
If you actually know math and science, you'll see right away that this is a specious argument: a lot of math-y words that bear no real relationship to the actual probability of Clinton winning.
Late afternoon of election day, he posted his last prediction:
This was not a search for truth
How could Professor Doctor Neuroscientist Sam "Election Hero" Wang have gotten it so wrong? In addition to committing many of the standard errors and unusually bad interpretations of probability I've mentioned, there's another reason: Wang was not seeking truth. Dr. Wang was an advocate. He badly wanted an outcome. He wasn't predicting for prediction's sake -- he was predicting to find out which races were close, so that scarce funds could be allocated to sway the outcome of those close races. How do we know? Here are Wang's own words in that same final post, which he repeats with emphasis in the comments:
This also explains how he got famous -- he was drizzling science-y pixie-dust on the outcome that he and many other people wanted. He told them what they wanted to hear.
Could it be that Dr. Wang has an unblemished track record of prior predictions, and let his emotions get the best of him in the 2016 election? Sadly, no. Look at this powerful -- 98% probability! -- prediction, his final one before the 2004 election:
What we've got here is an advocate posing as a scientist, spouting out what his fans want to hear with lots of math-geek talk to make it sound solid, but who gets it badly wrong. Repeatedly. Surely, all right-thinking people would turn their backs on him, right? Science is about making predictions that come true, and if your predictions are wrong, you're just a promoter with no credibility, right?
There is clearly an audience for people who tell readers what they want to hear with math-y icing on top.
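How bad is being that confident and that wrong? One standard way to grade probability forecasts is the Brier score: the average squared gap between the stated probability and what actually happened, where 0 is perfect and 1 is as bad as it gets. Here is a rough sketch in Python. The two probabilities are the ones quoted in this post (98% in 2004, 99% in 2016), both scored as misses; the 50/50 "coin flip" forecaster is a hypothetical baseline I made up for comparison, and two forecasts are of course far too few to be a real evaluation.

```python
def brier_score(forecasts):
    """Mean squared gap between stated probability and outcome (0 is perfect, 1 is worst)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Each entry is (stated probability, outcome); outcome is 1 if the predicted
# event happened and 0 if it did not.
# Probabilities as quoted in this post; both predicted events failed to happen.
confident = [(0.98, 0), (0.99, 0)]

# Hypothetical baseline: a forecaster who shrugs and says 50/50 every time.
coin_flip = [(0.50, 0), (0.50, 0)]

print(f"confident forecaster: {brier_score(confident):.3f}")
print(f"coin flip:            {brier_score(coin_flip):.3f}")
```

On these two calls the confident forecaster scores about 0.97 against 0.25 for the coin flip. The score punishes certainty that turns out wrong far more than it punishes admitting you don't know.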
The Big Data juggernaut rolls along, its momentum unabated. The face-plant of Big Data analytics in the 2016 election should have been a wake-up call, regardless of your political views, about the inherent dangers and deep biases that send all too many Big Data efforts into the gutter of failure. Everyone appears to have moved on unchanged, which makes sense, because it was never really about science and truth to begin with. It's sad to see exotic BIG data efforts getting lots of money and attention, when humble LITTLE data efforts are causing daily pain but are starved for funding. However, if you want to get value out of Big Data and associated technologies, be assured that it can be done. Just take this story as another note of caution.