There is a striking hierarchy of skills in software, as I've explained here. When you dive into any particular aspect of software, you usually find that it's got a hierarchy all its own. Data science is a subject of intense interest these days, so in this post I'll explain some of the basics of the data science skills hierarchy.
A skills hierarchy is very much an insider's game. What most people care about is status. I talk about the basics of software status here. Remember, the skills hierarchy is a whole world away from the status hierarchy that most people care about. Don't confuse the two!
Data Science skills
First and foremost, it's important to understand the incredibly broad range of subjects covered by the term "data science." I attempt to explain the basics of the range in this blog post. You can be amazing in one of those subjects while being a neophyte in another that the outside world may consider "related," but which in practice is not just down the hall; it's in a different building on a different campus. The general understanding of this range is SO pathetic that "data science" is typically managed as a completely independent, free-standing group. That makes about as much sense as believing that a sous-chef belongs somewhere other than a kitchen. Or that everything everyone does in a kitchen is basically the same thing.
Here's one cut at the hierarchy in data science, starting from the base:
- Tool users. These are people who have learned how to use some software tool, or maybe a couple. Most "data scientists" fall into this category.
- They don't understand how their tools are built or any of the underlying software.
- They may know their own tool, but aren't really clear on what the other tools are about, much less when you might consider using one.
- Some have broader knowledge of tools.
- Very few have the sophistication to understand real data analysis, per my series on AI/ML; see this post and the links in it: http://www.blackliszt.com/2018/04/getting-results-from-ml-and-ai-3-closed-loop.html
- Of those, even fewer can understand the underlying algorithms and follow the latest research literature.
- Of those, even fewer can make real algorithmic advances, implement them as tools, and deliver practical value.
- Among the people who can do all of the above, it is rare to meet anyone who can also address and solve the deep tool-level problems in other domains that are needed to make their code practical in the real world.
- Finally, it is extremely rare to meet people who, in addition to their deep and broad prowess in data science and the relevant skills needed to make it real-world practical, have similarly deep skills in an associated domain needed to make the data science fully effective for a business.
The issues of this hierarchy are compounded by the usual over-selling by people who are good promoters but little else, and by corporate and government bigwigs who don't want to be bothered with details but are keen to be seen as "doing something" on such a high-visibility topic. Getting results that make a difference is pretty low on the typical priority list.
And then of course there are the "data scientists" themselves, who most often are sincere people who are trying to do a good job as they've been taught to do it -- mostly by professors and others who have no idea what real-world success looks like, much less how to bring it about.
Finally, there is the usual "manage something that's invisible to you" phenomenon I have often discussed in this blog, which leads to so much dysfunction and so many wonderful Dilbert cartoons.
Conclusion
People talk as though "data science" were a single thing, with the usual kind of hierarchy based on level of management and/or "experience." Those typical patterns of hierarchy just don't cut it for understanding what's going on in data science, just as they don't cut it for understanding software development. We will continue to see waste and dead-end efforts until we at least start making our understanding of data science more sophisticated and aligned with the facts on the ground.