Today’s blog post will be a little on the light side as I explore the various things that come up in my experience working as a data scientist.
I’d like to consider myself to have a fairly solid understanding of statistics (I think it's accurate to say that I may be slightly above average in statistical understanding compared to the rest of the population). A very large part of my work can be classified as stakeholder management, and this means interacting with other people who may not have as strong a statistical foundation as I have. I’m not very good at it, in the sense that people often think I am hostile when in fact all I am doing is questioning assumptions (I get the feeling people don't like it, but you can't get around the questioning of assumptions).
Since the early days of my work, there’s been a feeling I’ve not been able to put into words when dealing with stakeholders. I think I finally have the words to express it. Specifically, it was the transference of tacit knowledge that bugged me quite a bit.
Consider an example where the stakeholder is someone who’s been working in the field for quite some time. They don’t necessarily have the statistical know-how when it comes to dealing with data, much less the rigour that comes with statistical thinking. More often than not, decisions are driven by gut feel based on what the data tells them. I call these sorts of processes data-inspired (as opposed to data-driven decision making).
These gut feelings about data can be right or wrong. Either way the stakeholders learn from them, and what they learn becomes experiential knowledge, or what economists call tacit knowledge.
The bulk of the work is of course transitioning an organization from being data-inspired to becoming actually data-driven.
Formalization and Rigour
The first challenge for a data scientist is understanding the problem, and then rephrasing the problem into one that is more rigorous (while I find that many people struggle with the concept of rigour in the first place, that's a topic for another blog post). I personally find that the best way to understand the problem is to go straight to the person who’s most experienced in the field and try to formalize as much tacit knowledge as possible.
The problem should be immediately apparent. Tacit knowledge is built upon experience. A data scientist would have to do these things to formalize a stakeholder’s tacit knowledge:
- Get the stakeholder to express what they think is happening.
- Get the stakeholder to express why their gut feeling works.
- Link up the whys from the stakeholder to formalized methods.
- Disentangle the various mixed effects that may lead to said gut feeling.
Feelings about data are really just that - feelings. That doesn’t invalidate the experience, however, which in my opinion is extremely valuable. After all, as humans, we don’t go round consciously calculating probabilities of events happening. Rather, we feel, and we make conjectures based on experience.
Strictly speaking rigour isn’t really needed. It’s only needed if you want to make rational decisions and play in a competitive field where all the players are highly rational. At that point, rigour in decision making isn’t just a competitive advantage, it’s a minimum requirement to play.
Mechanical Sympathy
Jackie Stewart is famous for saying “You don’t have to be an engineer to be a racing driver, but you do have to have mechanical sympathy”. He posits that a driver who understands roughly what’s going on with the car is a better driver than one who doesn’t.
I think this is an apt analogy. A driver who knows the internals of his/her engine well would more or less know when and how to push the vehicle to its maximum performance. The driver would know the limits of the engine, and would know what can and, more importantly, cannot be done.
Data Sympathy
Likewise for data science. Statistics gives one a sorta mechanical sympathy over the data. Knowing how your data behaves under certain assumptions is akin to having mechanical sympathy for data.
There is also the actual mechanical sympathy that data scientists usually have - an understanding of how the data is collected and laid out in the database, and of how to leverage the various data structures to perform work. You’d get a sense of knowing the mechanics of data really well. You’d know what can and cannot be done with the data.
This is a very mechanical view of data - a view that concerns itself with the hows, whys and whats in a very mechanical, rigorous sense.
Data Empathy
The hows, whys and whats of data can also be viewed in a more experiential sense. This is where the aforementioned tacit knowledge comes into play. One feels the data instead of understanding the data from a mechanical point of view. A good deal of domain knowledge informs the feeling of data.
I use the word “feel” judiciously and with much deliberation. A good example that I often reach for is the notion of statistical significance. Some difference may be observed between two samples. A data-driven decision maker would take rigorous steps to check that the difference is unlikely to be the product of random chance or measurement error. A data-inspired decision maker would simply make a decision based on a cursory analysis of the data, driven mostly by tacit knowledge.
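As a rough sketch of what those "rigorous steps" might look like for two samples, here is a permutation test on the difference in means. This is just one of several reasonable choices (a t-test would be another), and everything in it is an assumption made for illustration: the function name `permutation_test`, the `before` and `after` samples (made-up numbers, not real data) and the number of resamples.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the labels: if there is no real difference,
        # any split of the pooled data is as plausible as the observed one.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical measurements, invented purely for this example.
before = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
after = [12.6, 12.9, 12.4, 13.1, 12.8, 12.7, 13.0, 12.5]

p_value = permutation_test(before, after)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value says that a difference this large would rarely show up from shuffling labels alone. It says nothing about measurement error, which still has to be reasoned about separately.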
Another example that I would go to is the idea of trend-breaks. The task at hand is to figure out if a collection of data points breaks the trend. A data empathetic person would decide based on a “feeling” for the data, while a data sympathetic person, having a more mechanistic experience of data, would decide using more rigorous tools.
It doesn’t mean that data scientists do not have some form of data empathy though. Often I would get some data and it would be immediately apparent to me that the results don’t make sense. There isn’t really a rhyme or reason for that other than gut feeling. The difference, in my opinion, is that, being trained to think with more rigour, data sympathetic people can see more possibilities and restrictions around data.
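One very simple version of such a "rigorous tool" could look like the sketch below: fit a straight line to the historical points, measure how far the history typically strays from that line, and flag any new point that strays much further. This is a crude residual-threshold check, not a formal structural-break test, and the names `fit_line` and `breaks_trend`, the threshold of 3 and the numbers are all assumptions for illustration.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def breaks_trend(history, new_points, threshold=3.0):
    """Flag each new point whose distance from the extrapolated trend line
    exceeds `threshold` times the spread of the historical residuals."""
    xs = list(range(len(history)))
    slope, intercept = fit_line(xs, history)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, history)]
    # Sample standard deviation of residuals: a rough yardstick for "normal" noise.
    spread = stdev(residuals)
    start = len(history)
    flags = []
    for i, y in enumerate(new_points):
        predicted = slope * (start + i) + intercept
        flags.append(abs(y - predicted) > threshold * spread)
    return flags


# Hypothetical, evenly spaced observations, invented for this example.
history = [10.0, 11.1, 11.9, 13.2, 13.8, 15.1, 16.0, 16.9]
new_points = [18.1, 14.0]

flags = breaks_trend(history, new_points)
print(flags)
```

With these made-up numbers the first new point stays within the band and the second is flagged. The choice of threshold is itself a judgment call, which is one place where domain knowledge creeps back in.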
The Clash
I find that a majority of clashing viewpoints between stakeholder and data scientist arise from these two differing views on data.
Stakeholders often cannot see a good reason to pursue further statistical inference or to use advanced techniques like machine learning. Or, being caught up in the latest hype, stakeholders would want to see machine learning applied to things where it doesn’t make sense (like applying ML to really high-level aggregate data).
Data scientists, on the other hand, often cannot see the point of “feeling” the data. Let’s face it, dashboards and descriptive statistics are dreadfully dull (that's why we have computers to do those things). Model building is exciting. Machine learning is exciting. We want to do more of that.
These two viewpoints lead down different trajectories of thought, which typically materialize as two separate fields in industry: business intelligence and data science. Often stakeholders aren’t really aware of the differences between the two fields. In many companies, BI and data science are rolled into one. I tend to avoid companies like those because it’s patently clear they have no idea what they’re doing.
I posit that it is actually useful to have shallow data empathy. The problem with tacit knowledge is that it’s not very transferable. It’s only really ever transferable by a lot of work teasing it out. Without having domain knowledge, most data science work would be tedious trudging through mud.
Stakeholder Management
Ultimately the big job in stakeholder management for a data scientist is to close the gap between these viewpoints. I also personally like to leverage as much domain knowledge as I can muster from stakeholders. But how to actually close the gap?
That’s something I’m not terribly good at and I’m trying to improve on this front. I’ve been pushing one end hard: evangelizing and educating on the finer points of rigour. I don’t think it’s that successful an approach.
In my various consulting gigs and jobs, there always remains a group of stakeholders who will pay lip service to data science but whose approach doesn’t change. It could, of course, just be the way I deliver the message.
Who knows. You may. Let me know what you think in the comments below!