
A competent data scientist can make models and plots. A great data scientist can also help make decisions.

Making a decision requires confidence. Uncertainty is the enemy of confidence. Uncertainty hurts. Like a splinter under a fingernail, it makes us squirm. Feeling unsure what to expect or what to do keeps us awake nights.

Recommending an action to your CEO takes the pain of uncertainty away from them. Giving a simple prediction to your customers gives them the confidence they need to decide their next move. The way to give confidence is to oversimplify.

Oversimplifying isn't complicated, but it takes care. Here's how:

  1. Remove qualifying words

    Take out language that weakens your message. Use words like “probably”, “likely” and “in my opinion” sparingly if at all. Commit to your message. You may end up being wrong. Take a deep breath, embrace that possibility and be bold.
  2. Shorten it

    Take out unnecessary technical terms, acronyms, references and tangents. For the ones that you keep, explain them carefully in plain, non-technical language, as you would to your 13-year-old niece.
  3. Shorten it again

    Keep only the bare minimum of information you need to make your point. Take out all the parts that serve only to show how hard you worked and how clever you are. If it hurts too much to delete them, move them to another document and leave a link.
  4. Shorten it more

    Add a summary at the top. Make it no more than three sentences. One sentence is best. Include a graphic if it makes your point. Include an action recommendation.

All answers are wrong, but some are useful.

--apologies to George Box

Following these steps will inevitably result in an oversimplified document. Subtleties will be neglected. Caveats and corner cases will be omitted. What ifs will be left out. That's OK.

Reducing uncertainty is a tricky business. The closer you look at something the less certain it becomes. The scaffolding of assumptions that any analysis is built on is subject to questioning, probing and deconstruction. Asking questions doesn't lead to answers; it only leads to more questions.

Take the nature of matter, for instance. Investigation resulted in finer and finer subdivisions and classifications. At one point atoms were believed to be the fundamental building blocks of which everything else is composed. Confidence was high. However, additional questioning has taken away that simplicity. Particle physics, quantum mechanics and the limitations of our experimental apparatuses leave us with profound questions about not just matter, but also about the universe and the nature of reality.

Industry is different from academia

Academia and industry have different goals. They are two worlds with different languages and currencies. The currency of academia is reputation, which you lose by being wrong. The currency of industry is currency, which you get by making decisions quickly and with conviction. Industry is the world of Gryffindor rather than Ravenclaw, more Kirk than Spock.

When a company officer asks for an analysis, they don't usually care about the answer. What they are really asking for is an answer to the question “What should I do?” The data scientist who is capable of bridging the gap from raw data to recommending a course of action is a rare asset. Recommending action is interpreted as a sign of leadership, and tends to be rewarded with raises and promotions. It shows that you are looking past your 37-inch monitor, to the well-being and future of the company. This is deeply reassuring to company leaders, and highly valued.

Our audiences crave certainty

Our aversion to uncertainty can be seen in weather reports. Predicting tomorrow's high temperature is done by running an ensemble of models. The result is actually a distribution. There may be a 10-degree difference between the highest estimate and the lowest. It would be more accurate for the weather person to express tomorrow's weather as “The expected value for tomorrow's high temperature is 73 degrees, with a bell-shaped distribution that has a standard deviation of 5 degrees, skewed low with a very long tail on the high side.” A prediction of this sort would probably not be well received. Not knowing exactly what the temperature will be is frustrating.

The biggest value a data scientist can provide is to simplify. Your audience wants a single number to base decisions on. It's OK to be a bit wrong as long as you keep it simple. In fact, your audience will appreciate it doubly if you take the extra step and help them decide what to do. Don't just say “It's going to be 73 degrees tomorrow.” Instead say “It's going to be 73 degrees tomorrow, so if you were thinking about heading down to the beach, grab your picnic basket and surfboard!” If you can take away your manager’s or her manager’s uncertainty when making a decision, they will appreciate it to their bones.
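For the data scientists who want to see the weather example as code, here is a minimal sketch of collapsing an ensemble into one number plus an action. Everything in it is an assumption made for illustration: the ensemble values are invented, and the names `summarize`, `recommend` and `BEACH_THRESHOLD` are hypothetical, not part of any forecasting library.

```python
from statistics import mean, stdev

# Hypothetical ensemble of predictions for tomorrow's high, in degrees.
# These values are made up for illustration only.
ENSEMBLE_HIGHS = [68.0, 71.0, 72.0, 73.0, 73.0, 74.0, 75.0, 78.0]

# Hypothetical cutoff above which a beach trip gets recommended.
BEACH_THRESHOLD = 70


def summarize(highs):
    """Collapse an ensemble into one rounded headline number.

    The range and spread are kept so they can go in a linked appendix
    rather than in the headline.
    """
    return {
        "headline": round(mean(highs)),
        "low": min(highs),
        "high": max(highs),
        "spread": stdev(highs),
    }


def recommend(headline, threshold=BEACH_THRESHOLD):
    """Turn the single number into a statement plus a suggested action."""
    message = f"It's going to be {headline} degrees tomorrow"
    if headline >= threshold:
        return message + ", so grab your picnic basket and surfboard!"
    return message + ", so save the beach trip for another day."


summary = summarize(ENSEMBLE_HIGHS)
print(recommend(summary["headline"]))
```

The audience sees only the headline and the recommendation. The low, high and spread are still computed, so they are there for the project lead who asks.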

Know your audience

Different audiences are uncertain about different things and want different levels of detail. There is an art in reading your audience and knowing what they want certainty about. Your CEO may be completely uninterested in technical detail. A project lead will be more interested in the details of your method, but perhaps still uninterested in the approaches you tried that failed.

In the world of sales, one is often encouraged to find customers’ “pain points”: What is it that makes a customer's life hard? That is an opportunity. If I can somehow make that part of their life easier, I provide them value. Knowing that uncertainty is a universal pain point is very valuable indeed. Use what you know about your customers, whether they be subscribers or company leaders, to figure out which flavors of uncertainty bother them the most. Anything you can do to reduce that uncertainty will get their attention and gratitude.

Discomfort with uncertainty doesn't appear to be related to intelligence or culture; it seems to be fundamentally unsettling to human beings. Some are more upset by it than others, and people are comfortable with uncertainty in different areas, but it appears to be a basic source of psychological pain. As a data scientist you can soothe that pain with a bit of oversimplification.

Thanks to Greg Antell for the coffee shop conversation that inspired this post.