Measuring Success as a Data Scientist

By: Fabio Giraldo, Associate Director, Advanced Analytics, Mindshare North America
In our industry, data scientists build predictive models not only to understand how media impacts user behavior, but also to use those models to optimize media and marketing activity in a way that drives success.

Thus, while many people believe that a good model is one that accurately predicts the future, the question people should really ask themselves is: is this model good for achieving my goals? If your only goal is to accurately forecast a KPI, then sure, the answer is yes. If not, there are other things you need to take into account. At yesterday’s Advertising Week session, “Measuring Success as a Data Scientist,” the panelists talked about this issue at length.

One example they pointed to came from the health industry, specifically a rare disease that occurs in a tiny fraction of the population (say, 0.1%). A model that simply predicts the disease will occur in 0.00% of the population would still be highly accurate, but ultimately of no real use to anyone. What is more useful is to identify, on a case-by-case basis, whether a person will or will not get the disease, and building such a model necessarily implies some loss in predictive accuracy because of the complexities of modeling.
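The accuracy paradox in this example is easy to demonstrate. The sketch below (with simulated, made-up numbers) shows that a model that never predicts the disease is still 99.9% accurate on a population where only 0.1% are affected:

```python
def accuracy(predictions, actuals):
    """Fraction of cases where the prediction matches the actual outcome."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)

# Simulated population of 100,000 people; 100 of them (0.1%) have the disease.
population = 100_000
actuals = [True] * 100 + [False] * (population - 100)

# The naive model always predicts "no disease" -- it never flags anyone.
naive_predictions = [False] * population

print(f"Accuracy: {accuracy(naive_predictions, actuals):.1%}")  # Accuracy: 99.9%
```

Despite the impressive accuracy figure, this model identifies zero of the people who actually need attention, which is exactly the panelists' point.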

Now, putting this into the context of marketing: in reality, most brands are not just interested in having a model that accurately predicts a KPI, but in using a model to drive decision-making. Some of those decisions involve whether or not to advertise to a specific audience, relying on models that suggest whether a group of people is part of the target audience or not, as described in the following table:

                            Actually in audience    Actually not in audience
    Model: in audience      True positive           False positive
    Model: not in audience  False negative          True negative

The blue section on the top left (true positives) represents the people the model correctly suggests are part of the target audience. Similarly, the blue section on the lower right (true negatives) represents the people the model correctly suggests are not part of the target audience.

What is most important here is what happens in the red boxes. False positives are the people the model suggests are part of your target audience but aren’t, and false negatives are the people the model suggests are not part of the target audience but actually are. False negatives carry an opportunity cost: ads are withheld from an audience with a fair chance of converting. The cost of false positives is waste: ads are served to people with a low chance of converting.
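A minimal sketch of tallying these four cells from model output, using made-up predictions where True means "in the target audience":

```python
from collections import Counter

def confusion_counts(predicted, actual):
    """Tally the four confusion-matrix cells for a binary targeting model."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        if p and a:
            counts["true_positive"] += 1    # correctly targeted
        elif p and not a:
            counts["false_positive"] += 1   # wasted impressions
        elif not p and a:
            counts["false_negative"] += 1   # missed opportunity
        else:
            counts["true_negative"] += 1    # correctly excluded
    return counts

# Illustrative data: the model's yes/no calls vs. who was truly in the audience.
predicted = [True, True, False, False, True, False]
actual    = [True, False, False, True, True, False]
print(confusion_counts(predicted, actual))
```

In practice the same tally comes out of any standard evaluation library, but counting the cells directly makes clear that the two kinds of errors are separate quantities a brand can price separately.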

What is optimal for one brand is most likely suboptimal for another. Measuring the opportunity cost and the cost of waste for a specific brand will provide guidance on how to build models that reduce the overall loss by balancing these two costs. Once again, this will most likely involve a reduction in predictive accuracy, but more importantly will help a brand achieve its goals and be more efficient in the process.
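The cost-balancing idea above can be sketched as a threshold search. In this illustrative example (the scores and cost figures are assumptions, not real data), two brands with different prices for waste versus missed opportunity end up choosing different targeting thresholds from the same model scores:

```python
def total_cost(scores, actuals, threshold, fp_cost, fn_cost):
    """Total cost of waste (false positives) plus missed opportunity
    (false negatives) when targeting everyone at or above the threshold."""
    cost = 0.0
    for score, in_audience in zip(scores, actuals):
        predicted = score >= threshold
        if predicted and not in_audience:
            cost += fp_cost   # ad served to someone unlikely to convert
        elif not predicted and in_audience:
            cost += fn_cost   # ad withheld from someone likely to convert
    return cost

# Hypothetical model scores and true audience membership.
scores  = [0.9, 0.8, 0.65, 0.55, 0.4, 0.3, 0.2, 0.1]
actuals = [True, True, False, True, False, False, True, False]

thresholds = [t / 100 for t in range(0, 101)]

# Brand A: waste is cheap, missed conversions are expensive -> low threshold.
best_a = min(thresholds,
             key=lambda t: total_cost(scores, actuals, t, fp_cost=1, fn_cost=10))
# Brand B: waste is expensive, missed conversions are cheap -> high threshold.
best_b = min(thresholds,
             key=lambda t: total_cost(scores, actuals, t, fp_cost=10, fn_cost=1))
print(best_a, best_b)
```

Neither threshold maximizes raw accuracy; each minimizes the loss that matters to that particular brand, which is the sense in which what is optimal for one brand is suboptimal for another.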