In the business world, the strength of a metric lies not only in how accurate it is but also in the extent to which it helps drive change and action. The world of customer experience is no exception: there, the metrics discussion has become a tried and true pastime.
But there is no perfect metric. So instead of endlessly searching for one, we should start by considering the circumstances affecting our business, and the goals we want to achieve. Once we’re armed with that perspective, we can make an informed choice about which metrics to use, and make sure the metrics themselves are not getting in our way.
This was highlighted by a recent discovery made by our research team when we were asked the question:
From a statistical standpoint, which will do a better job of revealing insights: a mean score, NPS or Top2Box?
The answer was: it depends.
What we found was that, for companies with consistently high scores, all three metrics perform largely the same in terms of how well they detect meaningful differences or changes in scores (on 0-10 and 1-10 scales). In many ways, this consistency becomes a virtuous cycle for high performers: companies with good customer experience can use any one of the metrics to look for insights and, if there are significant differences to be found, they are likely to uncover them. The choice of metric has few consequences at the higher end of the spectrum, which means that there is little chance that any of the metrics will miss important insights.
But the same can’t be said for companies with low scores — those just starting their improvement efforts or in industries that tend to score at the lower end of the continuum. When companies have lower CX performance and face improvement challenges, our analyses suggest that statistical tests using mean scores can miss important distinctions, classifying relevant insights as insignificant. Instead, comparing NPS or Top2Box when scores are generally low can reveal significant differences that would be missed by a comparison of simple averages.
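For readers who want to follow along, here is a minimal sketch of how the three metrics are commonly computed from 0-10 likelihood-to-recommend scores. It assumes the standard NPS cutoffs (promoters 9-10, detractors 0-6) and takes Top2Box to mean the share of 9s and 10s; the function names and the sample scores are made up for illustration.

```python
def mean_score(scores):
    """Simple average of the raw 0-10 scores."""
    return sum(scores) / len(scores)

def top2box(scores):
    """Percent of responses in the top two boxes (assumed here to be 9 and 10)."""
    return 100.0 * sum(s >= 9 for s in scores) / len(scores)

def nps(scores):
    """Percent promoters (9-10) minus percent detractors (0-6)."""
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100.0 * (promoters - detractors) / len(scores)

# Hypothetical responses, not data from the research described here.
scores = [0, 3, 6, 7, 8, 9, 10, 10]
print(mean_score(scores), top2box(scores), nps(scores))
```

Note that Top2Box and NPS only look at which band a response falls in, while the mean uses the full 0-10 value. That difference matters for the comparison that follows.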
Here’s an example:
Suppose a company in telecom — an industry that generally lags behind in customer experience — thinks that its older customers typically rate their experience more positively, on average, than its younger customers. If this is truly the case, then a better understanding of the factors driving those different experiences could yield actionable insights.
But were the two groups really responding differently? To determine this, the telecom tested 2,500 responses from each age group to see whether the difference in average likelihood to recommend was statistically significant. The mean scores were 6.25 and 6.36 and a statistical test comparing the two suggested the answer was no — the difference was within the range that would be expected due to chance alone. So, given the need to prioritize, the team decided not to investigate further — they assumed that younger and older customers had basically the same customer experience and that the issue warranted no further attention.
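To see how a gap of 6.25 versus 6.36 can fall short of significance even with 2,500 responses per group, here is a rough sketch of a two-sample z-test on the means. The article does not report the spread of the scores, so the standard deviation of 3.0 used below is purely an assumption, and the function name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Two-sided z-test for a difference in means (large-sample approximation)."""
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    z = (mean_b - mean_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Means and sample sizes are from the example; sd = 3.0 is an ASSUMED value.
z, p = two_sample_z(6.25, 6.36, sd_a=3.0, sd_b=3.0, n_a=2500, n_b=2500)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under that assumed spread, the p-value comes out above 0.05, which is consistent with the team's conclusion that the difference in means was not significant at the 95% level. A different standard deviation would give a different p-value.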
Our research shows that comparing NPS or Top2Box scores could have led to the very different conclusion that younger and older customers were, in fact, reporting different experiences. Using a real dataset and running a series of Monte Carlo simulations, we discovered that a mean calculation in this scenario finds a statistically significant result only 22% of the time, whereas a Top2Box or NPS calculation does so between 74% and 82% of the time! [1] The next step would have been to investigate the root causes of the difference and leverage those insights to drive improvement.
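The general shape of such a simulation can be sketched as follows. This is not the research team's method or data: it swaps the real dataset for two invented score distributions, treats each metric as the mean of a recoded score, and applies the same large-sample z-test to all three. The names `RECODE`, `detection_rates`, `younger`, and `older` are hypothetical, and the rates it prints will depend entirely on the distributions chosen.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Each metric is the mean of a recoded score: raw value, top-2-box flag, NPS code.
RECODE = {
    "mean": lambda s: s,
    "top2box": lambda s: 1 if s >= 9 else 0,
    "nps": lambda s: 1 if s >= 9 else (-1 if s <= 6 else 0),
}

def mean_and_variance(values):
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

def significant(a, b, alpha):
    """Two-sided z-test on the difference between two sample means."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    se = sqrt(var_a / len(a) + var_b / len(b))
    if se == 0:
        return False
    z = (mean_b - mean_a) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z))) < alpha

def detection_rates(weights_a, weights_b, n=2500, sims=100, alpha=0.05, seed=0):
    """Share of simulated sample pairs in which each metric flags a difference."""
    rng = random.Random(seed)
    scale = range(11)  # scores 0..10
    hits = dict.fromkeys(RECODE, 0)
    for _ in range(sims):
        a = rng.choices(scale, weights=weights_a, k=n)
        b = rng.choices(scale, weights=weights_b, k=n)
        for name, recode in RECODE.items():
            if significant([recode(s) for s in a], [recode(s) for s in b], alpha):
                hits[name] += 1
    return {name: count / sims for name, count in hits.items()}

# INVENTED score distributions (relative weights for scores 0..10).
younger = [9, 4, 5, 6, 7, 12, 11, 14, 14, 9, 9]
older = [9, 4, 5, 6, 7, 11, 10, 13, 14, 10, 11]
for name, rate in detection_rates(younger, older).items():
    print(f"{name}: significant in {rate:.0%} of simulated samples")
```

Passing `alpha=0.10` corresponds to dropping the confidence level to 90%. To explore the high-score case, replace the two weight lists with distributions concentrated at the top of the scale.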
Why the differences? Because calculating the mean gives a lot of weight to score changes at the low end: a customer moving from a 0 to a 6 impacts the mean more than a movement from a 6 to a 10 does. The variability this creates can hide meaningful changes at the middle and upper end, where a majority of responses usually fall. As illustrated below, we do not find the same differences for companies with higher scores. For those companies, all three metrics perform largely the same in detecting meaningful differences or changes in scores, with no dramatic power differences. [2]
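The weighting argument above can be checked with a few lines of arithmetic. The sketch below uses ten made-up responses and the standard NPS cutoffs, with Top2Box taken as the share of 9s and 10s, and compares how each metric reacts when one customer moves from 0 to 6 versus from 6 to 10. The helper names `metrics` and `shift` are hypothetical.

```python
def metrics(scores):
    n = len(scores)
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return {
        "mean": sum(scores) / n,
        "top2box": 100 * promoters / n,
        "nps": 100 * (promoters - detractors) / n,
    }

def shift(before, after):
    """Change in each metric between two sets of responses."""
    b, a = metrics(before), metrics(after)
    return {name: a[name] - b[name] for name in b}

# Ten hypothetical responses.
base = [0, 6, 6, 7, 8, 8, 9, 9, 10, 10]
low_move = [6, 6, 6, 7, 8, 8, 9, 9, 10, 10]    # one customer moves 0 -> 6
high_move = [0, 10, 6, 7, 8, 8, 9, 9, 10, 10]  # one customer moves 6 -> 10

print("0 -> 6: ", shift(base, low_move))
print("6 -> 10:", shift(base, high_move))
```

The move from 0 to 6 shifts the mean by six tenths of a point but leaves Top2Box and NPS unchanged, because the customer is still a detractor. The move from 6 to 10 shifts the mean by only four tenths of a point, yet it raises Top2Box by 10 points and NPS by 20.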
Significant results beget significant investigations. Given this, the seemingly innocuous choice of a primary metric could be the difference between inaction and impactful operational improvement.
The bottom line? Make sure your metrics aren’t hiding valuable information, but then get back to the question that really matters: given your circumstances and goals, what actions are you going to take with the insights you discover?
Photo credit: David Goehring
[1] If we drop our confidence level to 90%, a mean score would have a 31% detection rate while NPS and Top2Box would have between 83% and 90%.
[2] In both cases, it’s also important to remember that sample size is king: with enough responses, any metric will be able to detect differences (if they exist), and with too few responses, no metric will tell you what you need to know.