
I'll let Wikipedia explain how NPS is calculated:

The Net Promoter Score is obtained by asking customers a single question on a 0 to 10 rating scale, where 10 is "extremely likely" and 0 is "not at all likely": "How likely is it that you would recommend our company to a friend or colleague?" Based on their responses, customers are categorized into one of three groups: Promoters (9–10 rating), Passives (7–8 rating), and Detractors (0–6 rating). The percentage of Detractors is then subtracted from the percentage of Promoters to obtain a Net Promoter score (NPS). NPS can be as low as -100 (everybody is a detractor) or as high as +100 (everybody is a promoter).
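(For example, a survey with 50% promoters, 30% passives, and 20% detractors has an NPS of 50 − 20 = 30.)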

We have been running this survey periodically for several years. We get several hundred responses each time. The resulting score has varied by 20-30 points over the course of time. I'm trying to figure out which score movements are significant, if any.

If that simply proves too difficult, I'm also interested in figuring out the margin of error of the basic calculation. What's the margin of error of each "bucket" (promoter, passive, detractor)? Maybe even: what's the margin of error if I just look at the mean of the scores, reducing the data to just one number per survey run? Would that get me anywhere?

Any ideas here are helpful. Except "don't use NPS." That decision is outside my ability to change!

Dan Dunn

3 Answers


Suppose the population, from which we assume you are sampling randomly, contains proportions $p_1$ of promoters, $p_0$ of passives, and $p_{-1}$ of detractors, with $p_1+p_0+p_{-1}=1$. To model the NPS, imagine filling a large hat with a huge number of tickets (one for each member of your population) labeled $+1$ for promoters, $0$ for passives, and $-1$ for detractors, in the given proportions, and then drawing $n$ of them at random. The sample NPS is the average value on the tickets that were drawn. The true NPS is computed as the average value of all the tickets in the hat: it is the expected value (or expectation) of the hat.

A good estimator of the true NPS is the sample NPS. The sample NPS also has an expectation. It can be considered to be the average of all the possible sample NPS's. This expectation happens to equal the true NPS. The standard error of the sample NPS is a measure of how much the sample NPS's typically vary between one random sample and another. Fortunately, we do not have to compute all possible samples to find the SE: it can be found more simply by computing the standard deviation of the tickets in the hat and dividing by $\sqrt{n}$. (A small adjustment can be made when the sample is an appreciable proportion of the population, but that's not likely to be needed here.)
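To make the hat computation concrete, here is a minimal R sketch (the helper name nps_se is mine, not standard; it takes the assumed population proportions and a sample size):

# Minimal sketch: true NPS, SD of the tickets in the hat, and standard
# error of the sample NPS, for assumed population proportions.
nps_se = function(p_promoter, p_passive, p_detractor, n) {
  nps = 1 * p_promoter + 0 * p_passive + (-1) * p_detractor
  variance = (1 - nps)^2 * p_promoter +
             (0 - nps)^2 * p_passive +
             (-1 - nps)^2 * p_detractor
  c(NPS = nps, SD = sqrt(variance), SE = sqrt(variance) / sqrt(n))
}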

For example, consider a population of $p_1=1/2$ promoters, $p_0=1/3$ passives, and $p_{-1}=1/6$ detractors. The true NPS is

$$\mbox{NPS} = 1\times 1/2 + 0\times 1/3 + -1\times 1/6 = 1/3.$$

The variance is therefore

$$\eqalign{ \mbox{Var(NPS)} &= (1-\mbox{NPS})^2\times p_1 + (0-\mbox{NPS})^2\times p_0 + (-1-\mbox{NPS})^2\times p_{-1}\\ &=(1-1/3)^2\times 1/2 + (0-1/3)^2\times 1/3 + (-1-1/3)^2\times 1/6 \\ &= 5/9. }$$

The standard deviation is the square root of this, about equal to $0.75.$

In a sample of, say, $324$, you would therefore expect to observe an NPS around $1/3 = 33$% with a standard error of $0.75/\sqrt{324}=$ about $4.1$%.
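For anyone checking the arithmetic, the nps_se sketch above reproduces these numbers:

nps_se(1/2, 1/3, 1/6, 324)   # NPS = 0.333, SD = 0.745, SE = 0.041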

You don't, in fact, know the standard deviation of the tickets in the hat, so you estimate it by using the standard deviation of your sample instead. When divided by the square root of the sample size, it estimates the standard error of the NPS: this estimate is the margin of error (MoE).
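As an illustration, here is that estimate in R with made-up counts (150 promoters, 110 passives, and 64 detractors are purely hypothetical):

# Hypothetical survey: responses coded +1 (promoter), 0 (passive), -1 (detractor).
scores = c(rep(1, 150), rep(0, 110), rep(-1, 64))
n = length(scores)
sample_nps = mean(scores)       # sample NPS, estimating the true NPS
moe = sd(scores) / sqrt(n)      # estimated standard error = margin of error
c(NPS = sample_nps, MoE = moe)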

Provided you observe substantial numbers of each type of customer (typically, about 5 or more of each will do), the distribution of the sample NPS will be close to Normal. This implies you can interpret the MoE in the usual ways. In particular, about 2/3 of the time the sample NPS will lie within one MoE of the true NPS and about 19/20 of the time (95%) the sample NPS will lie within two MoEs of the true NPS. In the example, if the margin of error really were 4.1%, we would have 95% confidence that the survey result (the sample NPS) is within 8.2% of the population NPS.
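A quick simulation check of that Normal interpretation, using the example population from above (the seed and replication count are arbitrary):

# Simulate repeated surveys of n = 324 from the example population, then check
# the spread of the sample NPS and the two-MoE coverage claim.
set.seed(1)
sims = replicate(10000,
  mean(sample(c(1, 0, -1), 324, replace = TRUE, prob = c(1/2, 1/3, 1/6))))
sd(sims)                            # close to 0.041
mean(abs(sims - 1/3) <= 2 * 0.041)  # close to 0.95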

Each survey will have its own margin of error. To compare two such results you need to account for the possibility of error in each. When survey sizes are about the same, the standard error of their difference can be found by a Pythagorean theorem: take the square root of the sum of their squares. For instance, if one year the MoE is 4.1% and another year the MoE is 3.5%, then roughly figure a margin of error around $\sqrt{3.5^2+4.1^2}$ = 5.4% for the difference in those two results. In this case, you can conclude with 95% confidence that the population NPS changed from one survey to the next provided the difference in the two survey results is 10.8% or greater.
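In code, the combination step is a one-liner (using the 4.1% and 3.5% figures from the example):

moe_diff = sqrt(0.041^2 + 0.035^2)  # about 0.054, i.e. 5.4%
2 * moe_diff                        # about 0.108: the 95% threshold for a real change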

When comparing many survey results over time, more sophisticated methods can help, because you have to cope with many separate margins of error. When the margins of error are all pretty similar, a crude rule of thumb is to consider a change of three or more MoEs as "significant." In this example, if the MoEs hover around 4%, then a change of around 12% or larger over a period of several surveys ought to get your attention and smaller changes could validly be dismissed as survey error. Regardless, the analysis and rules of thumb provided here usually provide a good start when thinking about what the differences among the surveys might mean.

Note that you cannot compute the margin of error from the observed NPS alone: it depends on the observed numbers of each of the three types of respondents. For example, if almost everybody is a "passive," the survey NPS will be near $0$ with a tiny margin of error. If the population is polarized equally between promoters and detractors, the survey NPS will still be near $0$ but will have the largest possible margin of error (equal to $1/\sqrt{n}$ in a sample of $n$ people).
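The nps_se sketch from above makes that contrast explicit:

# Both populations have a true NPS of 0 but opposite margins of error (n = 324).
nps_se(0, 1, 0, 324)        # all passives: SE = 0
nps_se(1/2, 0, 1/2, 324)    # evenly polarized: SE = 1/sqrt(324), the maximum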

whuber
  • This was a fantastic answer. I greatly appreciate it. – Dan Dunn Dec 01 '11 at 02:33
  • Fantastic answer: thorough, clear, and simple. Thank you so much! –  Jan 24 '12 at 10:12
  • Isn't "margin of error" commonly interpreted as the 95% confidence interval for a statistic drawn from a sample, i.e. roughly 1.96 times the sampling standard error (or standard deviation) of that statistic? You use margin of error as synonymous with "standard deviation of the statistic" or "standard error". – Peter Ellis Jan 24 '12 at 18:29
  • Yes, @Peter, your interpretation correctly reflects my intended meaning. I am not aware of a universal convention for "margin of error" and therefore took some care to define *my* meaning so that I could be clear. If you could direct me to some evidence of a convention I would be glad to learn from it and modify the definition given here to reflect that. Fortunately, the nature of this reply would not change and the calculations would remain the same: only the interpretation would need to be adjusted to match the convention. – whuber Jan 24 '12 at 19:05
  • Thanks @whuber. I try never to argue about terminology so long as it is clearly defined (the Humpty Dumpty principle), and I think the horse has bolted on a consistent convention on this one. The only evidence I have is an answer to my own question at http://stats.stackexchange.com/questions/21139/correct-terminology-for-describing-relative-confidence-interval, which correctly notes that margin of error is commonly (not universally) quoted as a percentage of the estimate. – Peter Ellis Jan 24 '12 at 19:31
  • @Charles, I think whuber is doing a basic variance of a discrete random variable. See http://www.stat.yale.edu/Courses/1997-98/101/rvmnvar.htm – B_Miner Feb 22 '12 at 20:33
  • @whuber I could be completely wrong, but I have heard that the calculations for the mean don't work, because $p_1$, $p_0$, and $p_{-1}$ are not independently distributed. More $p_1$s means fewer $p_0$s and/or fewer $p_{-1}$s. – Jonathan Apr 10 '12 at 19:01
  • I don't follow, @Jonathan. $p_{-1}$, $p_0$, and $p_1$ are not random variables. They are proportions of a population: that is, *numbers*. They don't have distributions and it makes no sense to refer to them as "independent." I'll try to clarify the opening remarks in this reply. – whuber Apr 10 '12 at 19:06
  • I did notice one problem at small sample sizes. I was comparing my calculations with whuber's using actual data and was getting slightly different results, which had me a little confused, until I realised that the SD used by whuber is the population SD, while the one I was using was the sample one. Not much of a problem at large sample sizes. So by all means use whuber's calculations as a guide for the expected SE of NPS. However, if your sample sizes are small, then just be aware that the actual SE will be larger than that calculated using whuber's formula above. –  Aug 29 '12 at 08:58
  • @Chris Thank you for your comment. However, you misunderstand my calculation: it correctly evaluates the population SD. Concerning estimation, I finesse the issue you bring up, by stating "you estimate it by using the standard deviation of your sample instead." If you feel that a correction is needed in your estimate of the SD--which is the conventional way to do it--then you are free to apply it. Note, though, that even with the conventional correction factor the sample SD is still a biased estimate of the population SD. – whuber Aug 29 '12 at 11:57
  • @Whuber : Could you please explain the following part? "Provided you observe substantial numbers of each type of customer (typically, about 5 or more of each will do), the distribution of the sample NPS will be close to Normal. " I am not able to get how it follows that the distribution of sample NPS will be close to normal. – Stats IT Mar 11 '13 at 14:24
  • @Nilotpal This is a consequence of the Normal approximation to the Binomial Distribution. (The sample NPS is a linear combination of two negatively correlated Binomial variables.) – whuber Mar 11 '13 at 15:02
  • @whuber: Thanks, I got the normal approximation to the binomial distribution. I have two doubts on this. First, for negatively correlated binomial counts, the number of one outcome plus the number of the alternate outcome equals the number of experiments. In the NPS case, the sum of the numbers of promoters and detractors is not equal to the number of experiments, because it also depends on the number of passives. So I would like to know how, in the case of NPS, we are justified in assuming the binomial distribution. – Stats IT Mar 12 '13 at 02:02
  • @Nilotpal You are incorrect about your characterization of negatively correlated Binomials. In this case, the two random variables are the number of positives and the number of negatives. That they have Binomial distributions follows directly from the definition; that they are correlated follows from the linear relation among them and the number of neutrals. The overall distribution of the three counts (positive, negative, neutral) is multinomial: from this we may compute the correlation and justify the Normal approximation rigorously. – whuber Mar 12 '13 at 02:38
  • A reader (without sufficient reputation to comment) writes, "If any two of the number of detractors, passives or promoters are zero, no matter how big the third variable is, the Margin of Error is zero. Would you expect this?" I would first appeal to caution: the principal conclusions in my post were based explicitly on the assumption "Provided you observe substantial numbers of each type of customer (typically, about 5 or more of each will do) ... ." – whuber Sep 19 '14 at 13:45
  • (Continued) Clearly this assumption is violated in this situation with zero counts. When any *one* of those three numbers is close to zero, caution is needed in making the interpretation. You're in a simple situation when the other numbers are large, because that means the near-zero count truly is rare; but when the other numbers are themselves small, then you can't be sure whether the near-zero counts reflect a rare type of customer or are just the result of sampling variability. More sophisticated methods (but still based on the same concepts) exist to quantify that variability. – whuber Sep 19 '14 at 13:46
  • @whuber Why aren't the total number of customers factored in this calculation? For example, if I have 500 customers and all of them complete a survey (error should be 0), won't the margin of error be different than if I have 100 respondents out of 500 total customers? – Aaron Brager Jan 29 '15 at 20:48
  • @Aaron I'm not sure what you are saying, but I can point out that (1) the margin of error depends only on the number of customers *in the sample*, $n$, and (2) the value of $n$ is explicitly used in this answer: the margin of error is inversely proportional to its square root. Separate from this, it might be worth noting that if you survey 500 customers and only 100 respond then your results are practically worthless due to the possibility (really, the near-certainty) of non-response bias. – whuber Jan 29 '15 at 20:53
  • @whuber Sorry, 100/500 was a bad example. Your statement (1) clarifies things but I don't understand *why*. Let *n* be the sample size and *c* the total number of customers. Why is the relation between *c* and *n* irrelevant? Shouldn't the margin of error decrease as *n* approaches *c*, since you are sampling a higher percentage of the total population? – Aaron Brager Jan 29 '15 at 21:25
  • @Aaron The purpose usually is not to draw inferences about the customers you have; it is to draw inferences about *your customer experience.* That "population" is hypothetically infinite. Your sample of $n$ represents that population. In cases when the target of inference truly is a finite, well-defined population then indeed you must [adjust the margin of error](http://en.wikipedia.org/wiki/Standard_error#Correction_for_finite_population) downward as the sample size approaches the population size. For NPS, the adjustment is negligible until the sample exceeds 10% of the population. – whuber Jan 29 '15 at 21:36
  • Hello, this is probably stupid but here's my question: when you say "95% confidence that the survey result (the sample NPS) is within 8.2% of the population NPS", do you mean that our NPS (33%) is within the $[33 - 4.1,\, 33 + 4.1]$ range, or the $[33 \times (1 - 4.1\%),\, 33 \times (1 + 4.1\%)]$ range? Do we add the percentages or do we multiply them? – rom_j Nov 10 '15 at 13:52
  • @rom_j "...lie within two MoEs of the true NPS..." means the standard errors are *added* rather than multiplied. – whuber Nov 10 '15 at 14:08
  • The expression for the variance can be simplified to $\mathrm{Var} = p_1 + p_{-1} - \mathrm{NPS}^2$. – Stephen McAteer Jun 15 '16 at 22:22
  • @Stephen Thank you for that observation. I hope you won't mind if I don't incorporate it in the answer, though: I intentionally left the expression unsimplified so that (a) algebraic issues would not interrupt the exposition and (b) the calculation would reflect the underlying definition. – whuber Jun 15 '16 at 22:24
  • @whuber No worries at all! – Stephen McAteer Jul 19 '16 at 01:37
  • @whuber: since others have asked it too: what should I do when the group sizes differ substantially, to be able to calculate the 'pooled' margin of error for comparing the difference in NPS between the two groups? You specifically remark that they should be roughly equal in size. – Sander Aug 26 '21 at 13:06

You could also use the variance estimator for continuous variables. Actually, I'd prefer it over the variance estimator for the discrete random variable, since there is a well-known correction for calculating the sample variance: https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation As others noted, whuber's solution is based on population formulae. However, since you are running a survey, I'm pretty sure you've drawn a sample, so I would recommend using the unbiased estimator (dividing the sum of squares by $n-1$, not by $n$). Of course, for large sample sizes, the difference between the biased and unbiased estimators is virtually non-existent.
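In R this is what var() and sd() already do, so a minimal sketch of this recommendation (with hypothetical responses coded +1/0/-1) is just:

# sd() divides the sum of squares by n-1, giving the unbiased variance estimate.
scores = c(rep(1, 150), rep(0, 110), rep(-1, 64))  # hypothetical coded responses
sd(scores) / sqrt(length(scores))                  # estimated SE (margin of error) of the NPS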

I'd also recommend using a t-test procedure if you have medium sample sizes, instead of the z-score approach: https://en.wikipedia.org/wiki/Student's_t-test
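For example, a t-based 95% confidence interval for the mean of the coded scores (and hence for the NPS as a fraction) comes straight out of t.test(), here applied to the hypothetical data above:

t.test(scores, conf.level = 0.95)$conf.int  # 95% t-interval for the mean score
# To compare two survey waves, t.test(scores1, scores2) would work the same way,
# where scores1 and scores2 are the coded responses from each wave.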

@whuber: since others have asked it too: how would one calculate the unbiased sample estimator for the variance/SD in your discrete random variable approach? I've tried to find it on my own, but wasn't successful. Thanks.

deschen

You can potentially use the bootstrap to simplify your calculations: resample the observed 0-10 responses with replacement many times, compute the NPS of each resample, and take percentiles of those bootstrap NPS values as a confidence interval. In R the code would be:

library(bootstrap)

# Compute the NPS (in percentage points) from a vector of 0-10 scores,
# after validating that the scores are integers on the 0-10 scale.
NPS = function(x){
  if(sum(!x%%1==0)>0){stop("Non-integers found in the scores.")}
  if(sum(x>10|x<0)>0){stop("Scores not on scale of 0 to 10.")}
  sum(ifelse(x<7,-1,ifelse(x>8,1,0)))/length(x)*100
}

# Percentile bootstrap confidence interval for the NPS: resample the scores,
# recompute the NPS each time, and take the matching quantiles.
NPSconfInt = function(x, confidence=.9, iterations=10000){
  quantile(bootstrap(x, iterations, NPS)$thetastar,
           c((1-confidence)/2, 1-(1-confidence)/2))
}


npsData=c(1,5,6,8,9,7,0,10,7,8,
          6,5,7,8,2,8,10,9,8,7,0,10)    # Supply NPS data
hist(npsData,breaks=11)                 # Histogram of NPS responses

NPS(npsData)            # Calculate NPS (evaluates to -14)
NPSconfInt(npsData,.7)  # 70% confidence interval (evaluates to approx. -32 to 5)
k-zar
  • Could you expand on your answer by explaining at the start what the approach is -- in sufficient detail that someone who doesn't understand your R code at all could still follow what you're trying to say -- and hopefully enough that they could take a stab at implementing it in their favourite language? – Glen_b Oct 07 '16 at 07:08