Appropriateness of running a rank sum test on Likelihood to Recommend data

Question

I am working with a Likelihood to Recommend metric that is measured on a scale of 1-10. The objective is to test for a statistically significant improvement or decline (period over period) in the metric. As a side note, the distribution of this metric is heavily skewed towards higher ratings, especially 9-10 (dummy data below may not be reflective of that).

Here is what I have already tried in the past:

Aggregate the data to use top-2 boxes (% of 9-10) and run a chi-square test of independence (proportions test).
Aggregate the data to use Net Promoter Score (% of 9-10 minus % of 1-6) and calculate Margin of Error.
Aggregate the data to use Net Promoter Score (% of 9-10 minus % of 1-6) and run a chi-square test to test for difference in the composition of NPS.
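
As a rough illustration of the first two approaches, here is a minimal sketch using only the Python standard library. The rating counts for the two periods are made up for illustration and are not from the question; the function names are likewise hypothetical. The top-2-box comparison is written as a pooled two-proportion z-test, which for a 2x2 table is equivalent to a chi-square test of independence without continuity correction (z squared equals the chi-square statistic). The margin of error treats each respondent as scoring +1 (promoter), 0 (passive), or -1 (detractor).

```python
import math

# Hypothetical rating counts (rating -> respondents); NOT data from the question.
period_1 = {1: 20, 2: 10, 3: 15, 4: 20, 5: 40, 6: 45, 7: 100, 8: 200, 9: 250, 10: 300}
period_2 = {1: 15, 2: 10, 3: 10, 4: 15, 5: 35, 6: 35, 7: 90, 8: 190, 9: 270, 10: 330}


def summarize(counts):
    """Return (n, share of promoters 9-10, share of detractors 1-6)."""
    n = sum(counts.values())
    promoters = sum(c for r, c in counts.items() if r >= 9) / n
    detractors = sum(c for r, c in counts.items() if r <= 6) / n
    return n, promoters, detractors


def nps_with_margin(counts, z=1.96):
    """NPS (as a fraction) and its approximate 95% margin of error."""
    n, p, d = summarize(counts)
    nps = p - d
    # Variance of the per-respondent score in {+1, 0, -1}: E[X^2] - E[X]^2
    variance = p + d - nps ** 2
    return nps, z * math.sqrt(variance / n)


def top2_z_test(counts_a, counts_b):
    """Pooled two-proportion z-test on the share of 9-10 ratings (two-sided)."""
    n_a, p_a, _ = summarize(counts_a)
    n_b, p_b, _ = summarize(counts_b)
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_value


for name, counts in (("period 1", period_1), ("period 2", period_2)):
    nps, moe = nps_with_margin(counts)
    print(f"{name}: NPS = {nps:.3f} +/- {moe:.3f}")

z, p_value = top2_z_test(period_1, period_2)
print(f"top-2 box: z = {z:.3f}, two-sided p = {p_value:.4f}")
```

Both summaries discard the ordering within each bucket (a move from 1 to 6, or from 7 to 8, is invisible to them), which is part of why a rank-based test on the full scale is a natural thing to ask about.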

Those methods seem to work well, but I am curious about the appropriateness of running a Wilcoxon-Mann-Whitney rank sum test. The data are unpaired and the sample sizes are large (>1000). What are your thoughts?
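
For reference, a rank sum test on this kind of data can be sketched directly from the two frequency tables. With only ten distinct ratings and thousands of respondents the data are heavily tied, so the sketch below uses midranks and the tie-corrected normal approximation, with no continuity correction. The counts are again hypothetical and the function name is made up; in practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice. Note that the test addresses whether ratings in one period tend to be higher than in the other, not whether the promoter or detractor percentages changed.

```python
import math

# Hypothetical rating counts (rating -> respondents); NOT data from the question.
period_1 = {1: 20, 2: 10, 3: 15, 4: 20, 5: 40, 6: 45, 7: 100, 8: 200, 9: 250, 10: 300}
period_2 = {1: 15, 2: 10, 3: 10, 4: 15, 5: 35, 6: 35, 7: 90, 8: 190, 9: 270, 10: 330}


def mann_whitney_from_counts(counts_a, counts_b):
    """Tie-corrected normal-approximation rank sum test on two frequency tables.

    Returns (U for sample A, z, two-sided p, estimate of P(A > B) + 0.5 * P(A = B)).
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    n = n_a + n_b

    rank_sum_a = 0.0
    tie_term = 0.0  # sum over tied groups of (t^3 - t)
    seen = 0        # observations with a strictly lower rating
    for rating in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(rating, 0)
        t = a + counts_b.get(rating, 0)
        if t == 0:
            continue
        midrank = seen + (t + 1) / 2  # average rank shared by all t tied values
        rank_sum_a += a * midrank
        tie_term += t ** 3 - t
        seen += t

    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u_a, z, p_value, u_a / (n_a * n_b)


u, z, p_value, prob_higher = mann_whitney_from_counts(period_2, period_1)
print(f"U = {u:.1f}, z = {z:.3f}, two-sided p = {p_value:.4f}")
print(f"P(period 2 rating > period 1 rating) + 0.5 * P(tie) = {prob_higher:.3f}")
```

The last quantity is a convenient effect size to report alongside the p-value: a value of 0.5 means neither period tends to have higher ratings, and with samples this large even values very close to 0.5 can come out statistically significant.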

So you want to test whether the mean (?) is higher in Jan-18 compared to Jan-17? Or do you want to do a $\chi^2$ test to check whether the distributions are similar (for all the classes)? This is important: which score(s) are you really interested in? The higher ones? All of them? — user2974951, Jan 23 '19 at 09:26
@user2974951 Sorry I wasn't clear enough. I made some minor edits to the question. My objective is to see if customer experience improved/declined. Customer experience would be considered to have improved if there is an increase in %promoters(9-10) or a decrease in %detractors (1-6). I want to get an overall idea of whether the metric improved or if it declined period over period. — Phil Coulson, Jan 23 '19 at 17:02
You need to decide what you want to test. Do you want to test the global mean, to see if customer experience changes (that is, use all available information)? Do you want to use a chi-square test to test the distribution of all the ratings (not recommended, the sample is too large) or maybe an aggregate of these (say 1-6, 7-8 and 9-10)? Or maybe you are only interested in the proportion of, say, 9's and 10's, in which case you can use a proportion test? How many periods do you have? — user2974951, Jan 24 '19 at 13:12
@user2974951 I have already tried a proportions test and a chi-square test in the past. I am curious whether it is appropriate to use the WMW rank sum test on such data. — Phil Coulson, Jan 24 '19 at 19:29
Yes... it is appropriate, but it may not be sensible, so why would you? The Mann-Whitney test is a non-parametric alternative to the t-test for when the t-test's assumptions are not met, which means it can be used in more situations than a t-test. The downside is that it can have lower power; that is, it is less likely to detect a difference if one exists. So if you are fishing for significance, you are less likely to find it with Mann-Whitney. — user2974951, Jan 25 '19 at 14:00
