I have noticed an inconsistency in the formula for the Mann-Whitney U test. Sites like Wikipedia often report it as $$U_1={\frac{n_1(n_1+1)}2}-R_1$$ However, the original paper by Mann and Whitney reports it as $$U_1=n_1n_2+{\frac{n_1(n_1+1)}2}-R_1$$ What difference does the $n_1n_2$ term make?

Or are these different tests?

kjetil b halvorsen
  • It's all the same test. Sometimes people shift, scale, or otherwise transform the test statistic. What matters in a statistical test is the P-value as a function of the data. The test statistic is just an intermediate value. Sometimes the convention for a statistic is universal, sometimes less so. – David Wright May 16 '17 at 21:03
  • See 1. [Mann-Whitney U Statistic seems very large - is something wrong?](https://stats.stackexchange.com/questions/252976/mann-whitney-u-statistic-seems-very-large-is-something-wrong) 2. [Why are there two forms for the Mann-Whitney U test statistic?](https://stats.stackexchange.com/questions/122985/why-are-there-two-forms-for-the-mann-whitney-u-test-statistic) 3. [Wilcoxon rank sum test in R](https://stats.stackexchange.com/questions/65844/wilcoxon-rank-sum-test-in-r) – Glen_b May 17 '17 at 00:10

1 Answer

It's the same test, but you're actually reading it wrong. Wikipedia defines $U_1$ as:

$$ U_1 = R_1 - \frac{n_1(n_1+1)}{2} $$

And, using the same notation, the Mann-Whitney paper defines $U_1$ as:

$$ U_1 = n_1n_2 + \frac{n_2(n_2+1)}{2}-R_2 $$

Note that, apart from the $n_1n_2$ term, the paper's definition of $U_1$ is written in terms of $n_2$ and $R_2$ ($m$ and $T$ in the paper's notation). Rearranging gives $U_1$ directly in terms of $U_2$:

$$ U_1 = n_1n_2 - \left(R_2-\frac{n_2(n_2+1)}{2}\right) $$

The bracketed term is of course just $U_2$, so:

$$ U_1 = n_1n_2 - U_2 $$
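If you want to convince yourself numerically, here is a minimal sketch in plain Python. The helper names `midranks` and `u_statistics` and the sample data are made up for illustration; they are not from Wikipedia or the paper. It computes $U_1$ from both formulas above and checks that they agree.

```python
def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def u_statistics(sample1, sample2):
    """Return (U1, U2, U1 via the n1*n2 form) for two samples."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # rank sum of sample 1
    r2 = sum(ranks[n1:])  # rank sum of sample 2
    u1 = r1 - n1 * (n1 + 1) / 2                  # Wikipedia's form
    u2 = r2 - n2 * (n2 + 1) / 2
    u1_alt = n1 * n2 + n2 * (n2 + 1) / 2 - r2    # form with the n1*n2 term
    return u1, u2, u1_alt


# Hypothetical data, chosen only to illustrate the identity.
x = [1.1, 2.3, 4.0, 5.5]
y = [0.7, 3.2, 6.1, 7.4, 8.0]
u1, u2, u1_alt = u_statistics(x, y)
print(u1, u2, u1_alt)            # 6.0 14.0 6.0
print(u1 + u2, len(x) * len(y))  # 20.0 20
```

Both forms give the same $U_1$, and $U_1 + U_2$ equals $n_1n_2$ for this data.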

You can see that this is true by considering the fact that the sum of all ranks is just $\frac{(n_1+n_2)(n_1+n_2+1)}{2}$, so:

$$ R_1+R_2=\frac{(n_1+n_2)(n_1+n_2+1)}{2} $$

Putting $R_1$ and $R_2$ in terms of $U_1$ and $n_1$ and $U_2$ and $n_2$ yields:

$$ U_1+\frac{n_1(n_1+1)}{2}+U_2+\frac{n_2(n_2+1)}{2}=\frac{(n_1+n_2)(n_1+n_2+1)}{2} $$

Then you can do some algebra and see the relationship between $U_1$ and $U_2$:

$$U_1=n_1n_2-U_2$$
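In case the algebra is not obvious, here is one way to fill it in, using only the definitions above. Expanding the total rank sum gives:

$$ \frac{(n_1+n_2)(n_1+n_2+1)}{2} = \frac{n_1(n_1+1)}{2} + \frac{n_2(n_2+1)}{2} + n_1n_2 $$

Substituting $R_1 = U_1 + \frac{n_1(n_1+1)}{2}$ and $R_2 = U_2 + \frac{n_2(n_2+1)}{2}$ into $R_1 + R_2$ and cancelling the two triangular-number terms on each side leaves:

$$ U_1 + U_2 = n_1n_2 $$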

Peyton