
I am doing my dissertation, and I am conducting a number of tests. After using a Kruskal–Wallis test, I usually report the result like this:

There is a significant difference $(\chi^2_{(2)}=7.448, p=.024)$ between the means of...

But now I conducted a Mann–Whitney test, and I am not sure which values to present. SPSS gives me a Mann–Whitney $U$, a Wilcoxon $W$, a $Z$ score, and a $p$-value. Do I present all four of these values, or are some irrelevant?


1 Answer


Wikipedia appears to have your answers. Here's an excerpt from the example statement of results:

In reporting the results of a Mann–Whitney test, it is important to state:

  • A measure of the central tendencies of the two groups (means or medians; since the Mann–Whitney is an ordinal test, medians are usually recommended)
  • The value of U
  • The sample sizes
  • The significance level.

In practice some of this information may already have been supplied and common sense should be used in deciding whether to repeat it. A typical report might run,

"Median latencies in groups E and C were 153 and 247 ms; the distributions in the two groups differed significantly (Mann–Whitney U = 10.5, n1 = n2 = 8, P < 0.05 two-tailed)."

The Wilcoxon signed-rank test is appropriate for paired samples, whereas the Mann–Whitney test assumes independent samples. However, according to Field (2000), the Wilcoxon $W$ in your SPSS output is "a different version of this statistic, which can be converted into a Z score and can, therefore, be compared against critical values of the normal distribution." That explains your $Z$ score too, then!
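If it helps to see the algebra behind that conversion, here's a small sketch of the usual (no-ties) rank-sum relationships; this is my own illustration, not Field's:

```python
# Sketch of the standard (no-ties) relationships behind SPSS's output.
# W is the rank sum of one of the groups; with that group's size n1 and
# the other group's size n2:
#   U = W - n1*(n1 + 1)/2
#   Z = (U - n1*n2/2) / sqrt(n1*n2*(n1 + n2 + 1)/12)   (normal approximation)

def u_from_w(w, n):
    """Convert a rank sum W (from the group of size n) to U."""
    return w - n * (n + 1) / 2

def z_from_u(u, n1, n2):
    """Standardize U via the large-sample normal approximation."""
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return (u - mu) / sigma

# Wikipedia's example: U = 10.5 with n1 = n2 = 8 gives Z of about -2.26
print(z_from_u(10.5, 8, 8))
```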

FYI, Wikipedia adds that, for large samples, $U$ is approximately normally distributed. Given all these values, one can also calculate the effect size $\eta^2$, which in the case of Wikipedia's example is 0.319 (a calculator is implemented in section 11 here). However, this transformation of the test statistic depends on the approximate normality of $U$, so it might be inaccurate with $n_1 = n_2 = 8$ (Fritz et al., 2012).
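Here's a sketch of that effect size calculation, using the $r = Z/\sqrt{N}$ conversion from Fritz et al. (2012) and approximating $\eta^2$ as $r^2$:

```python
# Effect size from the normal approximation, per Fritz et al. (2012):
#   r = Z / sqrt(N), with eta-squared approximated as r**2.
# The numbers below reproduce the 0.319 quoted for Wikipedia's example.
n1 = n2 = 8
u = 10.5

mu = n1 * n2 / 2
sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
z = (u - mu) / sigma        # about -2.26

N = n1 + n2
r = z / N ** 0.5            # about -0.56
eta_sq = r ** 2             # about 0.319
print(round(r, 3), round(eta_sq, 3))
```

(Remember the caveat above: with samples this small, the normal approximation of $U$, and hence this effect size, may be inaccurate.)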

P.S. The Kruskal–Wallis test's results should not be interpreted as revealing differences between means except under special circumstances. See @Glen_b's answer to another question, "Difference Between ANOVA and Kruskal-Wallis test" for details.

References

Field, A. (2000). 3.1. Mann-Whitney test. Research Methods 1: SPSS for Windows part 3: Nonparametric tests. Retrieved from http://www.statisticshell.com/docs/nonparametric.pdf.
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1), 2–18. PDF available via ResearchGate.

  • What is the point of reporting the value of U in the example above? What do I, as reader, gain from knowing that U was 10.5? – amoeba Feb 21 '14 at 10:59
  • In the example above, you gain the ability to calculate the exact $p$, which is not given and may be useful for effect size estimation, meta-analysis, or checking for $p$-hacking. A friend and colleague of mine @rpierce has also advised me to report test statistics to assure readers that I'm doing things properly in general, as he's caught many published articles [doing it wrong](http://knowyourmeme.com/memes/youre-doing-it-wrong) via misreported test statistics and associated $df$. – Nick Stauner Feb 21 '14 at 11:15
  • Interesting. I guess this issue might be worthy of a separate question, which I might ask here at some point. Still: if one wants exact p-values, then one can report exact p-values! In fact, the usual advice is to report exact p-values, unless they are very small, like p<0.0001; but in that case p-hacking is unlikely. And effect size should be reported separately anyway, e.g. "median latencies in groups E and C were 153 and 247 ms" in your quote from wiki. – amoeba Feb 21 '14 at 11:21
  • I fully agree that effect sizes should be reported, and generally would've agreed with the usual advice on exact $p$ values...but @rpierce has argued that this encourages readers to misinterpret $p$ values in all the myriad ways they do, and to use $p$ as a proxy for effect size instead of A) demanding the real deal or even B) using it when it's there. Some of *that* separate question has been discussed in [my answer here](http://stats.stackexchange.com/a/79158/32036) and our comments, but seems far from settled...Regardless, his point about error-checking persuades me to report my test stats. – Nick Stauner Feb 21 '14 at 11:33
  • Hmmm. Is his point about error-checking expressed somewhere online or in print? I am curious about examples of articles that he caught. – amoeba Feb 21 '14 at 11:45
  • Secret (i.e., non-searchable) Facebook group for our alma mater's psych grads; no examples included. I'm curious too though. Maybe you can get him to share some if you tag him in a separate question :) I bet others here would have plenty of examples to chime in with too – that is, if the question doesn't get closed down as somehow off-topic! Certainly a more basic question like, "Why is it important to report test statistics in addition to $p$ and effect size?" would be on-topic though, I think...Check for duplicates under the [tag:reporting] tag first though, if you really want to be safe... – Nick Stauner Feb 21 '14 at 11:47
  • Where the sample sizes are not so small that it will be misinterpreted, I'd lean toward reporting a standardized U or W (standardized, they're identical) as a Z-value ($Z_U$ say), because readers will have an intuitive sense of what that means -- though then it becomes necessary to be clear when you have the exact p-value if you do, not one based off the Z score for the statistic. – Glen_b Feb 21 '14 at 13:47
  • @Glen_b: but doesn't the p-value already provide readers with an intuitive sense, so much so that the value of the statistic itself becomes irrelevant? Especially when the statistic is such that only a very rare reader will have an intuitive feeling about its values (e.g. U). – amoeba Feb 21 '14 at 13:58
  • In which case, why ever report any test statistic but a p-value? For some people a p-value is fine - but I've found that for many, giving something interpretable as a Z or a t, even if only approximately, conveys a better understanding. – Glen_b Feb 21 '14 at 14:01
  • amoeba, be careful that you don't feed @rpierce further ammo for shooting down exact $p$ values by appealing to the accuracy of stats consumers' intuitive senses about $p$ values ;) That's an uphill battle! – Nick Stauner Feb 21 '14 at 14:02
  • Thank you for all your help. I decided to leave all the values in the tables but to include only (U=123, p=.001) in my discussion. – dissertationhelp Feb 21 '14 at 16:29
  • Another helpful clarification raised in [a new comment by Glen_b](http://stats.stackexchange.com/questions/97737/mann-whitney-u-test-analysis-help/97739?noredirect=1#comment190390_97739)... – Nick Stauner May 14 '14 at 23:33
  • @amoeba: I found another point of reporting $U$ (and the *n*s) in large samples: effect size calculation! – Nick Stauner May 17 '18 at 19:35