
Most tables of critical values for the Mann-Whitney-Wilcoxon rank sum test statistic, usually known as U, are only calculated for very small samples. Presumably, this is because the test is most commonly recommended for non-normally distributed numeric data in samples too small for the central limit theorem to kick in. However, there may be other reasons for using a non-parametric test, for example when working with ordinal data. For this reason, it would be convenient to have tables of critical values for larger samples.

Calculating critical values for U is very easy in R, using the qwilcox function. However, I find that this function becomes unusably slow when working with group sizes greater than about 250. I presume that this is because of the recursive algorithm used in the underlying C code.
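For concreteness, here is a minimal sketch of the kind of call I mean (the exact group size at which it becomes impractical will of course vary by machine):

```r
# Lower-tail 0.025 point for two groups of 50, as the basis of a
# two-sided critical value at alpha = 0.05. qwilcox(p, m, n) returns
# the smallest u with P(U <= u) >= p.
qwilcox(0.025, m = 50, n = 50)    # essentially instant

# The same call for groups of around 300 takes dramatically longer
# (be prepared to wait), which is the problem described above.
system.time(qwilcox(0.025, m = 300, n = 300))
```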

What would be an efficient way of calculating critical values for U?

1 Answer


As I understand it, there is no closed-form CDF (nor inverse CDF) for the Wilcoxon-Mann-Whitney, and $p$ values or critical values must be calculated from a combinatoric recursion in the sample sizes of the two groups. Combinatoric computations get big fast (note the difference in computational demand of $20!$ versus $200!$). While one can sometimes improve computing efficiency by working on the log scale (i.e. $e^{\ln(\cdot)}$ transformations of things like $\Gamma(\cdot)$ and $\cdot!$), there's no getting around the fact that exact probabilities/critical values in rank sum distributions require a hairy bit of work that only grows nastily in computational demand with sample size.
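To make the recursion concrete, here is a minimal sketch (my own illustration of the standard counting recursion, obtained by conditioning on which group supplies the largest observation; not necessarily exactly what R's C code does):

```r
# Number of rank arrangements giving Mann-Whitney statistic U = u for
# group sizes m and n, via the recursion
#   N(u; m, n) = N(u - n; m - 1, n) + N(u; m, n - 1),
# with N(0; m, 0) = N(0; 0, n) = 1. Memoised, but the table of states
# still grows rapidly with the group sizes, which is the blow-up above.
count_u <- local({
  memo <- new.env(hash = TRUE)
  function(u, m, n) {
    if (u < 0 || u > m * n) return(0)
    if (m == 0 || n == 0) return(as.numeric(u == 0))
    key <- paste(u, m, n)
    if (exists(key, envir = memo, inherits = FALSE))
      return(get(key, envir = memo))
    val <- Recall(u - n, m - 1, n) + Recall(u, m, n - 1)
    assign(key, val, envir = memo)
    val
  }
})

# P(U = u) is then count_u(u, m, n) / choose(m + n, m); for small group
# sizes this should agree with R's dwilcox, e.g.
count_u(12, 5, 5) / choose(10, 5)   # compare with dwilcox(12, 5, 5)
```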

Now I'ma scoot before someone who knows this better comes along to answer more completely.

Alexis
  • There are nice recursive relationships: you can write $U_{m,n}$ in terms of $U_{m-1,n}$ or $U_{m,n-1}$, for example, so if you were trying specifically to construct *tables*, you can build up as you go (similarly with the signed rank test). However, tables quickly become too unwieldy to be useful. Computer calculations don't need tables, as long as they can calculate tail probabilities fairly quickly. The network algorithm approach of Mehta and Patel is sometimes used for exact nonparametric tests. ... ctd – Glen_b Dec 02 '21 at 11:37
  • ctd ... there's also the possibility of better approximations than the normal, like the Edgeworth and saddlepoint approximations. e.g. see https://www.tandfonline.com/doi/abs/10.1080/10485250310001622677 (comment rewritten to include a reference). Also see https://www.jstor.org/stable/3315887 – Glen_b Dec 03 '21 at 01:42
  • @Glen_b Agree that tables don't have a great deal of practical utility, but I want them for teaching purposes. – Westcroft_to_Apse Dec 03 '21 at 09:52
  • I fail to see any pedagogical value in producing a gigantic book of tables of the distribution of the statistic at every combination of $(m,n)$ when you apparently want to go *beyond* values like $\min(m,n)=250$. The support of the distribution grows as $mn$. If you did say every set of sample sizes (with reordering of group labels to keep $m\leq n$ to halve the total number of tables) up to $1000$, you'd have something like 100 billion cdf values to list (assuming I didn't make an error in my back of the envelope calculation) -- i.e. millions of pages. – Glen_b Dec 04 '21 at 00:57
  • With minimum sample size over 100, the error in the normal approximation at typical significance levels is typically very small. E.g. `pwilcox(100*100/2+1.96*sqrt(100*100*201/12),100,100)` gives `[1] 0.9751233`, while `pwilcox(qwilcox(.975,100,100),100,100)` also gives `[1] 0.9751233`, indicating that both methods give effectively the same value (`5802`) for the .975 quantile (though of course, the distribution being discrete, there isn't an exact .975 quantile). – Glen_b Dec 04 '21 at 01:13
  • I am not sure I see much actual benefit this far out, when the model itself is approximate (we rarely have perfect exchangeability under the null, for example, so an exact calculation for an inexact model of the situation seems like a category error). Making a small error when there's already potentially a larger one is usually a non-issue. – Glen_b Dec 04 '21 at 01:15
  • @Glen_b Bellera, C. A., Julien, M., & Hanley, J. A. (2010). [Normal Approximations to the Distributions of the Wilcoxon Statistics: Accurate to What $N$? Graphical Insights](https://www.tandfonline.com/doi/pdf/10.1080/10691898.2010.11889486). Journal of Statistics Education, 18(2), 1–17. Gives some compelling graphics illustrating how quickly these distributions approach the normal – Alexis Dec 04 '21 at 01:57
  • 1. Thanks Alexis. Looking at it, I've read this paper before. I would say that what we're normally more interested in is the accuracy of the cdf and quantile function than the pmf, but yes, it illustrates the rapid approach to normality. 2. It looks like there are better exact algorithms than the one I mentioned before ... e.g. see [Nagarajan & Keich](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.162.1413), which uses a shifted-FFT approach. Beyond where that's practical, it looks like the Edgeworth expansion does very well, and for the extreme tail the saddlepoint approximation does better. – Glen_b Dec 04 '21 at 06:03
  • @Glen_b Glen, is there a good reference for how to visually appraise and interpret CDFs? They all just look like S-curves to me, and while I understand things like the visual representation of stochastic dominance and K-S stats for eCDFs, I always feel like I am missing a critical skill in appreciating them visually, which is why I gravitate hard to PDFs. – Alexis Dec 04 '21 at 16:20
  • I understand the inclination. I can't think of a resource beyond practice and keeping in mind what they are: sums of the pmf up to $x$. It can be useful to look at cdfs at the same time as pmfs. I stress it because a substantial collection of differences in pmf with very large typical relative error or a large chi-squared distance may correspond to a fairly small typical error in cdf, so when we compute a tail probability from the approximation our error is small. E.g. consider a series of errors, all of alternating sign, which increase in size toward the mode and then decrease in size after it. – Glen_b Dec 04 '21 at 23:27
  • Sometimes a transformed cdf can be more useful (e.g. if you're interested in something like the error in the log-odds, which is close to relative error in the tail, a logistic transformation of the cdf might be helpful); similarly, a normal QQ plot (corresponding to a probit transform) can sometimes be helpful with things that are close to normal. – Glen_b Dec 05 '21 at 00:22
  • @Glen_b "a substantial collection of differences in pmf with very large typical relative error or a large chi-squared distance may correspond to a fairly small typical error in cdf," I saw this when I was playing with some data sets of actual, not simulated, rolls of two different 20-sided dice each produced by a different manufacturer: the PMFs each have different dramatic deviations from "uniform fairness", but since (in a D&D context) we actually care about rolling $X$ or higher, the eCDFs were what was important, and the "dramatic" deviations were mostly not (dramatic). – Alexis Dec 05 '21 at 05:50
  • That's interesting. I had in mind a specific example involving distributions on a d20 when I wrote the above, one I came up with in the 90s when answering a question on a usenet group about testing dice for randomness for D&D, in order to explain why, for a game where success involves either "roll above" or "roll below" some value, a measure that focuses on deviations in the cdf will tend to do better than, say, a chi-squared (which adds up squares of standardized deviations in pmf) at identifying the sort of differences that tend to impact the game. – Glen_b Dec 05 '21 at 13:30
  • @Glen_b I don't want to create tables all the way up to e.g. 500 or 1000 with all the numbers in between; I'd just like to add a few extra columns and rows, e.g. for 250, 500, 750, and 1000, so that students can see the difference. – Westcroft_to_Apse Dec 16 '21 at 15:40
  • Are you just after selected percentage points as well? For the purpose of seeing what the numbers look like, a rounded normal approximation would be pretty close (a sketch along those lines follows these comments). I did find some information on Edgeworth and saddlepoint approximations, which should be pretty accurate except in the far tail, but I don't know whether there are readily accessible implementations of them. – Glen_b Dec 16 '21 at 21:26
  • @Glen_b Thanks, I hadn’t thought about that - I was just hoping to give them a table of critical values with those few extra columns and rows. But again, it’s the implementation that’s the issue here! – Westcroft_to_Apse Dec 18 '21 at 08:25
  • Which critical values do you want to give? (i.e. which significance levels?) – Glen_b Dec 18 '21 at 14:00
  • @Glen_b Just the usual ones (.05, .01, .001). – Westcroft_to_Apse Dec 18 '21 at 17:55
  • Is that two tailed only, or both one and two-tailed? – Glen_b Dec 19 '21 at 08:41
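Following up on the normal approximation discussed in the comments above, here is a minimal sketch of how rounded approximate critical values for a few extra rows and columns might be produced. It assumes no ties, uses the usual mean $mn/2$ and variance $mn(m+n+1)/12$, and the continuity correction and rounding convention are illustrative choices rather than the only reasonable ones:

```r
# Approximate lower-tail critical value for U from the normal approximation:
# roughly the largest u with P(U <= u) <= alpha/2 (two-sided), using a 0.5
# continuity correction; floor() keeps the value (approximately) conservative.
# By symmetry, the upper critical value is m*n minus this.
approx_crit <- function(m, n, alpha = 0.05) {
  mu    <- m * n / 2
  sigma <- sqrt(m * n * (m + n + 1) / 12)
  floor(mu + qnorm(alpha / 2) * sigma - 0.5)
}

# A few extra rows for a two-sided table at the usual levels:
sizes  <- c(250, 500, 750, 1000)
alphas <- c(0.05, 0.01, 0.001)
tab <- sapply(alphas, function(a) approx_crit(sizes, sizes, a))
dimnames(tab) <- list(paste0("m = n = ", sizes), paste0("alpha = ", alphas))
tab
```

As a quick sanity check at a size where the exact calculation is still easy, `approx_crit(100, 100)` gives `4197`; its mirror image `10000 - 4197 = 5803` is consistent with the exact figures quoted in the comments above (`pwilcox(5802, 100, 100)` is about 0.9751, so the upper-tail rejection region starts at 5803).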