The general idea for computing an exact p-value over a finite set of possibilities is to run over all of them and count how many times the statistic is greater than or equal to your observed statistic.
Example for a binomial: you want to test whether a coin is biased, using 10 flips. Compute the statistic $h_0=|\text{heads}-5|$ for your dataset. Under the null (unbiased), all sequences of 0s and 1s have the same probability; for each of them compute the statistic and see how often it is greater than or equal to $h_0$. This proportion is the p-value. There are $2^{10}=1024$ sequences to generate.
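A minimal sketch of this enumeration in Python (the observed count of 8 heads is a made-up example; only the 10-flip setup comes from the text):

```python
from itertools import product

# Hypothetical observation: 8 heads out of 10 flips.
observed_heads = 8
h0 = abs(observed_heads - 5)

# Enumerate all 2**10 = 1024 equally likely head/tail sequences under the null.
count = 0
total = 0
for seq in product((0, 1), repeat=10):
    h = abs(sum(seq) - 5)
    if h >= h0:
        count += 1
    total += 1

p_value = count / total
print(p_value)  # exact p-value for |heads - 5| >= h0
```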
For Kruskal-Wallis: call $\omega_0$ your dataset and $h_0$ the value of the KW statistic for your dataset.
Call $\omega$ a permutation of the 37 possible values. For each $\omega$, assign the first 12 values to A, the next 9 to B, and so on: you get one virtual dataset, also called $\omega$. For each $\omega$, compute the value of the KW statistic, say $h(\omega)$, as in the sketch below. Then count how many times $h(\omega)$ is greater than or equal to $h_0$, and count the total number of permutations. Divide, and you get the p-value.
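Computing $h(\omega)$ for a single permutation might look like this (only the group sizes 12 and 9 come from the question; the remaining sizes, and the data, are placeholders chosen so the sizes sum to 37):

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Hypothetical group sizes: 12 and 9 are from the question, the rest
# are placeholders so the sizes sum to 37.
sizes = [12, 9, 6, 5, 5]
values = rng.standard_normal(37)  # stand-in for the 37 observed values

# One permutation omega: shuffle the pooled values, then cut into groups.
omega = rng.permutation(values)
groups = np.split(omega, np.cumsum(sizes)[:-1])

h_omega, _ = kruskal(*groups)  # the KW statistic h(omega) for this permutation
print(h_omega)
```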
How many $\omega$? Well, the number of permutations: $37!\approx 10^{43}$ (Stirling's formula, or direct computation). A typical computer does around $10^9$ atomic operations per second, and each permutation takes at least 1000 such operations, so you can count on roughly $10^6$ permutations per second. That makes about $10^{43}/10^6=10^{37}$ seconds. This is not feasible.
Now you can work directly on combinations instead of permutations: the KW statistic depends only on which group each value is assigned to, not on the order within groups. The number of assignments is $\binom{37}{12}\binom{37-12}{9}\cdots=\frac{37!}{12!\,9!\cdots}\approx10^{22}$. Still not feasible.
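A quick sanity check on these counts, again with the hypothetical group sizes from the sketch above:

```python
import math

sizes = [12, 9, 6, 5, 5]  # hypothetical group sizes summing to 37

n_permutations = math.factorial(37)
n_assignments = n_permutations
for s in sizes:
    n_assignments //= math.factorial(s)  # exact integer division

print(f"{n_permutations:.2e}")  # ~1.38e+43 permutations
print(f"{n_assignments:.2e}")   # ~7.6e+21 distinct group assignments
print(f"{n_permutations / 1e6:.2e} seconds at 1e6 permutations/s")
```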
This only describes the "naive" algorithms. But I think advanced algorithmic ideas for exact values become quite hard to understand as soon as the aim is to make them feasible even for values as small as 37, and they still need intensive computing resources. Further reading: http://faculty.virginia.edu/kruskal-wallis/
However, it's not necessary to have truly "exact" values. The usual method is a simple Monte Carlo method: instead of testing all permutations/combinations, test $N$ of them picked at random. This is very simple to implement. It converges to the real p-value at rate $1/\sqrt N$ in a way that can be controlled easily, so you get the precision you want in a reasonable time. Compared to the $\chi^2$ approximation, which can be poor for small sample sizes (37, 12, 9, ...), it is almost exact, since you don't assume $37\approx+\infty$ but use an $N$ as large as you want. This is what most statistical software does.
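A minimal sketch of this Monte Carlo scheme (the data are randomly generated placeholders, and `mc_kruskal_pvalue` is a made-up helper name, not a library function):

```python
import numpy as np
from scipy.stats import kruskal

def mc_kruskal_pvalue(groups, n_iter=10_000, seed=0):
    """Monte Carlo permutation p-value for the Kruskal-Wallis test.

    groups: list of 1-D arrays, one per treatment group.
    """
    rng = np.random.default_rng(seed)
    cuts = np.cumsum([len(g) for g in groups])[:-1]
    pooled = np.concatenate(groups)

    h0, _ = kruskal(*groups)  # observed statistic on the real grouping

    count = 0
    for _ in range(n_iter):
        perm = rng.permutation(pooled)
        h, _ = kruskal(*np.split(perm, cuts))
        if h >= h0:
            count += 1

    # (count + 1) / (n_iter + 1) is a common convention that avoids
    # reporting a p-value of exactly zero.
    return (count + 1) / (n_iter + 1)

# Hypothetical example: 37 values split into the group sizes used above.
rng = np.random.default_rng(1)
data = [rng.standard_normal(s) for s in [12, 9, 6, 5, 5]]
print(mc_kruskal_pvalue(data))
```

Recent SciPy versions also ship `scipy.stats.permutation_test`, which implements the same resampling idea in a generic way.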