What is the difference between various Kruskal-Wallis post-hoc tests?

Question

[There seem to be a lot of similar questions here, so please point me in the right direction if this has already been answered, but I think it's reasonably differentiated.]

There are many different implementations of post-hoc analyses following a Kruskal-Wallis test. I'm trying to understand how (why?) they differ, to get a sense of when one might be the right choice over another.

Working in R, consider this simulated dataset:

generate.sim.data<-function(seed){
 set.seed(seed)
 sim1<-rnorm(20,4,3)
 sim2<-rnorm(20,7,3)
 sim3<-rnorm(20,1,3)
 sim4<-rnorm(20,1,3)
 simdata<-c(sim1,sim2,sim3,sim4)
 simgroup<-c(rep(c("sim1","sim2","sim3","sim4"),each=20))
 data.frame(simdata,simgroup)
}

The functions kruskal in the package agricolae, kruskalmc in the package pgirmess, posthoc.kruskal.nemenyi.test in the package PMCMR, and dunn.test in the package dunn.test all give different statistics (for any input). For certain seeds, they also give differing results in the pairwise comparisons:

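  library(agricolae)   # kruskal
  library(pgirmess)    # kruskalmc
  library(PMCMR)       # posthoc.kruskal.nemenyi.test
  library(dunn.test)   # dunn.test
  sim<-generate.sim.data(123)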
  kruskal(sim$simdata,sim$simgroup,console=T)               #a,b,c,c
  kruskalmc(sim$simdata,sim$simgroup)                       #a,a,b,b
  posthoc.kruskal.nemenyi.test(sim$simdata,sim$simgroup)    #a,a,b,b
  dunn.test(sim$simdata,sim$simgroup)                       #a,b,c,c

but agree in some more clearcut cases:

  sim<-generate.sim.data(321)
  kruskal(sim$simdata,sim$simgroup,console=T)               #a,a,b,b
  kruskalmc(sim$simdata,sim$simgroup)                       #a,a,b,b
  posthoc.kruskal.nemenyi.test(sim$simdata,sim$simgroup)    #a,a,b,b
  dunn.test(sim$simdata,sim$simgroup)                       #a,a,b,b

It seems that kruskalmc and posthoc.kruskal.nemenyi.test give similar results no matter what, and that kruskal and dunn.test tend to give similar results, but the latter is not always the case, e.g.:

  sim<-generate.sim.data(4444)
  kruskal(sim$simdata,sim$simgroup,console=T)               #a,b,c,c
  kruskalmc(sim$simdata,sim$simgroup)                       #ac,a,b,bc
  posthoc.kruskal.nemenyi.test(sim$simdata,sim$simgroup)    #ac,a,b,bc
  dunn.test(sim$simdata,sim$simgroup)                       #ac,b,c,c

I realize I'm quibbling about some different behavior of the tests based on p-values very close to 0.05, but these tests also do give different diagnoses for real data sets (e.g., observation~method from the corn data in agricolae; occupation~eligibility from the homecare data in dunn.test). I wondered what the underlying differences are between the tests, and whether there's a reasonable criterion to choose one over another.

Alexis · Accepted Answer · 2021-03-10T05:30:35.153

Understanding how these test implementations differ requires understanding the actual test statistics themselves.

For example, dunn.test provides Dunn's (1964) z test approximation to a rank sum test employing both the same ranks used in the Kruskal-Wallis test, and the pooled variance estimate implied by the null hypothesis of the Kruskal-Wallis (akin to using the pooled variance to calculate t test statistics following an ANOVA).
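
For reference, Dunn's statistic for groups $i$ and $j$ can be written as below. The notation is an assumption of this sketch rather than something taken from the dunn.test documentation: $\bar{R}_i$ is the mean of the pooled ranks in group $i$, $n_i$ is its size, $N$ is the total sample size, and $\tau_s$ is the number of observations sharing the $s$-th tied value.

$$
z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\left[\dfrac{N(N+1)}{12} - \dfrac{\sum_{s}\left(\tau_s^{3} - \tau_s\right)}{12(N-1)}\right]\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}
$$

The bracketed term is the pooled variance of the ranks under the Kruskal-Wallis null hypothesis. Its second piece is the ties correction, which vanishes when no values are tied.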

By contrast, the Kruskal-Nemenyi test as implemented in posthoc.kruskal.nemenyi.test is based on either the Studentized range distribution, or the $\chi^{2}$ distribution depending on user choice.

The kruskalmc function in the pgirmess package implements Dunn's post hoc rank sum comparison using z test statistics as directed by Siegel and Castellan (1988), but these authors do not include Dunn's (1964) correction for ties, so kruskalmc will be less accurate than dunn.test when ties exist in the data.
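
The effect of the ties correction can be seen in a minimal base-R sketch. The function name `rank_z` and its `correct_ties` argument are hypothetical, introduced here only for illustration, and the toy data are made up. This is not the code of kruskalmc or dunn.test; it only computes the rank-based z statistic with and without the tie term in the variance.

```r
# Pairwise z statistics on pooled ranks (hypothetical helper, base R only).
# correct_ties = TRUE  : variance includes Dunn's (1964) ties correction
# correct_ties = FALSE : variance is N(N+1)/12, i.e. no ties correction
rank_z <- function(x, g, correct_ties = TRUE) {
  g <- factor(g)
  r <- rank(x)                         # pooled mid-ranks, as in Kruskal-Wallis
  N <- length(x)
  n <- lengths(split(r, g))            # group sizes
  mean_rank <- sapply(split(r, g), mean)
  sigma2 <- N * (N + 1) / 12
  if (correct_ties) {
    tau <- as.numeric(table(x))        # size of each set of tied values
    sigma2 <- sigma2 - sum(tau^3 - tau) / (12 * (N - 1))
  }
  pairs <- combn(levels(g), 2)         # one column per pairwise comparison
  z <- apply(pairs, 2, function(p) {
    (mean_rank[[p[1]]] - mean_rank[[p[2]]]) /
      sqrt(sigma2 * (1 / n[[p[1]]] + 1 / n[[p[2]]]))
  })
  names(z) <- apply(pairs, 2, paste, collapse = " - ")
  z
}

# Toy data with heavy ties (made up for illustration)
x <- c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5)
g <- rep(c("a", "b", "c"), each = 4)
z_corrected   <- rank_z(x, g, correct_ties = TRUE)
z_uncorrected <- rank_z(x, g, correct_ties = FALSE)
print(rbind(z_corrected, z_uncorrected))
```

When ties are present, the corrected variance is smaller, so each corrected |z| is at least as large as its uncorrected counterpart. Without ties the two versions coincide.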

It is difficult to discern from the documentation of kruskal whether the author is using the Conover-Iman t approximation to the distribution of rank sum differences (similar to Dunn's test, but it requires that the Kruskal-Wallis test be rejected, and it is more powerful). A brief glance at the code does not immediately scream Conover-Iman to me; however, it is quite possible that it is an implementation of that test. The Conover-Iman test is more certainly implemented in R by the conover.test package.
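
For comparison, the Conover-Iman statistic is usually presented along the following lines (e.g. in Conover, 1999). The notation is an assumption of this sketch: $\bar{R}_i$ and $n_i$ are the mean pooled rank and size of group $i$, $R_l$ are the pooled ranks, $N$ is the total sample size, $k$ is the number of groups, and $H$ is the ties-adjusted Kruskal-Wallis statistic.

$$
t_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{S^{2}\,\dfrac{N-1-H}{N-k}\left(\dfrac{1}{n_i}+\dfrac{1}{n_j}\right)}},
\qquad
S^{2} = \frac{1}{N-1}\left[\sum_{l=1}^{N} R_l^{2} - \frac{N(N+1)^{2}}{4}\right]
$$

The statistic is referred to a t distribution with $N-k$ degrees of freedom. Without ties, $S^{2}$ reduces to $N(N+1)/12$, the same rank variance that appears in Dunn's z. The factor $(N-1-H)/(N-k)$ is below 1 whenever $H > k-1$, which is one informal way to see where the extra power comes from once the Kruskal-Wallis test has been rejected.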

The tl;dr: these all appear to be implementations of different test statistics or different forms of the same test statistic, so there is no reason to expect them to agree.

References

Conover, W. J. (1999). Practical Nonparametric Statistics.

Conover, W. J. and Iman, R. L. (1979). On multiple-comparisons procedures. Technical Report LA-7677-MS, Los Alamos Scientific Laboratory.

Dunn, O. J. (1964). Multiple comparisons using rank sums. Technometrics, 6(3):241–252.

Siegel, S. and Castellan, N. J. (1988). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York. pp. 213-214.

@SalMangiafico I am the author of dunn.test, and assure you that if you read the documentation, you will find that dunn.test performs two-sided tests (you are likely being bewildered by the mathematical equivalence of the rejection decisions $p \leq \alpha/2$ vs. $2p \leq \alpha$). The documentation for dunn.test is explicit in its definition of rejection probabilities. Your opinion about output is, naturally, not being debated. Personally, *post hoc* to rejecting a Kruskal-Wallis, I would recommend conover.test, as the Conover-Iman test is strictly more powerful than Dunn's test. – Alexis, Oct 25 '17 at 01:03
@SalMangiafico His reports $p = P(|Z| \ge |z|)$. Mine reports $p = P(Z \ge |z|)$. For the multiple comparisons adjustment procedures reporting significance, mine reject at (my) $p \le \alpha/2$, whereas his reject at (his) $p \le \alpha$. It's really pretty straightforward. (You may gather that his $p$ is twice my $p$.) Derek and I have already been in touch; he based his dunnTest code on mine. – Alexis, Oct 25 '17 at 01:32
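
The equivalence of the two reporting conventions described in these comments can be checked with a few lines of base R. The z values and alpha below are arbitrary illustrative numbers, not output from either package.

```r
# Two ways of reporting a p-value for the same z statistic
z     <- c(0.50, 1.70, 1.96, 2.50)               # arbitrary example z values
alpha <- 0.05

p_one <- pnorm(abs(z), lower.tail = FALSE)       # P(Z >= |z|)
p_two <- 2 * p_one                               # P(|Z| >= |z|)

reject_one <- p_one <= alpha / 2                 # decision rule: p <= alpha/2
reject_two <- p_two <= alpha                     # decision rule: p <= alpha

print(data.frame(z, p_one, p_two, reject_one, reject_two))
```

The two decision columns agree for every z, since $p \le \alpha/2$ and $2p \le \alpha$ are the same inequality; only the reported number differs by a factor of two.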

score 2 · Answer 2 · edited Oct 25 '16 at 17:35

I know that this thread is older, but I came across it because I was looking for answers about the post-hoc test applied to the Kruskal-Wallis test found in the agricolae package. I really needed to know for documentation purposes so I personally emailed the maintainer to ask what procedure is used for the post-hoc test. He told me that it does use a procedure from Conover. Here is his specific reply in reference to the Kruskal-Wallis post-hoc test:

The Kruskal test is nonparametric, but it is feasible to apply a function as the least significant difference on mean ranks, which can make an adjustment on probability, the procedure is a criterion with the critical range by Conover.

I hope this helps to answer the question regarding the Kruskal-Wallis test in agricolae.

Danika Setaro · answered Oct 25 '16 at 17:24 · edited by gung - Reinstate Monica, Oct 25 '16 at 17:35

Conover's test is very powerful, but note that it can have inflated type I errors, so you want to mitigate against that. I have found that requiring a significant KW, then only comparing levels chosen a-priori w/ Bonferroni corrections does not yield type I error inflation & remains more powerful than other tests. – gung - Reinstate Monica Oct 25 '16 at 17:36
@gung Yes. Conover and Iman were explicit about the test being strictly applicable *post hoc* to rejecting a KW. – Alexis Oct 25 '17 at 01:15
