[There seem to be a lot of similar questions here, so please point me in the right direction if this has already been answered, but I think it's reasonably differentiated.]
There are many different implementations of post-hoc analyses following a Kruskal-Wallis test. I'm trying to understand how (why?) they differ, to get a sense of when one might be the right choice over another.
Working in R, consider this simulated dataset:
generate.sim.data <- function(seed) {
  set.seed(seed)
  sim1 <- rnorm(20, 4, 3)
  sim2 <- rnorm(20, 7, 3)
  sim3 <- rnorm(20, 1, 3)
  sim4 <- rnorm(20, 1, 3)
  simdata <- c(sim1, sim2, sim3, sim4)
  simgroup <- rep(c("sim1", "sim2", "sim3", "sim4"), each = 20)
  data.frame(simdata, simgroup)
}
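As a quick sanity check on the generator, here is a minimal base-R sketch that confirms the design is balanced and runs the omnibus Kruskal-Wallis test that the post-hoc procedures follow. The use of `kruskal.test` from the stats package and the names `group.sizes` and `omnibus` are my additions for illustration; the generator is repeated only so the snippet runs on its own.

```r
# Same generator as above, repeated so this snippet is self-contained.
generate.sim.data <- function(seed) {
  set.seed(seed)
  sim1 <- rnorm(20, 4, 3)
  sim2 <- rnorm(20, 7, 3)
  sim3 <- rnorm(20, 1, 3)
  sim4 <- rnorm(20, 1, 3)
  simdata <- c(sim1, sim2, sim3, sim4)
  simgroup <- rep(c("sim1", "sim2", "sim3", "sim4"), each = 20)
  data.frame(simdata, simgroup)
}

sim <- generate.sim.data(123)

# Balanced design: 20 observations in each of the four groups.
group.sizes <- table(sim$simgroup)
print(group.sizes)

# Omnibus Kruskal-Wallis test (stats package, no extra packages needed).
omnibus <- kruskal.test(sim$simdata, factor(sim$simgroup))
print(omnibus)
```

With four groups the omnibus statistic has 3 degrees of freedom; the pairwise procedures discussed here are meant to be run after this test.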
The functions kruskal in the package agricolae, kruskalmc in the package pgirmess, posthoc.kruskal.nemenyi.test in the package PMCMR, and dunn.test in the package dunn.test all give different statistics (for any input). For certain values, they also give varying results in pairwise comparisons:
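library(agricolae)  # kruskal
library(pgirmess)   # kruskalmc
library(PMCMR)      # posthoc.kruskal.nemenyi.test
library(dunn.test)  # dunn.test
sim <- generate.sim.data(123)
kruskal(sim$simdata, sim$simgroup, console = TRUE)        # a,b,c,c
kruskalmc(sim$simdata, sim$simgroup)                      # a,a,b,b
posthoc.kruskal.nemenyi.test(sim$simdata, sim$simgroup)   # a,a,b,b
dunn.test(sim$simdata, sim$simgroup)                      # a,b,c,c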
but agree in some more clear-cut cases:
sim <- generate.sim.data(321)
kruskal(sim$simdata, sim$simgroup, console = TRUE)        # a,a,b,b
kruskalmc(sim$simdata, sim$simgroup)                      # a,a,b,b
posthoc.kruskal.nemenyi.test(sim$simdata, sim$simgroup)   # a,a,b,b
dunn.test(sim$simdata, sim$simgroup)                      # a,a,b,b
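As far as I understand, all four procedures work from the ranks of the pooled observations and compare group mean ranks, so it can help to look at those quantities directly. The following is a minimal base-R sketch of that common starting point only; it does not reproduce the test statistic of any of the four functions, and the names `pooled.rank`, `mean.rank`, and `rank.diff` are mine.

```r
# Same generator as above, repeated so this snippet is self-contained.
generate.sim.data <- function(seed) {
  set.seed(seed)
  sim1 <- rnorm(20, 4, 3)
  sim2 <- rnorm(20, 7, 3)
  sim3 <- rnorm(20, 1, 3)
  sim4 <- rnorm(20, 1, 3)
  simdata <- c(sim1, sim2, sim3, sim4)
  simgroup <- rep(c("sim1", "sim2", "sim3", "sim4"), each = 20)
  data.frame(simdata, simgroup)
}

sim <- generate.sim.data(123)

# Rank all 80 observations together, ignoring group membership.
pooled.rank <- rank(sim$simdata)

# Mean rank within each group.
mean.rank <- tapply(pooled.rank, sim$simgroup, mean)
print(round(mean.rank, 2))

# Matrix of pairwise mean-rank differences (row group minus column group).
rank.diff <- outer(mean.rank, mean.rank, "-")
print(round(rank.diff, 2))
```

If the procedures really do share these mean-rank differences, then the disagreements would have to come from how each one turns a difference into a decision (reference distribution, variance estimate, and any multiplicity adjustment), which is the part I am asking about.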
It seems that kruskalmc and posthoc.kruskal.nemenyi.test give similar results no matter what, and kruskal and dunn.test tend to give similar results, but the latter is not always the case, e.g.:
sim <- generate.sim.data(4444)
kruskal(sim$simdata, sim$simgroup, console = TRUE)        # a,b,c,c
kruskalmc(sim$simdata, sim$simgroup)                      # ac,a,b,bc
posthoc.kruskal.nemenyi.test(sim$simdata, sim$simgroup)   # ac,a,b,bc
dunn.test(sim$simdata, sim$simgroup)                      # ac,b,c,c
I realize I'm quibbling about differences in behavior driven by p-values very close to 0.05, but these tests also give different diagnoses for real data sets (e.g., observation~method from the corn data in agricolae; occupation~eligibility from the homecare data in dunn.test). I would like to know what the underlying differences between the tests are, and whether there is a reasonable criterion for choosing one over another.
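To illustrate why p-values close to 0.05 are so sensitive here, the sketch below applies two standard multiplicity adjustments from base R's `p.adjust` to six made-up pairwise p-values (four groups give six pairs). The values in `p.raw` are hypothetical and are not output from any of the four functions; whether and how each function adjusts by default is something I would have to check in its documentation.

```r
# Hypothetical p-values for the 6 pairwise comparisons among 4 groups.
# These are invented for illustration, not produced by any package above.
p.raw <- c(0.004, 0.030, 0.049, 0.200, 0.600, 0.800)
alpha <- 0.05

# Standard adjustments from the stats package.
p.bonf <- p.adjust(p.raw, method = "bonferroni")
p.holm <- p.adjust(p.raw, method = "holm")

# Side-by-side view of which comparisons stay below alpha.
decisions <- data.frame(
  p.raw,
  p.bonf,
  p.holm,
  sig.raw  = p.raw  < alpha,
  sig.bonf = p.bonf < alpha,
  sig.holm = p.holm < alpha
)
print(decisions)
```

In this made-up example, three comparisons fall below 0.05 before adjustment and only one does afterwards, so two procedures that differ only in their adjustment could already assign different groupings.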