9

I have a simulation where an animal is placed in a hostile environment and timed to see how long it can survive using some approach to survival. There are three approaches it can use to survive. I ran 300 simulations of the animal using each survival approach. All simulations take place in the same environment but there's some randomness so it's different each time. I time how many seconds the animal survives in each simulation. Living longer is better. My data looks like this:

Approach 1, Approach 2, Approach 3
45,79,38
48,32,24
85,108,44
... 300 rows of these

I'm unsure of everything I do after this point so let me know if I'm doing something stupid and wrong. I'm trying to find out if there's a statistically significant difference in lifespan between the approaches.

I ran a Shapiro test on each of the samples and they came back with tiny p-values, so I believe the data isn't normally distributed.

Data on rows have no relationship to each other. The random seed used for each simulation was different. As a result, I believe the data isn't paired.

Because the data is not normally distributed, not paired, and there were more than two samples, I ran a Kruskal-Wallis test, which came back with a p-value of 0.048. I then moved on to a post hoc test, selecting Mann-Whitney. I'm really not sure if Mann-Whitney should be used here.

I compared each survival approach with each other approach by performing the Mann-Whitney test, i.e. {(approach 1, approach 2), (approach 1, approach 3), (approach 2, approach 3)}. There was no statistically significant difference between the pair (approach 2, approach 3) using a two-tailed test, but a significant difference was found using a one-tailed test.

Problems:

  1. I don't know if using Mann Whitney like this makes sense.
  2. I don't know if I should be using a one or two tailed Mann Whitney.
Phlox Midas
  • Do you have any a priori hypothesis about the relative strength of different approaches (e.g. approach1>approach2>approach3)? This is crucial to answer your questions. – amoeba Aug 14 '14 at 16:48
  • I have the mean, median and standard deviation and it appears that approach 3 is better because it has a higher median and mean but it also has a much higher standard deviation so I'm not sure. But I had no way of knowing this before hand. – Phlox Midas Aug 14 '14 at 17:07
  • Or is it also known as the Bonferroni correction? – Phlox Midas Aug 15 '14 at 13:23
  • Phlox: if there was "no way of knowing this before hand", you should absolutely **not** use a one-tailed test, only two-tailed (as @Alexis mentioned in his reply as well). – amoeba Aug 15 '14 at 15:32
  • @amoeba "her" ;) – Alexis Sep 23 '14 at 17:38

4 Answers

16

No, you should not use the Mann-Whitney $U$ test in this circumstance.

Here's why: Dunn's test is an appropriate post hoc test* following rejection of a Kruskal-Wallis test. If one proceeds by moving from a rejection of Kruskal-Wallis to performing ordinary pair-wise rank sum (i.e. Wilcoxon or Mann-Whitney) tests, then two problems arise: (1) the ranks used for the pair-wise rank sum tests are not the ranks used by the Kruskal-Wallis test; and (2) the rank sum tests do not use the pooled variance implied by the Kruskal-Wallis null hypothesis. Dunn's test does not have these problems.
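To make the two points concrete, here is a minimal pure-Python sketch of the $z$ statistic usually given for Dunn's test. It is an illustration under stated assumptions, not the implementation in `dunntest` or `dunn.test`: the function names `pooled_ranks` and `dunn_test` are hypothetical, the p-values are unadjusted, and the example data are only the three rows shown in the question (far too few for real inference). Note that every observation is ranked once across all groups, and the same pooled, tie-corrected variance is used for every pair.

```python
import math
from itertools import combinations


def pooled_ranks(groups):
    """Rank all observations together; tied values share their average rank.

    Returns the ranks split back by group, plus the tie term sum(t^3 - t).
    """
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    ranks = [[] for _ in groups]
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i                     # size of this block of ties
        tie_term += t ** 3 - t
        for k in range(i, j):
            ranks[pooled[k][1]].append(avg_rank)
        i = j
    return ranks, tie_term


def dunn_test(groups):
    """Unadjusted two-sided pairwise z tests on the pooled ranks (sketch)."""
    ranks, tie_term = pooled_ranks(groups)
    n_total = sum(len(g) for g in groups)
    # Pooled variance implied by the Kruskal-Wallis null, corrected for ties.
    pooled_var = n_total * (n_total + 1) / 12.0 - tie_term / (12.0 * (n_total - 1))
    mean_rank = [sum(r) / len(r) for r in ranks]
    results = {}
    for a, b in combinations(range(len(groups)), 2):
        se = math.sqrt(pooled_var * (1.0 / len(groups[a]) + 1.0 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        results[(a, b)] = (z, p)
    return results


# Only the three rows quoted in the question; purely illustrative.
approaches = [[45, 48, 85], [79, 32, 108], [38, 24, 44]]
for (a, b), (z, p) in dunn_test(approaches).items():
    print(f"approach {a + 1} vs approach {b + 1}: z = {z:.3f}, unadjusted p = {p:.3f}")
```

A pairwise Mann-Whitney test would instead re-rank only the two groups being compared, which is exactly problem (1) above.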

Post hoc tests that follow a rejected Kruskal-Wallis test and have been adjusted for multiple comparisons may fail to reject any of the pairwise comparisons at a given family-wise error rate or false discovery rate, even though the omnibus test rejected at the corresponding $\alpha$. The same is true of any other multiple comparison omnibus/post hoc testing scenario.
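As a rough illustration of how an adjustment can leave no pairwise rejection, the sketch below applies two family-wise error rate adjustments (Bonferroni and Holm) to three raw p-values. The p-values are hypothetical, not taken from the question's data, and the function names are my own.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the three pairwise comparisons.
raw = [0.020, 0.030, 0.200]
alpha = 0.05
for name, adjusted in (("Bonferroni", bonferroni(raw)), ("Holm", holm(raw))):
    rejected = [p < alpha for p in adjusted]
    print(name, [round(p, 3) for p in adjusted], rejected)
```

With these made-up inputs, two comparisons are below 0.05 before adjustment and none are after it, under either method.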

Unless you have an a priori reason to believe that one group's survival time is longer or shorter than another's, you should be using the two-sided tests.
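The pattern reported in the question (significant one-tailed, not significant two-tailed) is what one would expect whenever the test statistic is moderately large: for a normal approximation, the two-sided p-value is twice the one-sided p-value in the observed direction. A small sketch, using a hypothetical $z = 1.8$ rather than any value from the question:

```python
import math


def normal_p_values(z):
    """Return (upper-tail one-sided p, two-sided p) for a standard normal z."""
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))     # P(Z >= z)
    two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|)
    return upper, two_sided


z = 1.8  # hypothetical test statistic, chosen only for illustration
one_sided, two_sided = normal_p_values(z)
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

Choosing the one-sided test after seeing which direction the data point in effectively halves the p-value without justification, which is why the direction has to be specified beforehand.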

Dunn's test can be performed in Stata using dunntest (type net describe dunntest, from(https://www.alexisdinno.com/stata)), and in R using the dunn.test package.

Also, I wonder if you might take a survival analysis approach to assessing whether and when an animal dies based on different conditions?


* A few less well-known post hoc pair-wise tests to follow a rejected Kruskal-Wallis include Conover-Iman (like Dunn, but based on the t distribution rather than the z distribution; implemented for Stata in the conovertest package, and for R in the conover.test package) and the Dwass-Steel-Critchlow-Fligner tests.

Alexis
  • Thanks for your answer. Is the Dunn test also known as the Nemenyi-Damico-Wolfe-Dunn test or is that a separate test? – Phlox Midas Aug 15 '14 at 12:26
  • I ask because I can't find any implementation of the Dunn test. – Phlox Midas Aug 15 '14 at 13:02
  • @PhloxMidas I don't know about the "Nemenyi-Damico-Wolfe-Dunn test," but Wikipedia implies it is an appropriate *post hoc* test following rejection of an omnibus test in a repeated measures design—e.g. following a Friedman test. Also, see my comment about Stata. – Alexis Aug 15 '14 at 18:04
7

A unifying generalization of Kruskal-Wallis/Wilcoxon is the proportional odds model, which admits general contrasts with either pointwise or simultaneous confidence intervals for odds ratios. This is implemented in my R rms package's orm and contrast.rms functions.

Glen_b
Frank Harrell
1

You can also use the critical difference after Conover or the critical difference after Schaich and Hamerle. The former is more liberal whereas the latter is exact but lacks a bit of power. Both methods are illustrated on my website brightstat.com and brightstat's webapp also lets you calculate these critical differences and perform the post-hoc tests right away. Kruskal-Wallis on brightstat.com

-1

If you are using SPSS, do the post hoc Mann-Whitney tests with a Bonferroni correction (the significance level divided by the number of pairwise comparisons).

  • The Mann-Whitney suffers from the two problems I identify in my answer, and is an inappropriate *post hoc* test for Kruskal-Wallis. – Alexis Sep 28 '15 at 17:58