I have data from two samples (control and treated), each containing several thousand values, that I want to test for significance in R. In theory the values are continuous, but the measurement software rounds them, so the data contain ties. The distributions are unknown, and the control and treated distributions may differ in shape, so I'd like to use a non-parametric test to check, for each of 10 different factors, whether the difference between the two samples is significant.

I thought of using the Kolmogorov-Smirnov test, but it is not really suitable for data with ties. I recently came across a new R package called Matching that provides a bootstrap version of the K-S test and tolerates ties. Is this really a good idea, or should I use another test instead? And do I need to adjust the p-values?

AnjaM
  • The linked paper deals with propensity score matching. It may be that the bootstrap test has more generality but I am not sure. – Michael R. Chernick Sep 03 '12 at 14:17
  • I'd have done a randomization version of something like the Kolmogorov-Smirnov test (well, actually, I'd probably have used either the Anderson-Darling or the Cramér-von Mises statistic instead of the K-S, but still with the randomization distribution to take care of ties). – Glen_b Sep 11 '13 at 23:25
  • Have you seen [Tom Waterhouse's code](https://stat.ethz.ch/pipermail/r-devel/2009-July/054106.html)? – Ray Koopman Sep 12 '13 at 05:11
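
The randomization idea raised in the comments can be sketched in base R. This is only an illustration, not the commenters' code: the function names ks_stat and perm_ks_test, the number of permutations, and the simulated data are all assumptions made for the example. The K-S statistic is recomputed after randomly relabelling the pooled values, so the reference distribution is built from the tied data themselves.

```r
# Sketch of a permutation (randomization) two-sample K-S test in base R.
# ks_stat and perm_ks_test are hypothetical helper names, not from any package.
ks_stat <- function(x, y) {
  # Largest absolute gap between the two empirical CDFs,
  # evaluated at every distinct observed value.
  grid <- sort(unique(c(x, y)))
  max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
}

perm_ks_test <- function(x, y, n_perm = 999) {
  observed <- ks_stat(x, y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_stats <- replicate(n_perm, {
    idx <- sample(length(pooled), n_x)  # random relabelling of group membership
    ks_stat(pooled[idx], pooled[-idx])
  })
  # Small tolerance guards against floating-point noise in the comparison;
  # adding 1 to numerator and denominator keeps the p-value above 0.
  p_value <- (sum(perm_stats >= observed - 1e-12) + 1) / (n_perm + 1)
  list(statistic = observed, p.value = p_value)
}

set.seed(1)
control <- round(rnorm(200, mean = 0), 1)    # rounding creates ties
treated <- round(rnorm(200, mean = 0.5), 1)
res <- perm_ks_test(control, treated)
res
```

The same skeleton should work with another statistic (for example a Cramér-von Mises type distance) by swapping out ks_stat.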

1 Answer

Instead of the KS test, you could simply use a permutation (resampling) test as implemented in the oneway_test function of the coin package. Have a look at the accepted answer to this question.
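
A minimal sketch of such a call, assuming the coin package is installed. The data frame dat, its column names, and the simulated values are made up for illustration; rounding is used to mimic the ties described in the question.

```r
library(coin)

set.seed(42)
# Simulated stand-in data (an assumption for this example, not the real measurements).
dat <- data.frame(
  value = round(c(rnorm(100, mean = 0), rnorm(100, mean = 0.4)), 1),
  group = factor(rep(c("control", "treated"), each = 100))
)

# Two-sample permutation test of location; "approximate" asks coin to
# approximate the permutation distribution by Monte Carlo resampling
# instead of relying on the asymptotic normal approximation.
perm_res <- oneway_test(value ~ group, data = dat, distribution = "approximate")
perm_res

# Extract the p-value from the fitted test object.
pvalue(perm_res)
```

Because the reference distribution is built by reshuffling the observed values, ties in the data do not need special treatment.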

Update: My package afex contains the function compare.2.vectors, which runs a permutation test and several other tests on two vectors. You can get it from CRAN:

install.packages("afex")

For two vectors x and y it (currently) returns something like:

> compare.2.vectors(x,y)
$parametric
   test test.statistic test.value test.df       p
1     t              t     -1.861   18.00 0.07919
2 Welch              t     -1.861   17.78 0.07939

$nonparametric
             test test.statistic test.value test.df       p
1 stats::Wilcoxon              W     25.500      NA 0.06933
2     permutation              Z     -1.751      NA 0.08154
3  coin::Wilcoxon              Z     -1.854      NA 0.06487
4          median              Z      1.744      NA 0.17867

Any comments on this function are very welcome.

Henrik
  • (+1) A description of this and other tests can be found in [this blog](http://normaldeviate.wordpress.com/2012/07/14/modern-two-sample-tests/) –  Sep 03 '12 at 15:25
  • @Henrik Thanks for the suggestion and for pointing to the other question. That's really helpful! – AnjaM Sep 04 '12 at 11:41
  • @AnjaM You are welcome. You might also want to check my update. – Henrik Sep 04 '12 at 11:47