
I am relatively new to non-parametric tests. I wrote the following R code to compare two two-sample tests: the Kolmogorov–Smirnov (KS) test and the Wilcoxon rank-sum test. I draw two samples of various sizes from the unit normal and check the p-values under both tests. I see a rather large variation in p-values and no convergence as the sample size increases.

The R code I used is as follows:

    # Compare KS and Wilcoxon rank-sum p-values on pairs of N(0,1) samples
    # of increasing size (10, 20, ..., 1000)
    results <- data.frame(Picks = numeric(), NoTie1 = numeric(), NoTie2 = numeric(),
                          intersect = numeric(), ks.stat = numeric(), ks.pval = numeric(),
                          wx.stat = numeric(), wx.pval = numeric(), pval.diff = numeric())
    for (i in 1:100) {
      aa <- round(rnorm(i * 10), 4)                 # two samples from the unit normal,
      bb <- round(rnorm(i * 10), 4)                 # rounded to 4 decimal places
      cc <- intersect(aa, bb)                       # values common to both samples
      aa <- setdiff(aa, cc)                         # drop cross-sample ties, since
      bb <- setdiff(bb, cc)                         # ks.test/wilcox.test expect no ties
      intersect(aa, bb)                             # check: should now be empty
      kst <- ks.test(aa, bb, alternative = "t")     # "t" partially matches "two.sided"
      wxt <- wilcox.test(aa, bb, alternative = "t")
      results[i, ] <- list(i * 10, length(aa), length(bb), length(cc),
                           kst$statistic, kst$p.value, wxt$statistic, wxt$p.value,
                           round(abs(kst$p.value - wxt$p.value), 2))
    }

In particular, although both samples are drawn from the same unit normal, the p-values of the two tests don't seem to converge towards each other as the sample size increases.

  • Welcome to Cross Validated! I've formatted the code for readability, but it still needs some commenting to explain what you're doing (not everyone speaks R) & why - in particular the rounding of the simulated observations followed by removal of ties across the two samples doesn't have any obvious motivation. It would also help to show the results & clearly explain how they differ from what you expected, as well as *why* you expected what you expected. – Scortchi - Reinstate Monica May 31 '15 at 11:12
  • I rounded for readability. I did run the code without rounding and saw no difference. Removed ties since ks.test and wilcox.test expect no ties. I expected the difference in the p-values of the two tests to be larger with small sample sizes and to converge to the same value with larger sample sizes, but didn't see that. – jay May 31 '15 at 23:56
  • 1
    You introduced ties by rounding - if you want to display results to a different precision use the `digits` argument to the `print` function. – Scortchi - Reinstate Monica Jun 01 '15 at 08:43

1 Answer


Simulating under the null hypothesis I'd expect the following:

  1. For each test, a more or less uniform distribution of p-values (see Why are p-values uniformly distributed under the null hypothesis?). More or less because both the Mann–Whitney–Wilcoxon & Kolmogorov–Smirnov test statistics depend on ranks, so can take only a finite number of values for a given sample size (see the sketch after this list). As the sample size increases the distributions should look more uniform.

  2. Only some positive correlation between the p-values from the two tests. The test statistics are not equivalent (they partition the sample space differently), & there's no reason to suppose their p-values will converge to each other as the sample size increases.
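To see the discreteness in point 1 concretely, here's a minimal sketch that enumerates the attainable p-values of the rank-sum test. It assumes two samples of size 5, the exact two-sided test, & takes the p-value as twice the smaller tail probability of the Mann–Whitney distribution (close to what `wilcox.test` reports for tie-free data):

    m <- 5; n <- 5
    w <- 0:(m * n)                                # every possible Mann-Whitney statistic
    # two-sided exact p-value: twice the smaller tail probability, capped at 1
    p <- pmin(1, 2 * pmin(pwilcox(w, m, n), pwilcox(w - 1, m, n, lower.tail = FALSE)))
    sort(unique(round(p, 4)))                     # only a handful of distinct values

With so few attainable values, a histogram of p-values from repeated small-sample simulations can look quite lumpy even though the test is valid.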

Your code doesn't seem well suited to investigating the relationship: generate many simulated samples for each sample size & look at the marginal distribution of p-values for each test as well as their joint distribution.
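A minimal sketch of that design (the replicate count, sample sizes, & seed are just illustrative choices; the samples are left unrounded so there are no ties to deal with):

    set.seed(1)                                   # illustrative seed for reproducibility
    sim <- function(n, reps = 1000) {
      t(replicate(reps, {
        x <- rnorm(n); y <- rnorm(n)              # both samples from the unit normal
        c(ks = ks.test(x, y)$p.value,
          wx = wilcox.test(x, y)$p.value)
      }))
    }
    for (n in c(20, 200)) {
      p <- sim(n)
      hist(p[, "ks"], main = paste("KS p-values, n =", n))    # marginal: roughly uniform
      hist(p[, "wx"], main = paste("MWW p-values, n =", n))   # marginal: roughly uniform
      plot(p[, "ks"], p[, "wx"],                              # joint: correlated,
           xlab = "KS p-value", ylab = "MWW p-value")         # but far from equal
      print(cor(p[, "ks"], p[, "wx"]))
    }

The marginal histograms should look roughly uniform at either sample size, & the scatterplot should show a positive but far from perfect association that doesn't tighten onto the diagonal as the sample size grows.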

Scortchi - Reinstate Monica