
I'm trying to figure out how Kolmogorov-Smirnov one-sample testing for normality is done in Minitab (or Systat, since the answers apparently match).

If this is my data vector:

abc <- c(0.0313, 0.0273, 0.0379, 0.0427, 0.0286, 0.0327, 0.0298, 0.0381, 0.0559, 0.0573,
0.0558, 0.113, 0.0464, 0.0442, 0.0579, 0.0495)

The boneheaded way of doing this in R would be:

ks.test(abc, pnorm, mean(abc), sd(abc))

Yes, I know that the ks.test help page says not to use the data themselves to estimate the mean/sd of the comparison distribution. Hence, boneheaded. Side note: if I understand correctly, SAS uses this as a regular procedure? http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univariate_sect037.htm

Anyway, the p-value R gives for this improper test is 0.3027, while apparently both Minitab and Systat provide a p-value of 0.029.

The project manager won't hear anything about using other means of testing for normality (or, heaven forbid, using plots of the data distribution). At this point I'm just trying to figure out what the other software packages are doing, so that I can explain the differences to myself...

Am I missing something?? If people suggest using simulations instead of the direct test, like here (http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-Test-td3037232.html), would it be possible to include detailed code?

Thank you!

user2602640
  • It looks more like a management issue than a statistical question ;) I spotted an 'outlier' in your sample (0.113), is it possible that Minitab would do something about it without really saying anything? – Vincent Guillemot Jul 21 '14 at 17:30
  • Ha, no kidding. Do they have forums for those too? :-) Just retested with that value gone, and nope, ain't that. It's not the only one either - this example is one of ~ 10 that were run in both programs... –  Jul 21 '14 at 17:34
  • OK, what is the value of the D statistic computed with Minitab? – Vincent Guillemot Jul 21 '14 at 17:40
  • Another question: do R and Minitab give similar outputs after a Shapiro test? – Vincent Guillemot Jul 21 '14 at 17:49
  • I'm not actually the one running Minitab (never used it), so I had to check back with the guy who did. The KS value for this vector was 0.233. If I run `lillie.test` from the `nortest` package, I get p-values that are very similar, but not identical: for this vector the `lillie.test` p-value is 0.021 (and Minitab's is 0.029; a quick check is sketched after these comments). It's a similar small discrepancy for all the other double-run vectors as well. I know that R and Minitab give identical answers for Anderson-Darling, not sure about Shapiro... –  Jul 22 '14 at 10:51
  • 1
    The value of the KS statistic is the same with R (if `abc_s – Vincent Guillemot Jul 22 '14 at 11:24
  • Thanks! Any thoughts about the small discrepancies in p-vals? Also - if you write some of it up as an answer, I'll accept it :) –  Jul 22 '14 at 14:41
  • I found [this document](http://tijsat.tu.ac.th/issues/2011/no3/2011_V16_No3_2.pdf)... It looks like it contains what you need! – Vincent Guillemot Jul 22 '14 at 16:24
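
For reference, a minimal sketch of the check discussed in the comments above (not part of the original thread), assuming `abc` is defined as in the question:

# D statistic from the (improper) one-sample KS test with parameters
# estimated from abc; per the comments this should reproduce Minitab's 0.233.
ks.test(abc, pnorm, mean(abc), sd(abc))$statistic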

1 Answer


Here is some R code to run a simulation: generate data from a normal distribution with the same mean and sd, then do the KS test using the sample (not the generating) statistics:

# Simulate "improper" KS p-values: draw samples from a normal with the
# observed mean/sd, then test each one with parameters re-estimated from it.
out <- replicate(100000, {
    x <- rnorm(length(abc), mean(abc), sd(abc))
    ks.test(x, pnorm, mean(x), sd(x))$p.value
})

hist(out)  # null distribution of the p-values (not uniform)

# Proportion of simulated p-values at or below the observed one:
mean(out <= ks.test(abc, pnorm, mean(abc), sd(abc))$p.value)

My estimated p-value from the simulation is 0.021 (you can get more accuracy/precision by running more simulations), which is closer to the Minitab/Systat values, though not identical. So this suggests that the other programs may be adjusting in some way for the estimated parameter values, but there is still enough of a difference that I expect their adjustment differs from this simulation procedure.
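
As a cross-check (an addition, not from the original answer), the Lilliefors correction mentioned in the comments is implemented in the `nortest` package as `lillie.test()`; a minimal sketch, assuming the package is installed and `abc` is defined as in the question:

library(nortest)

# Lilliefors test: the one-sample KS test for normality with the p-value
# adjusted for the fact that the mean and sd are estimated from the data.
lillie.test(abc)

Per the comment thread this gives a p-value of about 0.021 for abc, in line with the simulation estimate above, while Minitab reports 0.029; the remaining gap presumably comes down to the particular approximation each program uses for the Lilliefors p-value.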

Greg Snow