I'm trying to understand the Kolmogorov-Smirnov test using a very simple example. I generate a set of random, uniform values between 0 and 1.0. I then test whether these values come from a uniform distribution using the scipy kstest function. I'm expecting a very small D value and a p-value close to 1.0, but instead I get wildly varying p-values every time I run the code. What am I missing?

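import numpy as np
from scipy import stats  # a bare "import scipy" does not always load scipy.stats

# Draw 4999 samples from U(0, 1)
a = np.random.uniform(size=4999)
# One-sample KS test against the standard uniform distribution
print(stats.kstest(a, 'uniform'))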

Here are the outputs of a few consecutive runs:

(0.0075523161200627964, 0.93798952050647577)
(0.013787195268362473, 0.29799260741344774)
(0.014359046616557847, 0.25402403230845855)
(0.012521820948675988, 0.41329007558099806)
(0.011159003477582918, 0.56216895575676396)
user17426
  • 1
    You say widely varying p-values but are they all non-significant? – Dan Aug 27 '14 at 16:59
  • 2
    You are getting what you should. Look at this [CrossValidated thread](http://stats.stackexchange.com/questions/10613/why-are-p-values-uniformly-distributed-under-the-null-hypothesis) – Aniko Aug 27 '14 at 18:27

1 Answer

For the KS test, the p-value is itself uniformly distributed on $[0,1]$ when $H_0$ is true (which it is here, since you are testing whether your sample comes from $U(0,1)$ and the random number generator works properly). It therefore must "vary wildly" between 0 and 1; in fact, its standard deviation is $1/\sqrt{12}$, which is roughly 0.29.
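
As a short worked check of these two facts (a sketch, assuming the test statistic has a continuous distribution under $H_0$; the symbol $P$ for the p-value viewed as a random variable is introduced here only for illustration), uniformity means

$$\Pr(P \le p_0 \mid H_0) = p_0 \quad \text{for all } p_0 \in [0,1],$$

and the standard deviation follows from the moments of $U(0,1)$:

$$E[P] = \int_0^1 p\,dp = \tfrac{1}{2}, \qquad E[P^2] = \int_0^1 p^2\,dp = \tfrac{1}{3}, \qquad \operatorname{sd}(P) = \sqrt{\tfrac{1}{3} - \tfrac{1}{4}} = \tfrac{1}{\sqrt{12}} \approx 0.289.$$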

You can check this by looking at whether the percentage of p-values smaller than or equal to some $p_0$, over your independent consecutive runs, is close to that $p_0$.
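
The check described above can be sketched as a small simulation. This is only an illustration, not a definitive recipe: the function name `simulate_ks_pvalues`, the number of runs, the sample size, and the seed are all assumptions chosen here, not part of the original question.

```python
import numpy as np
from scipy import stats


def simulate_ks_pvalues(n_runs=2000, sample_size=200, seed=0):
    """Run the KS test on many fresh U(0, 1) samples and collect the p-values.

    Hypothetical helper for illustration; all parameter values are arbitrary.
    """
    rng = np.random.default_rng(seed)
    pvalues = np.empty(n_runs)
    for i in range(n_runs):
        sample = rng.uniform(size=sample_size)
        # kstest returns (statistic, p-value); keep only the p-value
        _, pvalues[i] = stats.kstest(sample, 'uniform')
    return pvalues


pvalues = simulate_ks_pvalues()

# Under H0, the fraction of p-values <= p0 should be close to p0 itself.
for p0 in (0.05, 0.25, 0.5, 0.9):
    frac = np.mean(pvalues <= p0)
    print(f"p0 = {p0:.2f}: fraction of p-values <= p0 is {frac:.3f}")

# The spread should be close to 1/sqrt(12).
print("std of p-values:", pvalues.std(), "vs 1/sqrt(12) =", 1 / np.sqrt(12))
```

If the random number generator and the test behave as expected, each printed fraction should land near its $p_0$, and a histogram of `pvalues` should look roughly flat.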

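See also [Why are p-values uniformly distributed under the null hypothesis?](http://stats.stackexchange.com/questions/10613/why-are-p-values-uniformly-distributed-under-the-null-hypothesis)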

Momo
  • Is there any advantage of choosing small $p_0$, like 1e-3 vs 0.5? – Ahmed Fasih Apr 24 '17 at 03:07
  • 1
    @ahmed No, in what I wrote, there is no significance to the value of $p_0$ at all. The percentages of p values smaller or equal to some $p_0$ should be close to $p_0$ regardless of the concrete $p_0$. That said, for very small $p_0$ one can encounter numeric problems when calculating the p-values. – Momo May 11 '17 at 15:33