I'm trying to determine the best way to check whether a sample of angles is uniformly distributed. To see what kind of sampling error to expect from genuinely uniform samples, I've built two toy functions using scipy.stats and numpy.random.
import scipy.stats as stat
import numpy as np

def chiTest(n=400, d=10):
    # theta should be uniform on [-pi/2, pi/2]
    theta = np.random.rand(n) * np.pi - np.pi/2
    bins = np.linspace(-np.pi/2, np.pi/2, n//d + 1)
    counts, _ = np.histogram(theta, bins)
    return stat.chisquare(counts)[-1]
Chi-squared gives me p-values that are all over the place:
chiTest()
Out[297]: 0.41045635844052125
Out[298]: 0.78687422600627477
Out[299]: 0.016802707521273268
Out[300]: 0.66269332328844976
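For context on why the p-values scatter like this: when the null hypothesis is true, p-values are themselves approximately uniform on [0, 1], so roughly 5% of runs will land below 0.05 by construction. A quick sketch that mirrors chiTest above (the trial count and seed are arbitrary choices):

    import numpy as np
    import scipy.stats as stat

    rng = np.random.default_rng(42)

    def chi_pvalue(n=400, d=10):
        # same construction as chiTest: uniform angles, n//d bins
        theta = rng.uniform(-np.pi/2, np.pi/2, n)
        counts, _ = np.histogram(theta, np.linspace(-np.pi/2, np.pi/2, n // d + 1))
        return stat.chisquare(counts)[1]

    pvals = np.array([chi_pvalue() for _ in range(2000)])
    # under the null, p-values are ~Uniform(0, 1), so about 5% fall below 0.05
    frac_below_005 = (pvals < 0.05).mean()

The fraction below 0.05 should hover near 0.05, which is exactly the behavior the outputs above show.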
While Kolmogorov-Smirnov
def ksTest(n=400):
    theta = np.random.rand(n) * np.pi - np.pi/2
    uni = np.linspace(-np.pi/2, np.pi/2, n)
    return stat.ks_2samp(theta, uni)[-1]
is a bit better, but still gives false positives (low p-values on genuinely uniform data):
ksTest()
Out[350]: 0.31152830907597184
Out[351]: 0.93696634517876642
Out[352]: 0.56898463914998687
Out[353]: 0.74776878262002455
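One note, offered as a suggestion rather than a fix: since the reference distribution is fully known here, the one-sample scipy.stats.kstest against the exact uniform CDF avoids comparing against a linspace grid, which is a deterministic sequence rather than a random sample, so ks_2samp's null distribution doesn't strictly apply to it. A sketch (seed and trial count are arbitrary):

    import numpy as np
    import scipy.stats as stat

    rng = np.random.default_rng(0)

    def ks_one_sample(n=400):
        theta = rng.uniform(-np.pi/2, np.pi/2, n)
        # scipy's 'uniform' takes (loc, scale): support is [loc, loc + scale]
        return stat.kstest(theta, 'uniform', args=(-np.pi/2, np.pi))[1]

    pvals = np.array([ks_one_sample() for _ in range(500)])
    frac_below_005 = (pvals < 0.05).mean()

As with chi-squared, about 5% of runs should come in below 0.05 even though every sample really is uniform.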
And gets even noisier when my "control" distribution is itself a random sample:
def ksTest2(n=400):
    theta = np.random.rand(n) * np.pi - np.pi/2
    uni = np.random.rand(n) * np.pi - np.pi/2
    return stat.ks_2samp(theta, uni)[-1]
ksTest2()
Out[373]: 0.061328687673192099
Out[374]: 0.68866738727690713
Out[375]: 0.93696634517876642
Out[376]: 0.27118717204504289
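As a sanity check that the two-sample test still has power when there really is structure, one can feed it an obviously clustered sample; the clustering below (normal with spread 0.3, clipped to the interval) is just an illustrative choice:

    import numpy as np
    import scipy.stats as stat

    rng = np.random.default_rng(1)
    n = 400
    uniform_angles = rng.uniform(-np.pi/2, np.pi/2, n)
    # deliberately non-uniform: angles clustered near 0
    clustered = np.clip(rng.normal(0.0, 0.3, n), -np.pi/2, np.pi/2)
    p_clustered = stat.ks_2samp(uniform_angles, clustered)[1]

Here the p-value comes out vanishingly small, so the scatter seen above is a property of testing true nulls, not a failure to detect real deviations.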
The strange thing is that increasing n only seems to help ksTest(), and even large sample sizes (1000000) can still often produce p-values < 0.1. Am I misusing, incorrectly implementing, or just plain misunderstanding these methods? Or is this the best accuracy I can hope for, given the possibility of random clustering?