
I am trying to test if the sampled interval between random events fits a particular geometric distribution, and am pretty lost as to what I'm doing wrong.

Assuming there's nothing wrong with the library,

import scipy.stats

n = 100  # sample size

# Take n samples from the geometric distribution
data = scipy.stats.geom.rvs(.2, size=n, random_state=None)

# Perform KS test against the CDF of the distribution
print(scipy.stats.kstest(data, lambda x: scipy.stats.geom.cdf(x, .2)))

results in p-values near 0 (1e-5) with a sample size of 100. Increasing the sample size causes the p-values to decrease further (e.g., 1000 samples results in 1e-35), which is the opposite of what I expect.

Am I making some incorrect statistical assumptions? Is something wrong with my methodology? Is goodness of fit testing not what I'm looking for? Are there other statistical tests that I can do instead?

Ekiden
  • @Dave Do you have any suggestions on useful methods to tell if something is generated from a certain distribution? Your link suggests that using a test against a specific distribution results in the power being too high to be useful. – Ekiden Nov 11 '21 at 15:48
  • The geometric distribution may be especially problematic for such a test. I don't know how the K-S test is implemented in SciPy, but in R, using `ks.test` to match the ECDF of a sample from a discrete distribution against the CDF of that distribution often yields a warning message that the test does not give an accurate P-value in the presence of ties. A sample from a geometric dist'n with small $p$ will typically have _massive amounts_ of ties: my geometric sample of size 1000 with $p=0.1$ had only about 80 distinct values. It gave a P-value very near 0. (Visually, the ECDF plot matched the CDF just fine.) – BruceET Nov 11 '21 at 16:56
  • Unless the chance of tied values in your sample is tiny, the KS test is not applicable. The software should have warned you. If it did not, find better software. – whuber Nov 11 '21 at 17:02
  • In R: `length(unique(rgeom(1000, .1)))` returned $51.$ And `ks.test(rgeom(1000, .1), pgeom, .1)` returns P-val $4.122e-09$ along with `Warning message: ... ties should not be present for the Kolmogorov-Smirnov test` – BruceET Nov 11 '21 at 17:13
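For anyone working in Python, here is a rough sketch of the same tie check as in the R comments, followed by one possible alternative for discrete data: a chi-square goodness-of-fit test on the counts. The chi-square test is not something suggested in the comments above; it is swapped in here as an assumption about what might suit this problem. The values `p = 0.2` and `n = 1000`, the seed, the cutoff `m`, and the "expected count of at least 5" rule of thumb are illustrative choices, not part of the original question.

```python
import numpy as np
from scipy import stats

p = 0.2    # assumed success probability (matches the question's .2)
n = 1000   # assumed sample size
rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable

data = stats.geom.rvs(p, size=n, random_state=rng)

# Tie check: a discrete sample has far fewer distinct values than observations.
n_unique = np.unique(data).size
print(f"{n_unique} distinct values among {n} observations")

# KS test as in the question, shown only for contrast; its p-value is not
# trustworthy here because the data are heavily tied.
ks_result = stats.kstest(data, lambda x: stats.geom.cdf(x, p))
print("KS p-value (unreliable with ties):", ks_result.pvalue)

# Chi-square goodness of fit on the cells 1, 2, ..., m plus a pooled tail "> m".
# m is grown while both the next single cell and the next tail would still have
# an expected count of at least 5 (a common rule of thumb, assumed here).
m = 1
while n * min(stats.geom.pmf(m + 1, p), stats.geom.sf(m + 1, p)) >= 5:
    m += 1

values = np.arange(1, m + 1)
observed = np.array([(data == v).sum() for v in values] + [(data > m).sum()])
expected = n * np.append(stats.geom.pmf(values, p), stats.geom.sf(m, p))

# No parameters were estimated from the data, so the default ddof=0 applies.
chi2_stat, chi2_p = stats.chisquare(observed, expected)
print("chi-square statistic:", chi2_stat, "p-value:", chi2_p)
```

If $p$ were estimated from the sample instead of being fixed in advance, the degrees of freedom would need adjusting (for example via the `ddof` argument of `scipy.stats.chisquare`).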
