Is Kolmogorov-Smirnov test valid with discrete distributions?

Question

I have a sample and want to check whether it follows some discrete distribution. However, I'm not entirely sure that the Kolmogorov-Smirnov test applies; Wikipedia seems to imply that it does not. If it does not, how can I test the sample's distribution?

+1 A beautiful example of mistakenly applying the K-S Test to data with (many) ties is given on the help page for an Excel statistics add-on at http://www.real-statistics.com/non-parametric-tests/goodness-of-fit-tests/two-sample-kolmogorov-smirnov-test/. The result is wrong for many reasons. *Caveat lector!* — whuber, Aug 28 '18 at 14:23
KS-tests for discrete null distributions are available: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Discrete_null_distribution — Astrid, Dec 31 '18 at 00:18
A more thorough answer can be found in a closely related question: https://stats.stackexchange.com/questions/88764/test-for-difference-between-2-empirical-discrete-distributions — deps_stats, Sep 08 '20 at 18:43

score 17 · Accepted Answer · answered Jul 30 '10 at 17:10

It does not apply to discrete distributions. See http://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm for example.

Is there any reason you can't use a chi-square goodness-of-fit test? See http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm for more info.
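As a rough illustration of the chi-square suggestion, here is a minimal Python sketch using only the standard library. The die-roll counts and the function names are hypothetical. The p-value is obtained by simulating from the null rather than from the chi-square table, which is a choice made for this sketch; with SciPy installed, `scipy.stats.chisquare` gives the asymptotic version.

```python
import random

def chi_square_stat(observed, expected):
    """Pearson's statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_gof(sample, support, probs, n_sim=2000, seed=0):
    """Return (statistic, Monte Carlo p-value) for H0: sample ~ probs on support.

    Assumes every value in `sample` belongs to `support`.
    """
    rng = random.Random(seed)
    n = len(sample)
    expected = [n * p for p in probs]
    observed = [sample.count(v) for v in support]
    stat = chi_square_stat(observed, expected)
    exceed = 0
    for _ in range(n_sim):
        # Draw a sample of the same size from the hypothesised distribution.
        sim = rng.choices(support, weights=probs, k=n)
        sim_obs = [sim.count(v) for v in support]
        if chi_square_stat(sim_obs, expected) >= stat:
            exceed += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return stat, (exceed + 1) / (n_sim + 1)

# Hypothetical data: 60 rolls of a die, tested against a fair die.
rolls = [1] * 8 + [2] * 12 + [3] * 9 + [4] * 11 + [5] * 10 + [6] * 10
stat, p_value = chi_square_gof(rolls, [1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(f"statistic = {stat:.3f}, Monte Carlo p-value = {p_value:.3f}")
```

A large p-value here only means the data are compatible with the hypothesised probabilities; the usual caveat about small expected counts applies to the table-based version, not to the simulated one.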

answered Jul 30 '10 at 17:10 by PeterR

13

Sorry for the intrusion, but I don't really understand why K-S and other such tests are applicable only to continuous distributions. Can someone explain this to me? – Maurizio Sep 15 '11 at 12:17
8

@Maurizio -- the K-S test statistic has the same distribution under all _continuous_ distributions, but if the actual distribution is not continuous, and one tries to construct a level $\alpha$ test assuming that the distribution is continuous, then the actual level of the test will be less than $\alpha$ (cf. Lehmann & Romano, _Testing Statistical Hypotheses, Third Edition_, p. 584). You can still make a level $\alpha$ test based on the K-S statistic, but you'll have to find some other method to get the critical value, e.g. by simulation. – DavidR Oct 11 '11 at 04:58
1

There is a discrete KS-test: http://www.stat.yale.edu/~jay/EmersonMaterials/DiscreteGOF.pdf – Astrid Dec 29 '18 at 14:07

Glen_b · Answer 2 · 2015-11-19T02:57:29.507

As is often the case in statistics, it depends on what you mean.

If you mean "I calculate my test statistic on a sample drawn from a discrete distribution and then look up the standard tables" then you'll get a true type I error rate lower than the one you chose (possibly a lot lower).

How much lower depends on "how discrete" the distribution is. If the probability of any one outcome is fairly low (so the proportion of tied values in the data would be expected to be low) then it won't matter very much -- many people wouldn't have a problem with running a 5% test at 4.5%, say. So for example, if you're testing a discrete uniform on [1,1000], you probably needn't worry.
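The effect of "how discrete" the null is can be checked with a small simulation. The sketch below is only illustrative: the function names, the sample size of 100, and the coarse three-point comparison case are assumptions, and the statistic is taken to be the supremum distance between the empirical and null CDFs, evaluated at the support points. It estimates how often the asymptotic 5% critical value for continuous distributions, $1.358/\sqrt{n}$, rejects a true discrete uniform null.

```python
import math
import random

def ks_stat_discrete_uniform(sample, k):
    """sup_x |F_n(x) - F(x)| for a discrete uniform null on {1, ..., k}.

    Both CDFs are step functions that jump only at 1..k, so the supremum
    is attained at one of those support points.
    """
    n = len(sample)
    counts = [0] * (k + 1)
    for x in sample:
        counts[x] += 1
    d, cum = 0.0, 0
    for v in range(1, k + 1):
        cum += counts[v]
        d = max(d, abs(cum / n - v / k))
    return d

def rejection_rate(k, n=100, n_sim=2000, seed=1):
    """Estimate the true type I error rate when the null is actually true."""
    rng = random.Random(seed)
    crit = 1.358 / math.sqrt(n)  # asymptotic 5% critical value, continuous case
    rejections = 0
    for _ in range(n_sim):
        sample = [rng.randint(1, k) for _ in range(n)]
        if ks_stat_discrete_uniform(sample, k) > crit:
            rejections += 1
    return rejections / n_sim

rate_coarse = rejection_rate(k=3)    # many ties expected
rate_fine = rejection_rate(k=1000)   # few ties expected
print(f"k=3:    estimated type I error rate = {rate_coarse:.4f}")
print(f"k=1000: estimated type I error rate = {rate_fine:.4f}")
```

In a run like this the fine-grained case should land reasonably close to the nominal 5%, while the three-point case should reject far less often.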

But if there's a high probability of a value being tied, then the effect on the type I error rate can be marked. If you get a significance level of 0.005 when you wanted 0.05, that may be an issue, since it will correspondingly impact the power.

If instead you mean "I calculate my test statistic on a sample drawn from a discrete distribution and then use a suitable critical value/calculate a suitable p-value for my situation" (say via a permutation test, for example), then the test is certainly valid in the sense that you'll get the right type I error rate -- up to the discreteness of the test statistic itself, of course. (Though there may well be better tests for your particular purpose, just as there usually are in the continuous case.)
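One way to get a "suitable p-value" for a fully specified discrete null is to simulate the K-S statistic under that null. This is plain Monte Carlo simulation, not a permutation test, and the sketch below is only one possible way to set it up: the function names, the Binomial(4, 0.5) null, and the sample are all hypothetical.

```python
import random

def ks_stat(sample, support, cdf):
    """sup_x |F_n(x) - F(x)|, evaluated at the (sorted) support points."""
    n = len(sample)
    d = 0.0
    for v, f in zip(support, cdf):
        fn = sum(x <= v for x in sample) / n
        d = max(d, abs(fn - f))
    return d

def ks_monte_carlo_pvalue(sample, support, probs, n_sim=2000, seed=0):
    """Return (K-S statistic, Monte Carlo p-value) under the discrete null."""
    rng = random.Random(seed)
    cdf, total = [], 0.0
    for p in probs:
        total += p
        cdf.append(total)
    observed = ks_stat(sample, support, cdf)
    n = len(sample)
    exceed = 0
    for _ in range(n_sim):
        sim = rng.choices(support, weights=probs, k=n)
        # Small tolerance so exact ties in the discrete statistic count as >=.
        if ks_stat(sim, support, cdf) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)

# Hypothetical null: Binomial(4, 0.5) on {0, 1, 2, 3, 4}.
support = [0, 1, 2, 3, 4]
probs = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
sample = [0] * 1 + [1] * 5 + [2] * 10 + [3] * 11 + [4] * 5  # made-up data
d_obs, p_value = ks_monte_carlo_pvalue(sample, support, probs)
print(f"D = {d_obs:.4f}, Monte Carlo p-value = {p_value:.4f}")
```

Because the reference distribution is generated from the same discrete null, ties are accounted for automatically; the resolution of the p-value is limited by `n_sim`.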

Note that the distribution of the test statistic itself is no longer distribution-free, but a permutation test avoids that issue.
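For concreteness, here is a sketch of a permutation test built on the K-S statistic. It assumes the two-sample setting (two samples, null hypothesis that they come from the same distribution), where permuting group labels is natural; the function names and the data are hypothetical.

```python
import random

def ks_two_sample_stat(a, b):
    """sup_x |F_a(x) - F_b(x)|, evaluated at every observed value."""
    na, nb = len(a), len(b)
    d = 0.0
    for v in set(a) | set(b):
        fa = sum(x <= v for x in a) / na
        fb = sum(x <= v for x in b) / nb
        d = max(d, abs(fa - fb))
    return d

def ks_permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Return (K-S statistic, permutation p-value) for two samples."""
    rng = random.Random(seed)
    observed = ks_two_sample_stat(a, b)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        perm_stat = ks_two_sample_stat(pooled[:len(a)], pooled[len(a):])
        if perm_stat >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical heavily tied data, e.g. ratings on a 1-5 scale.
group_a = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]
group_b = [2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]
d_obs, p_value = ks_permutation_pvalue(group_a, group_b)
print(f"D = {d_obs:.4f}, permutation p-value = {p_value:.4f}")
```

The reference distribution is built from the pooled data themselves, ties included, so no continuity assumption is needed.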

So sometimes it's okay to use the standard tables even with discrete distributions, and even when it's not okay, the issue is not so much the test statistic as the critical values/p-values you use with it.

As usual Glen, your answer is high-quality. But perhaps the best part about it is that you've actually echoed the joke I made in this post about statisticians saying "it depends"! http://stats.stackexchange.com/questions/182442/probability-of-getting-the-exact-same-letters-in-scrabble-2-turns-in-a-row/182453#182453 — Sycorax, Nov 19 '15 at 02:57
@user777 that wasn't accidental; it amused me, and I was thinking as I read this question "well, it depends" ... so I made sure to say it explicitly to echo your post. — Glen_b, Nov 19 '15 at 03:00

score 4 · Answer 3 · edited Nov 19 '15 at 01:30

I believe the K-S test uses the fact that if $X$ is a continuous random variable with CDF $F$ then $F(X)$ is a uniform random variable on $(0,1)$. This is not the case if $X$ is not continuous. For example, if $X$ is Bernoulli with success probability $p$, then $F(X)$ takes only the two values $1-p$ and $1$, so it cannot be uniform.
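A quick numerical check of this point, as a sketch only: the Exponential(1) and Bernoulli(0.3) choices, the sample size, and the variable names are arbitrary assumptions made for illustration.

```python
import math
import random
from collections import Counter

rng = random.Random(0)
n = 10_000

# Continuous case: X ~ Exponential(1), with CDF F(x) = 1 - exp(-x).
u_cont = [1 - math.exp(-rng.expovariate(1.0)) for _ in range(n)]
# If F(X) is uniform, the fraction of values <= t should be close to t.
thresholds = (0.25, 0.5, 0.75)
fractions = [sum(u <= t for u in u_cont) / n for t in thresholds]

# Discrete case: X ~ Bernoulli(p), with F(0) = 1 - p and F(1) = 1.
p = 0.3
bernoulli_cdf = {0: 1 - p, 1: 1.0}
u_disc = [bernoulli_cdf[1 if rng.random() < p else 0] for _ in range(n)]
values = Counter(u_disc)

print("continuous: fraction of F(X) <= t for t in", thresholds, "->", fractions)
print("discrete:   distinct values of F(X) ->", sorted(values))
```

In the continuous case the fractions track the thresholds, as a uniform variable would; in the Bernoulli case $F(X)$ only ever lands on two points.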

edited Nov 19 '15 at 01:30 by Silverfish

answered Nov 19 '15 at 01:17 by F RA
