
I follow, more or less, the derivation of the KS test statistic's distribution that is given on Wikipedia. The following section on the two-sample test also makes sense if all I want to do is reject the null hypothesis of distribution equality. However, I want to calculate a p-value, not just know that I can reject at $\alpha=0.05$ but not at $\alpha=0.01$. I see how to do the algebra to find $\alpha$, but where does that $\sqrt{-\frac{1}{2}\log(\alpha)}$ come from?
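A sketch of where such an expression can come from, assuming it keeps only the leading term of the limiting tail probabilities. Write $D_n=\sup_t|F_n(t)-F(t)|$ and $D_n^{+}=\sup_t\,(F_n(t)-F(t))$ for the one-sample statistics; the standard limits (Kolmogorov for $D_n$, Smirnov for $D_n^{+}$) are

$$
\begin{aligned}
\lim_{n\to\infty}\Pr\!\left(\sqrt{n}\,D_n > s\right) &= 2\sum_{k=1}^{\infty}(-1)^{k-1}e^{-2k^2s^2} \approx 2e^{-2s^2},\\
\lim_{n\to\infty}\Pr\!\left(\sqrt{n}\,D_n^{+} > s\right) &= e^{-2s^2}.
\end{aligned}
$$

Setting $e^{-2s^2}=\alpha$ gives $s=\sqrt{-\frac{1}{2}\log\alpha}$ (natural log), which is the one-sided critical value for $\sqrt{n}\,D_n^{+}$; setting the two-sided leading term $2e^{-2s^2}=\alpha$ gives $s=\sqrt{-\frac{1}{2}\log(\alpha/2)}$ instead. Read in reverse, an observed $s=\sqrt{n}\,D_n$ has the leading-term p-value $p\approx 2e^{-2s^2}$, and for two samples of sizes $n$ and $m$ the factor $\sqrt{n}$ becomes $\sqrt{nm/(n+m)}$.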

Could someone please point me to the derivation of the test statistic's distribution in the two-sample case? Is it the same as in the one-sample case, but with one of the empirical CDFs taken as the distribution against which goodness of fit is assessed? (If so, does it matter which empirical CDF we choose, or is there symmetry?) I do not have access to the cited Knuth book but can track it down if someone knows that it gives the derivation I want.
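A minimal Python sketch of the two-sample computation, assuming continuous data and using only the plain limiting approximation (no finite-sample correction); the function names are hypothetical. It computes the upper and lower statistics $D^+=\sup_t(F_x-F_y)$ and $D^-=\sup_t(F_y-F_x)$, takes $D=\max(D^+,D^-)$, and evaluates the Kolmogorov tail at $\sqrt{nm/(n+m)}\,D$. The scaling uses $nm/(n+m)$ rather than the size of either sample, so this is not simply the one-sample test with one ECDF treated as fixed. Swapping the samples swaps $D^+$ and $D^-$ and leaves $D$ and $nm/(n+m)$ unchanged, so the two-sided test is symmetric.

```python
import math


def ks_2samp_plus_minus(x, y):
    """Return (D_plus, D_minus) for two samples (hypothetical helper).

    D_plus = sup_t (F_x(t) - F_y(t)), D_minus = sup_t (F_y(t) - F_x(t)),
    where F_x, F_y are the empirical CDFs. Both step functions only change
    at pooled data points, so checking those points (plus the region below
    all data, where both ECDFs are 0) is enough.
    """
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d_plus = d_minus = 0.0  # value of the difference below all data points
    for t in sorted(set(xs) | set(ys)):
        # advance counts so that i/n = F_x(t) and j/m = F_y(t)
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        diff = i / n - j / m
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus, d_minus


def kolmogorov_sf(s, terms=100):
    """Limiting tail P(K > s) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 s^2)."""
    if s < 0.2:
        # the true tail is 1 to within about 1e-12 here, and the series
        # converges slowly for small s, so return 1 directly
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2 * k * k * s * s)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_2samp_asymptotic(x, y):
    """Two-sided D = max(D_plus, D_minus) and its limiting p-value,
    using the effective sample size n*m/(n+m)."""
    n, m = len(x), len(y)
    d_plus, d_minus = ks_2samp_plus_minus(x, y)
    d = max(d_plus, d_minus)
    return d, kolmogorov_sf(math.sqrt(n * m / (n + m)) * d)
```

For example, `ks_2samp_asymptotic([1, 2, 3], [4, 5, 6])` gives $D=1$ and a p-value of about 0.0996. R's `ks.test()` and SciPy's `scipy.stats.ks_2samp` may use exact or corrected computations by default for small samples, so their p-values need not match this approximation.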

https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Kolmogorov_distribution

Dave
  • Please note that this is for "large" samples--it is based on the leading term of an asymptotic expression accurate to $O\left(n^{-1/2}\right).$ Knuth proposes using "a large value of $n$ like $n=1000.$" In most statistical applications there's little point in conducting the K-S test on such large samples (Knuth was testing pseudorandom number generators, which is a very special application). – whuber Jan 13 '20 at 19:57
  • @whuber Then it makes more sense why Knuth is the reference. What would be the place to find the derivation of the test statistic for two-sample KS, though? I want to put together a write-up on how R and Python do it. – Dave Jan 14 '20 at 11:36
  • I think you want the derivation of the *distribution* of the test statistic. Knuth doesn't offer one: he gives only the derivation of the distribution of the one-sample statistic (the two-sample stat is the max of the upper and lower KS statistics). Knuth's principal reference is "a monograph by J. Durbin, *Regional Conf. Series on Applied Math.* **9** (SIAM, 1973)," which he characterizes as a "comprehensive review of ... KS tests." Exercise 17 in section 3.3.1 outlines the derivation of the distribution of the one-sample statistic. – whuber Jan 14 '20 at 13:24
  • Just to note, there are both the asymptotic distribution and the finite-sample distribution. The latter can be computed exactly or approximated by simulation; e.g., ks.test() in R computes it with the argument exact=TRUE. One way to think about the finite-sample distribution is that under the null, all "orderings" of X and Y sample values are equally likely. Not the best reference, but I discuss this some in Section 5 of [this Journal of Econometrics paper](https://doi.org/10.1016/j.jeconom.2018.04.003) ([open draft here](https://faculty.missouri.edu/~kaplandm/pdfs/GK2018_dist_inf.pdf)). – David M Kaplan Jan 18 '20 at 02:10
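To make the "all orderings are equally likely" idea concrete, here is a brute-force Python sketch (hypothetical function names) that enumerates every split of the pooled values into groups of the original sizes and counts how often the statistic is at least as large as the observed one. Its cost grows like $\binom{n+m}{n}$, so it is only practical for very small samples, and it is not claimed to reproduce the algorithm R's `ks.test()` uses internally.

```python
import itertools


def ks_stat(x, y):
    """Two-sided two-sample statistic sup_t |F_x(t) - F_y(t)|,
    checked at every pooled data point (hypothetical helper)."""
    n, m = len(x), len(y)
    return max(abs(sum(v <= t for v in x) / n - sum(v <= t for v in y) / m)
               for t in set(x) | set(y))


def ks_exact_permutation_pvalue(x, y):
    """Exact conditional p-value: enumerate every way to split the pooled
    values into groups of sizes len(x) and len(y), each equally likely
    under the null hypothesis."""
    pooled = list(x) + list(y)
    n, total_size = len(x), len(x) + len(y)
    d_obs = ks_stat(x, y)
    hits = splits = 0
    for idx in itertools.combinations(range(total_size), n):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(total_size) if i not in chosen]
        # small tolerance guards against floating-point ties with d_obs
        if ks_stat(xs, ys) >= d_obs - 1e-12:
            hits += 1
        splits += 1
    return d_obs, hits / splits
```

For `x = [1, 2, 3]` and `y = [4, 5, 6]`, only 2 of the $\binom{6}{3}=20$ splits put one whole group below the other, so the sketch returns $D=1$ with p-value $2/20=0.1$.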
