
Is there an R package for the Kolmogorov distribution that lets me plot the density and the distribution function, compute quantiles, etc.?

The Kolmogorov distribution arises from $K=\sup|B|$, where $B$ is a Brownian bridge. Its values are usually tabulated, so I thought it would have its own function in R, like the normal distribution.

It seems ks.test() uses this function for the CDF:

    pkolmogorov1x <- function(x, n) {
        if (x <= 0)
            return(0)
        if (x >= 1)
            return(1)
        j <- seq.int(from = 0, to = floor(n * (1 - x)))
        1 - x * sum(exp(lchoose(n, j) + (n - j) * log(1 - x - j/n) +
                        (j - 1) * log(x + j/n)))
    }
kjetil b halvorsen
user3083324

2 Answers


The function shown in the question implements the CDF of the one-sided KS statistic

$$ D_n^{+} = \sup_{x}\{\hat{F}_n(x) - F(x)\}, $$

where $F(x)$ is the theoretical (continuous) CDF and $\hat{F}_n(x)$ is the empirical CDF of a sample of size $n$. The CDF of $D_n^{+}$ is the one coded in the question:

$$ F_{D_n^{+}}(x) = 1-x\sum_{j=0}^{\lfloor n(1-x)\rfloor} {n\choose j}\left(\frac{j}{n}+x\right)^{j-1}\left(1-x-\frac{j}{n}\right)^{n-j} $$

Source: Simard and L'Ecuyer (2011)
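As a rough illustration of the formula above, here is a minimal R sketch that evaluates it for a whole vector of $x$ values. The names `pks_one_sided` and `pks_one_sided_scalar` are made up for this example and do not come from base R or any package. The only change from the code in the question is a guard against tiny negative arguments to `log()` caused by rounding.

```r
# Hypothetical helpers for illustration; not part of base R or any package.

# CDF of the one-sided statistic D_n^+ at a single point x.
pks_one_sided_scalar <- function(x, n) {
  if (x <= 0) return(0)
  if (x >= 1) return(1)
  j <- 0:floor(n * (1 - x))
  # Rounding can make 1 - x - j/n slightly negative at the last index,
  # so clamp it at zero before taking the logarithm.
  upper_gap <- pmax(1 - x - j / n, 0)
  # Work on the log scale so that choose(n, j) cannot overflow for large n.
  log_terms <- lchoose(n, j) + (j - 1) * log(j / n + x) + (n - j) * log(upper_gap)
  1 - x * sum(exp(log_terms))
}

# Vectorised wrapper, convenient for plotting.
pks_one_sided <- function(x, n) {
  vapply(x, pks_one_sided_scalar, numeric(1), n = n)
}

# Example: evaluate the CDF on a grid for n = 20.
grid <- seq(0, 1, by = 0.01)
cdf_values <- pks_one_sided(grid, n = 20)
# plot(grid, cdf_values, type = "l")  # uncomment to draw the curve
```

For $n=1$ the statistic is uniform on $(0,1)$, so `pks_one_sided(0.3, 1)` should return 0.3 up to rounding, which makes a handy sanity check.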

The two-sided KS statistic

$$ D_n=\sup_x|\hat{F}_n(x)-F(x)| $$

doesn't have such a simple expression. It can be computed exactly using the Durbin matrix method. Marsaglia, Tsang and Wang (mentioned earlier) provide such an implementation, but it is computationally very expensive for large $n$ and may produce NaNs for some inputs (Simard and L'Ecuyer, 2011). Simard and L'Ecuyer give an implementation of the CDF of $D_n$ that chooses among several methods depending on the combination of $n$ and $x$, so that it is both accurate and efficient. They published C code, but no R package. I am working on implementing their method in Fortran and on improving the efficiency of the Durbin matrix method (following Carvalho, 2015). I will add an R interface.

If you are looking for the limiting distribution of $\sqrt{n}D_n$ as $n\to\infty$, you can use the series from Wikipedia, which converges quite quickly. The Wikipedia article also gives Vrbik's correction, which makes that series usable for moderate values of $n$.
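A minimal sketch of that series in R, assuming the alternating form $1-2\sum_{k\ge 1}(-1)^{k-1}e^{-2k^2x^2}$ given in the Wikipedia article. The name `pkolmogorov_limit` and the `tol` argument are invented for this example, and Vrbik's correction is not applied.

```r
# Hypothetical helper (not from any package): limiting CDF of sqrt(n) * D_n,
# computed from the alternating series 1 - 2 * sum((-1)^(k-1) * exp(-2 k^2 x^2)).
pkolmogorov_limit <- function(x, tol = 1e-12) {
  one_value <- function(xi) {
    # Below about 0.04 the true value underflows double precision, so return 0.
    if (xi < 0.04) return(0)
    # exp(-2 k^2 x^2) drops below tol once k > sqrt(-log(tol) / 2) / x.
    k_max <- ceiling(sqrt(-log(tol) / 2) / xi) + 1
    k <- seq_len(k_max)
    value <- 1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * xi^2))
    min(max(value, 0), 1)  # clamp small rounding errors into [0, 1]
  }
  vapply(x, one_value, numeric(1))
}

# Example: the usual 5% critical value is close to 1.358.
p_at_critical <- pkolmogorov_limit(1.358)
# curve(pkolmogorov_limit(x), from = 0, to = 2.5)  # uncomment to plot the CDF
```

Because the terms decay like $e^{-2k^2x^2}$, a handful of terms is enough for moderate $x$; the number of terms only grows when $x$ is small.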

mobiuseng
  • pkolm and pkolmim from the [kolmim](https://cran.r-project.org/web/packages/kolmim/index.html) package give me the CDF, which is usually enough for nonparametric testing. Still, I think there should be a function for plotting the density, since x is a continuous variable. – user3083324 Jul 10 '21 at 14:52
  • @user3083324 I have not seen any implementation of the PDF for KS. For the limiting distribution you can just differentiate the series. Otherwise, the only thing I can suggest is to use finite differences and plot the result. – mobiuseng Jul 10 '21 at 14:58
  • Also in R it is common to have a function for generating random numbers from a distribution. – user3083324 Jul 10 '21 at 15:04
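On the last comment: one simple (though not the fastest) way to get quantiles and random numbers for the limiting distribution is to invert the CDF numerically. The sketch below is an assumption of mine rather than anything from the sources cited in this answer, and the names `pkolm_lim`, `qkolm_lim` and `rkolm_lim` are hypothetical. It uses `uniroot()` for the inversion and inverse-transform sampling for the draws.

```r
# Hypothetical helpers for illustration; none of these exist in base R.

# Limiting CDF from the alternating series (scalar x).
pkolm_lim <- function(x) {
  if (x < 0.04) return(0)            # true value underflows to 0 here
  k <- seq_len(ceiling(6 / x) + 1)   # later terms are negligible
  min(max(1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * x^2)), 0), 1)
}

# Quantile function: solve pkolm_lim(x) = p numerically with uniroot().
qkolm_lim <- function(p) {
  vapply(p, function(prob) {
    if (prob <= 0) return(0)
    if (prob >= 1) return(Inf)
    uniroot(function(x) pkolm_lim(x) - prob,
            lower = 0.03, upper = 10, tol = 1e-10)$root
  }, numeric(1))
}

# Random numbers by inverse-transform sampling: apply the quantile
# function to uniform draws.
rkolm_lim <- function(n) qkolm_lim(runif(n))

set.seed(1)
draws <- rkolm_lim(1000)
# hist(draws, breaks = 40, freq = FALSE)  # uncomment to look at the sample
```

Each draw costs one root search, so this is fine for plots and small simulations but would be slow for very large samples.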

The expression for the Kolmogorov-Smirnov CDF is given in the Wikipedia article:

http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Kolmogorov_distribution

Kolmogorov distribution

The Kolmogorov distribution is the distribution of the random variable $K=\sup_{t\in[0,1]}|B(t)|$ where $B(t)$ is the Brownian bridge. The cumulative distribution function of $K$ is given by $\operatorname{Pr}(K\leq x)=1-2\sum_{k=1}^\infty (-1)^{k-1} e^{-2k^2 x^2}=\frac{\sqrt{2\pi}}{x}\sum_{k=1}^\infty e^{-(2k-1)^2\pi^2/(8x^2)}.$

Note that this distribution arises as an asymptotic result, detailed in the same link.
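The two series in the quoted formula are equal, but they converge fastest in different regions: the alternating form needs few terms for large $x$, while the second form needs few terms for small $x$. A minimal R sketch comparing them, with function names (`pkolm_alt`, `pkolm_theta`) and the `terms` argument invented for this example:

```r
# Hypothetical helpers for illustration; not part of base R or any package.

# First form: 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2), for scalar x > 0.
pkolm_alt <- function(x, terms = 100) {
  k <- seq_len(terms)
  1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * x^2))
}

# Second form: sqrt(2 pi) / x * sum_{k>=1} exp(-(2k-1)^2 pi^2 / (8 x^2)).
pkolm_theta <- function(x, terms = 100) {
  k <- seq_len(terms)
  sqrt(2 * pi) / x * sum(exp(-(2 * k - 1)^2 * pi^2 / (8 * x^2)))
}

# With plenty of terms the two forms agree to rounding error.
x_grid <- c(0.3, 0.5, 1, 1.5, 2)
gap <- vapply(x_grid, function(x) abs(pkolm_alt(x) - pkolm_theta(x)), numeric(1))

# With very few terms, each form is already accurate in its own region.
small_x_one_term   <- pkolm_theta(0.5, terms = 1)  # small x, second form
large_x_three_terms <- pkolm_alt(1.5, terms = 3)   # large x, first form
```

In practice one could switch between the two forms at some threshold around $x=1$; the exact cut-off is a matter of taste and is not specified by the formula itself.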

  • Yeah well, so you are suggesting I use the analytical expression for computations? Furthermore, that's only the CDF. – user3083324 Sep 04 '14 at 02:25
  • Can anyone help me? – user3083324 Oct 02 '14 at 02:29
  • (1) This CDF is easily differentiated to produce the PDF. (2) The series converges very rapidly and therefore is readily calculated. (3) For small samples, where this asymptotic result might not be accurate, the Wikipedia reference to [Marsaglia, Tsang, and Wang](http://www.jstatsoft.org/v08/i18/paper) begins with an account of "a method that expresses the required probability as a certain element in the nth power of an easily formed matrix." – whuber Oct 02 '14 at 14:33
  • @whuber I see your point, but it seems R uses another expression for computation. – user3083324 Oct 04 '14 at 00:38
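Point (1) in the comments can be sketched in R as follows. Differentiating the alternating series term by term gives $f(x)=8x\sum_{k\ge 1}(-1)^{k-1}k^2e^{-2k^2x^2}$. The function names `pkolm_series` and `dkolm_series` are hypothetical, and the check against a central finite difference is only there to confirm the derivative numerically.

```r
# Hypothetical helpers for illustration; neither is part of base R.

# Limiting CDF from the alternating series (scalar x).
pkolm_series <- function(x) {
  if (x < 0.04) return(0)            # true value underflows to 0 here
  k <- seq_len(ceiling(6 / x) + 1)   # exp(-2 k^2 x^2) is negligible beyond this
  1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * x^2))
}

# Density obtained by differentiating the series term by term (vectorised).
dkolm_series <- function(x) {
  vapply(x, function(xi) {
    if (xi < 0.04) return(0)
    k <- seq_len(ceiling(6 / xi) + 1)
    max(8 * xi * sum((-1)^(k - 1) * k^2 * exp(-2 * k^2 * xi^2)), 0)
  }, numeric(1))
}

# Compare with a central finite difference of the CDF at a few points.
x0 <- c(0.5, 0.8, 1.2, 2)
h <- 1e-5
fd <- vapply(x0,
             function(xi) (pkolm_series(xi + h) - pkolm_series(xi - h)) / (2 * h),
             numeric(1))
max_abs_diff <- max(abs(fd - dkolm_series(x0)))
# curve(dkolm_series(x), from = 0, to = 2.5)  # uncomment to plot the density
```

The density is only evaluated through the alternating form here, so for very small $x$ it is simply set to zero, where the true value is far below double precision anyway.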