
This question is related to this one. I am performing a two-sample Kolmogorov-Smirnov (KS) test in R, and I don't think I have fully understood the issue of ties.

Reading the help: "The presence of ties always generates a warning, since continuous distributions do not generate them. If the ties arose from rounding the tests may be approximately valid, but even modest amounts of rounding can have a significant effect on the calculated statistic." I understand this in the single-sample case, but why do I get the same warning when the tie is a value that appears in both vectors?

Example:

no ties case

set.seed(123)
x <- rnorm(50)
y <- runif(30)
ks.test(x, y)

Two-sample Kolmogorov-Smirnov test
data:  x and y
D = 0.52, p-value = 3.885e-05
alternative hypothesis: two-sided

case with ties

x <- c(0,1,1, rnorm(47)) # this vector has the value 1 repeated twice
y <- c(1,runif(29))
ks.test(x, y)

Two-sample Kolmogorov-Smirnov test
data:  x and y
D = 0.5, p-value = 0.0001696
alternative hypothesis: two-sided
Warning message:
In ks.test(x, y) : cannot compute exact p-value with ties

case where I thought there should be no ties, but in fact there are:

x <- c(0,1,1, rnorm(47))
y <- c(1,runif(29))
ks.test(unique(x), unique(y))

Two-sample Kolmogorov-Smirnov test
data:  unique(x) and unique(y)
D = 0.59184, p-value = 4.363e-06
alternative hypothesis: two-sided
Warning message:
In ks.test(unique(x), unique(y)) : cannot compute exact p-value with ties
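A quick check (my own illustration, not part of the original post) shows why the warning persists: `unique()` removes duplicates *within* each vector, but `ks.test()` looks for ties in the *pooled* sample, and the value 1 is still shared between `x` and `y`:

```r
set.seed(123)
x <- c(0, 1, 1, rnorm(47))
y <- c(1, runif(29))

# No duplicates remain within either vector after unique() ...
any(duplicated(unique(x)))                  # FALSE
any(duplicated(unique(y)))                  # FALSE

# ... but the pooled sample still contains the value 1 twice,
# once from x and once from y, which is a cross-sample tie.
any(duplicated(c(unique(x), unique(y))))    # TRUE
```

So removing within-vector duplicates cannot silence the warning as long as the two samples share any value.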
1 Answer


The reason in the one-sample case is exactly the same as in the two-sample case: $\Pr(X = c) = 0$ for any continuously distributed $X$ and any single value $c$. Ties (within one sample or between two samples) imply that $\Pr(X = c) \ne 0$.
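A small sketch of this point (my own illustration): draws from a continuous distribution essentially never repeat in floating point, but rounding them immediately produces ties, which is exactly the situation the help page's warning describes.

```r
set.seed(1)
z <- rnorm(1000)

# Continuous draws: no exact repeats among 1000 values
any(duplicated(z))            # FALSE

# Rounding to one decimal place collapses many values together
any(duplicated(round(z, 1)))  # TRUE
```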

Alexis
  • Still, this is not fully clear to me. Sorry. If the two-sample KS test is aimed at quantifying the distance between two empirical cumulative distribution functions, how does this imply that the two underlying samples cannot have an observation in common? Could you please elaborate a bit on this (better with an example)? – Nemesi Jan 28 '19 at 14:02
  • @Nemesi A very basic idea in statistics is that ***in a continuous distribution*** $X$ the probability of observing ***any*** single number $c$ is ***zero*** (that's the $Pr(X=c) = 0$ part). The two-sample Kolmogorov-Smirnov test assumes that the distributions of both variables are continuous, so $Pr(X = Y) = 0$. So there is zero probability of *tied* observations—one observation exactly equaling another—between variables (within variables also). – Alexis Jan 28 '19 at 16:35
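To make the comment concrete (my own sketch; the exact warning text varies across R versions): a single value shared between the two samples is enough to make the exact p-value unavailable.

```r
set.seed(42)
x <- rnorm(50)
y <- runif(30)
y[1] <- x[1]  # introduce exactly one cross-sample tie

# Capture the warning instead of letting it print; the message
# mentions ties (e.g. "cannot compute exact p-value with ties")
msg <- tryCatch(ks.test(x, y), warning = conditionMessage)
msg
```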