
Let's say I have a signal which I want to test for normality. I know that the mean of the theoretical distribution is zero, but its variance is unknown. If I knew the mean and variance a priori, I would run a one-sample Kolmogorov-Smirnov test. Since I don't know what the variance should be, I am thinking of generating some random data from a normal distribution with zero mean and the same variance as the sample (signal), and performing a two-sample K-S test, one sample being the true (original) sample and the other being the randomly generated one. Is this procedure correct/valid?
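In R, the idea would be something like this (a sketch; `signal` stands in for the measured data):

s <- sqrt(mean(signal^2))                     # sd estimated from the sample, mean known to be 0
y <- rnorm(length(signal), mean = 0, sd = s)  # randomly generated comparison sample
ks.test(signal, y)                            # two-sample K-S test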

Some contextualization:

• I know one should not perform a one-sample K-S test with theoretical distribution parameters estimated from the data; that's why I'm thinking of doing this "simulation" for the two-sample test, though I'm not confident of its validity;

• The signal is an acoustical room impulse response (its amplitude is proportional to the sound pressure). Normality is expected for the later part of the signal, and also for the real and imaginary parts of its Fourier transform. The signal may also be a simulated impulse response, created by weighting Gaussian noise with a decaying exponential (see the sketch after this list);

• I'm not interested in other tests (like the Lilliefors test) because research using the procedure I describe here has already been published (applied to all the topics cited above). So I'd like to know whether the procedure is valid or not (and what its effects are), so I can better understand and evaluate the results. I'm not one of the authors, so I don't have access to the exact procedures or algorithms used.
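For reference, a simulated response of this kind could be generated like so (a sketch; the sampling rate fs, reverberation time T60, and length n are assumed values):

fs <- 44100; T60 <- 1; n <- 120       # assumed sampling rate (Hz), reverberation time (s), length
t <- (0:(n - 1))/fs                   # time axis
h <- exp(-6.91 * t/T60) * rnorm(n)    # Gaussian noise with 60 dB amplitude decay over T60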

Phxuibs
    Welcome to Cross Validated! Make the randomly generated sample large enough & you're performing the Lilliefors test; keep it small & the difference is the introduction of random noise: there's no point to such a procedure. (An agreeable feature of the Lilliefors test is that the null distribution of the K-S statistic is the same for a given sample size whatever the mean & variance of the hypothesized Gaussian - see https://stats.stackexchange.com/q/110272/17230.) – Scortchi - Reinstate Monica Feb 13 '21 at 08:48
  • @Scortchi-ReinstateMonica I understand the critical values of the Lilliefors test come from simulating a lot of data and constructing a statistic distribution. When I say "simulate a two-sample K-S test" I am considering the calculation of a single statistic between a true sample and a generated one. So would it be correct to calculate the statistic once in this manner and compare it with an already tabulated Lilliefors critical value for the same sample size? (using at least 120 samples for "true" and generated data). I understood that to compare the statistic with a K-S c.v. would then be wrong. – Phxuibs Feb 13 '21 at 15:58
  • Maybe [Wikipedia](https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test#Test_with_estimated_parameters) on the K-S test with estimated parameters will help. Or [this](https://stats.stackexchange.com/questions/111693/simulation-of-ks-test-with-estimated-parameters). – BruceET Feb 13 '21 at 18:48
  • @Phxuibs: That'd be less wrong than using the usual distribution for the two-sample statistic (which assumes independence of the two samples); but you'd still be adding random noise for no good reason. (Even with a good reason, randomized tests are inferentially preposterous.) – Scortchi - Reinstate Monica Feb 14 '21 at 01:05
  • Sorry - in fact it wouldn't be *less* wrong, but differently wrong; the actual Type I error would be higher than the nominal error rather than lower. What would be the reason for generating only 120 artificial observations in any case? – Scortchi - Reinstate Monica Feb 15 '21 at 18:32
  • 1
    @BruceET Thanks for the useful links! – Phxuibs Feb 16 '21 at 13:29
  • @Scortchi-ReinstateMonica The authors were analyzing the signal with 120 (or 200) samples. They generated the other samples in the same amount. I understand the sample groups should be independent for the K-S test. I was confused! Now I get that, in the Lilliefors test, we compare the sample with the hypothesized CDF (same as one-sample K-S), but the distribution is estimated from data and the c.v. is different (it comes from simulations). So when you say to generate a large sample to compare with, do you suggest substituting the theoretical CDF with this generated sample? – Phxuibs Feb 16 '21 at 13:42
  • I wasn't exactly *suggesting* that, but it would be a way of performing the right test. Anyway I thought it better to give a proper answer at this point. I'll add some examples of power calculations when I find time. – Scortchi - Reinstate Monica Feb 16 '21 at 15:54
  • @Scortchi-ReinstateMonica Sorry, I know you were not suggesting it, just couldn't find a better word. And thanks for your answer! It's all much clearer now! I think your answer together with your comments is more than enough to support the point that the suggested procedure is indeed not appropriate. No need to overwork it! Nevertheless, any complement will be much appreciated, as is all your effort so far. – Phxuibs Feb 16 '21 at 17:44

1 Answer


Your test will be rather conservative if you suppose the K–S statistic has the same distribution as in the usual two-sample K–S test; it'll be very liberal if you suppose the statistic has the same distribution as in the one-sample Lilliefors test. Here are the results of a simulation† under the null using the sample size of 120 you mentioned in comments:

[Figure: simulated distribution functions of the test statistic under the null for the Lilliefors, two-sample K-S, and proposed tests]

If you increase the size of the simulated sample enough (there's no reason it has to be equal to that of the observed sample), the distribution of the test statistic will approximate the one in the Lilliefors test. That's merely a roundabout way of calculating the Gaussian distribution function, which is already done very well by any statistical software. In any case, you'll still need to simulate the distribution under the null to calculate p-values (unless you want to look for tables or large-sample formulae for critical values, for the case when the mean is known & the variance unknown).
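For instance, a Monte-Carlo p-value for the appropriate test (mean known, variance estimated) might be calculated like this (a sketch, not from the original answer; `x.obs` stands in for the observed sample):

n <- length(x.obs)
s.obs <- sqrt(sum(x.obs^2)/n)                       # sd estimated with the mean known to be 0
d.obs <- ks.test(x.obs, "pnorm", mean = 0, sd = s.obs)$statistic
d.null <- replicate(10e3, {                         # simulate the statistic under the null
  x <- rnorm(n)
  s <- sqrt(sum(x^2)/n)
  ks.test(x, "pnorm", mean = 0, sd = s)$statistic
})
mean(d.null >= d.obs)                               # Monte-Carlo p-value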

You could of course obtain p-values/critical values from the null distribution simulated above. But common sense says that evidence about the shape of the distribution from which a sample is drawn is to be found only in the observed sample; & not in extraneous noise introduced by coin tosses, or dice rolls, or pseudo-random number generators. Statistical theory goes a little further, saying it's to be found only in the sample configuration: the (unordered) set of standardized residuals, or an equivalent. The consequences of ignoring these considerations are (1) a loss of power relative to the Lilliefors test, which is quite unnecessary, & (2) that different researchers, given the same data, & making the same assumptions, may well come to very different conclusions about lack of fit because they draw different simulated samples, which is quite absurd.
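The second point is easy to demonstrate (an illustration, not from the original answer): holding the observed sample fixed, the p-value of the proposed test changes with each artificial draw:

x <- rnorm(120)                                     # one fixed "observed" sample
s <- sqrt(sum(x^2)/120)
ks.test(x, rnorm(120, mean = 0, sd = s))$p.value    # one researcher's draw
ks.test(x, rnorm(120, mean = 0, sd = s))$p.value    # another's draw: generally a different p-value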


† In R:

set.seed(1815)
n.obs <- 120   # size of the observed sample
n.sim <- 120   # size of the artificial sample
B <- 50e3      # no. of simulations under the null

stat.1s <- numeric(B)    # one-sample (Lilliefors-type) statistics
stat.2s <- numeric(B)    # standard two-sample statistics
stat.prop <- numeric(B)  # statistics for the proposed test

for (i in 1:B) {
  x <- rnorm(n.obs, mean = 0, sd = 1)
  s <- sqrt(sum(x^2)/n.obs)  # MLE of the sd when the mean is known to be 0
  # one-sample statistic against the Gaussian with estimated sd:
  stat.1s[i] <- ks.test(
    x, "pnorm", mean = 0, sd = s, exact = TRUE
  )$statistic
  # two-sample statistic against an independent standard Gaussian sample:
  y <- rnorm(n.sim, mean = 0, sd = 1)
  stat.2s[i] <- ks.test(
    x, y, exact = TRUE
  )$statistic
  # proposed test: two-sample statistic against a sample generated with the estimated sd:
  y <- rnorm(n.sim, mean = 0, sd = s)
  stat.prop[i] <- ks.test(
    x, y, exact = TRUE
  )$statistic
}

# empirical distribution functions of the simulated statistics
df.1s <- ecdf(stat.1s)
df.2s <- ecdf(stat.2s)
df.prop <- ecdf(stat.prop)

plot(df.2s, col="forestgreen", verticals=TRUE, do.points=FALSE,
     xlim=c(0,0.2),
     xlab = "test statistic", ylab = "distribution function under null", main = NA)
plot(df.1s, col="dodgerblue", verticals=TRUE, do.points=FALSE, add = TRUE)
plot(df.prop, col="darkorange", verticals=TRUE, do.points=FALSE, add = TRUE)
legend(
  x = 0.11, y = 0.2, bty="n",
  lty=1, col=c("dodgerblue", "forestgreen", "darkorange"),
  legend=c("Lilliefors test", "2-sample K-S test", "proposed test")
)
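
As a follow-up (not in the original answer), the simulated distributions above can be used to estimate the actual size of the proposed test at a nominal 5% level, showing the conservative & liberal behaviour described at the top:

mean(stat.prop > quantile(stat.2s, 0.95))  # using the 2-sample K-S critical value: < 0.05, conservative
mean(stat.prop > quantile(stat.1s, 0.95))  # using the Lilliefors critical value: > 0.05, liberal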
Scortchi - Reinstate Monica