My question is actually a follow-up to Glen_b's answer to the question "Simulation of KS-test with estimated parameters."
I am mostly interested in how to compute Lilliefors' test for distributions other than the Normal (or, more exactly, the corrected version of the Kolmogorov-Smirnov test that applies when the parameters of the target distribution have been estimated from the data, whether that correction is called Lilliefors' test or something else). Most discussions of the Lilliefors test use it to check whether a sample comes from a normal distribution, but this does not seem to be a real limitation of the test.
As such, my question actually is twofold:
- Are there any limitations on which distributions Lilliefors' test can work with? i.e. can it be extended to work with a Gamma, Chi-square, or maybe even the empirical distribution function?
- How can we extend it to work with those distributions?
I have a rough idea of how the second point can be accomplished, but there are parts I still don't fully understand. For example, in his answer to the aforementioned question, Glen_b gave the following description of how to apply the test through simulation:
Repeat many times:
1. Simulate a sample of the desired sample size from the assumed distribution.
2. Estimate the parameters of the distribution.
3. Treating the estimated parameters as the population values, transform to uniformity via the probability integral transform. (You can compute a KS statistic without transforming at this step; however, it makes the computation a bit simpler.)
4. Compute a KS test statistic.

Collect the simulated statistics, and work out the proportion of times the simulated statistic is at least as extreme (more consistent with $H_1$) as the observed sample value.
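
To make my reading of these steps concrete, here is a minimal Python sketch for an exponential target distribution. It is only my interpretation, not Glen_b's code. The function names (`ks_statistic_uniform`, `fitted_ks_exponential`, `simulated_p_value`) are my own, and I am assuming that the simulated samples are drawn using the parameter fitted to the observed data, which is one of the things I am unsure about. The exponential is chosen only because its maximum likelihood estimate (the sample mean) and its CDF are easy to write down.

```python
import math
import random


def ks_statistic_uniform(u):
    """Two-sided KS distance between the sample u and Uniform(0, 1)."""
    u = sorted(u)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return max(d_plus, d_minus)


def fitted_ks_exponential(sample):
    """Estimate the exponential mean, transform to uniformity, compute KS."""
    mean = sum(sample) / len(sample)  # maximum likelihood estimate of the mean
    # Probability integral transform using the *estimated* parameter.
    u = [1.0 - math.exp(-x / mean) for x in sample]
    return ks_statistic_uniform(u)


def simulated_p_value(data, n_sim=2000, seed=0):
    """Proportion of simulated statistics at least as large as the observed one."""
    rng = random.Random(seed)
    n = len(data)
    observed = fitted_ks_exponential(data)
    # ASSUMPTION: simulate from the distribution fitted to the observed data.
    fitted_mean = sum(data) / n
    count = 0
    for _ in range(n_sim):
        sim = [rng.expovariate(1.0 / fitted_mean) for _ in range(n)]
        # The parameter is re-estimated inside fitted_ks_exponential each time.
        if fitted_ks_exponential(sim) >= observed:
            count += 1
    return observed, count / n_sim


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.expovariate(0.5) for _ in range(50)]
    statistic, p_value = simulated_p_value(data)
    print(f"KS statistic: {statistic:.4f}, simulated p-value: {p_value:.4f}")
```

For another distribution, I would expect only the estimation step, the CDF, and the sampling call to change, while the loop itself stays the same.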
Some of my doubts:
- In step 1, which parameter values should we use when sampling from the assumed distribution? Are they values chosen before fitting, or the estimates obtained after fitting the distribution to the data we have?
- What exactly does it mean to "work out the proportion of times the simulated statistic is at least as extreme as the observed sample value"?
- With this method, is the end result a new p-value that we can compare against our chosen significance level? Or does the significance level have to be taken into account somehow in the last part (working out the proportion)?
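
To pin down the last two doubts, this is how I currently read the final step, in my own notation (so it may well be wrong). Here $B$ is the number of simulated samples, $D_{\text{obs}}$ is the KS statistic of the observed data computed with estimated parameters, and $D^*_b$ is the statistic of the $b$-th simulated sample:

$$\hat{p} = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\{D^*_b \ge D_{\text{obs}}\}$$

Under this reading, the significance level $\alpha$ plays no role in computing $\hat{p}$; it would only enter afterwards, by rejecting when $\hat{p} \le \alpha$. Is that correct?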