I read the following procedure for performing a KS test with estimated parameters:
Testing whether data follows T-Distribution
At first I couldn't make heads or tails of it, partly because I mistook the language for R, while I guess it's actually MATLAB (thanks to @Glen_b for noticing my error). Now I understand more of it, but there's still a step that baffles me. According to the definition of the p-value (the probability of getting a statistic at least as extreme as the one observed in the sample, under the null hypothesis), I thought this would work:
- choose a family of distributions $\{f(x|\theta_1,\ldots,\theta_n)\}$ indexed by parameters $\theta_1,\ldots,\theta_n$
- from my original sample $S$ of size $N$ ($N$ and $n$ are not related, apart from the obvious condition $N>n$), estimate the parameters $\hat{\theta_1},\ldots,\hat{\theta_n}$ using the method of moments, maximum likelihood, etc. The corresponding distribution from the family, $\hat{f}(x)=f(x|\hat{\theta_1},\ldots,\hat{\theta_n})$, is my null distribution.
- generate $M$ random samples of size $N$ from $\hat{f}$. For each random sample $i$, compute the KS distance $K_i$ between that sample and $\hat{f}$
- sort the $K_i$ and tabulate the empirical CDF at various $x$ as $G(x)=\frac{M_x}{M}$, where $M_x$ is the number of samples with $K_i<x$
- from the tabulated $G(x)$, interpolate or use splines to compute the probability of a KS distance as large as or larger than the one between the original sample $S$ and $\hat{f}$, i.e. the p-value $1-G(K_{\mathrm{obs}})$, where $K_{\mathrm{obs}}$ is that observed distance (a sketch of this in code follows the list)
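In code, the procedure I have in mind would look roughly like this (a minimal sketch, not tested; I'm assuming `S` holds my original sample and `null_pd` the fitted null distribution $\hat{f}$, to match the variable names in the code quoted below, and I replace the interpolation of the last step with the plain empirical tail fraction):

% sketch of my proposed procedure: the null distribution stays fixed throughout
M = 999;
N = numel(S);
[~,~,K_obs] = kstest(S,'CDF',null_pd);          % KS distance between S and f-hat
K = zeros(M,1);
for i = 1:M
    bsample = random(null_pd,N,1);              % sample of size N from the *fixed* f-hat
    [~,~,K(i)] = kstest(bsample,'CDF',null_pd); % KS distance of bootstrap sample from the same f-hat
end
pvalue = mean(K >= K_obs);                      % fraction of bootstrap distances at least as large

Note that at no point do I re-estimate the parameters: every bootstrap sample is compared to the same $\hat{f}$ that was fitted to $S$.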
However, it seems to me that the MATLAB code in the above link does something different, in the bootstrap bit:
% get KS-test critical values by parametric bootstrapping from estimated parameters
m=999;                 % number of bootstrap samples
r=random(null_pd,n,m); % n-by-m matrix: m samples of size n from the fitted null distribution
stats = zeros(m,1); % store test statistics
est_pd = makedist('tlocationscale');
opts = statset(statset('tlsfit'),'MaxIter',1000);
opts = statset(opts,'MaxFun',2000);
for i=1:m
    bsample = r(:,i);
    % re-fit the t location-scale parameters to this bootstrap sample and
    % compute its KS statistic against the re-fitted distribution
    [~,~,stats(i)] = kstest(bsample,'CDF',est_pd.fit(bsample,'options',opts));
end
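From here I assume the linked answer turns these bootstrap statistics into a critical value or p-value roughly along these lines (this part is my reading, not shown in the quoted excerpt; `S` and `null_pd` as above):

[~,~,K_obs] = kstest(S,'CDF',null_pd); % observed KS distance of the original sample from the fitted null
cv = quantile(stats,0.95);             % bootstrap critical value at the 5% level
pvalue = mean(stats >= K_obs);         % bootstrap p-value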
I am not that familiar with MATLAB for statistical analysis, but my understanding is that, for each sample drawn from the null distribution, the code is re-estimating the parameters of the distribution... why is that? Given the definition of the p-value, what's wrong with my approach?