
I would like to find a suitable distribution to fit to a dataset. Beyond visual inspection of histograms with overlaid density curves and of Q-Q plots, I would like to perform statistical tests, namely Kolmogorov-Smirnov and Anderson-Darling. As I don't know the fully specified distribution a priori, I estimate the distribution parameters from the data. However, this invalidates the standard tests, so I simulate the test statistics a large number of times instead.

My issue is in interpreting the output. Here is an example, using the R code from the answer to "How to determine which distribution fits my data best?", testing the suitability of a Weibull distribution for my data:

library(logspline)
library(FAdist)
library(fitdistrplus)
library(ADGofTest)

n.sims <- 5e4 #number of simulations for KS and AD tests
x <- as.numeric(zooList$flow12001) #data vector length 973

fit.wei <- fitdist(x, "weibull")
#replicate KS
ksstats <- replicate(n.sims, {
  r <- rweibull(n = length(x), shape = fit.wei$estimate["shape"]
              , scale = fit.wei$estimate["scale"])
  as.numeric(ks.test(r, "pweibull", shape = fit.wei$estimate["shape"]
                     , scale = fit.wei$estimate["scale"])$statistic)      
})

ksfit <- logspline(ksstats)
kspval <- 1 - plogspline(ks.test(x, "pweibull", shape= fit.wei$estimate["shape"],
                       scale = fit.wei$estimate["scale"])$statistic, ksfit)
> kspval
[1] 0.2647569

#replicate A-D
adstats <- replicate(n.sims, {
  r <- rweibull(n = length(x), shape = fit.wei$estimate["shape"]
              , scale = fit.wei$estimate["scale"])
  as.numeric(ad.test(r, pweibull, shape= fit.wei$estimate["shape"]
                     , scale = fit.wei$estimate["scale"])$statistic)      
})

adfit <- logspline(adstats)
adpval <- 1 - plogspline(ad.test(x, pweibull, shape = fit.wei$estimate["shape"],
                       scale = fit.wei$estimate["scale"])$statistic, adfit)
> adpval
[1] 0.1292376

My interpretation of these results is that, for repeated tests, the KS test results in a rejection of H0 ~26% of the time whilst the AD test results in a rejection of H0 ~13% of the time. Hypothesis tests are often compared against tables of critical values for the test statistic, or against a conventional significance level for the p-value (typically 0.05), but in my example can I choose the threshold arbitrarily due to the resampling procedure?
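
For reference, kspval above is 1 - plogspline(observed statistic, ksfit), i.e. a smoothed estimate of the proportion of simulated KS statistics at least as large as the one observed for the real data. A minimal unsmoothed sketch of the same quantity, reusing x, fit.wei and ksstats from the code above (ks.obs and kspval.emp are just illustrative names):

ks.obs <- as.numeric(ks.test(x, "pweibull",
                             shape = fit.wei$estimate["shape"],
                             scale = fit.wei$estimate["scale"])$statistic)
kspval.emp <- mean(ksstats >= ks.obs)  # empirical tail proportion; should be close to the 0.26 above
kspval.emp < 0.05                      # TRUE would mean rejecting H0 at alpha = 0.05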

I am aware that tables of critical values exist for the KS test (e.g. http://www.cas.usf.edu/~cconnor/colima/Kolmogorov_Smirnov.htm) and for the asymptotic AD test (http://www.cithep.caltech.edu/~fcp/statistics/hypothesisTest/PoissonConsistency/AndersonDarling1954.pdf), but to be honest I am not sure how to use these within a resampling-based methodology.
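
If I understand the resampling approach correctly, the simulated statistics themselves can play the role of the tabulated critical values for this sample size and these fitted parameters. A sketch of that idea, reusing ksstats, adstats and ks.obs from above (ad.obs, ks.crit and ad.crit are illustrative names):

ad.obs  <- as.numeric(ad.test(x, pweibull,
                              shape = fit.wei$estimate["shape"],
                              scale = fit.wei$estimate["scale"])$statistic)
ks.crit <- quantile(ksstats, 0.95)  # simulation-based 5% critical value for the KS statistic
ad.crit <- quantile(adstats, 0.95)  # simulation-based 5% critical value for the AD statistic
c(KS = ks.obs > ks.crit, AD = ad.obs > ad.crit)  # TRUE means rejecting H0 at the 5% level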

boro141

1 Answer


You are essentially looking at a distribution of p-values. As discussed in [1] and [2], the p-value is a uniformly distributed random variable when the null hypothesis is true. Note that in this case your null hypothesis is that your sampled data originate from the reference distribution.

A good way to validate this fact is to perform a large number of K-S tests in which you compare resampled populations of your data against each other, instead of against a reference distribution. You will see that, when comparing sets of data that come from the same model, the p-value distribution is indeed uniform.
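
For example, a minimal sketch of such a check, assuming that "resampled populations" means randomly splitting x into two halves and comparing them with a two-sample KS test (null.pvals is an illustrative name; ties in the data may trigger warnings):

null.pvals <- replicate(2000, {
  idx <- sample(length(x), floor(length(x) / 2))  # random half of the observations
  ks.test(x[idx], x[-idx])$p.value                # two-sample KS test on the two halves
})
hist(null.pvals)  # should look roughly flat if the p-values are uniform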

Thus, you could perhaps apply a uniformity test to the p-value distribution. If the uniformity test gives a p-value < 0.05 (or some other critical value), then you can reject your null hypothesis.
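
One possible way to carry out such a uniformity check, here applied to null.pvals from the sketch above (the same call works for any collection of resampled p-values), is a one-sample KS test against the standard uniform distribution:

ks.test(null.pvals, "punif", 0, 1)  # a small p-value suggests the p-values are not uniform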

[1] Murdoch, D. J., Tsai, Y.-L., and Adcock, J. (2008). P-Values are Random Variables. The American Statistician, 62(3), 242-245.

[2] Why are p-values uniformly distributed under the null hypothesis?

ikonikon
  • Thank you for your answer, I was not aware that the p-value distribution is uniform when H0 is true. Is the KS p-value of 0.26 given above irrelevant or uninterpretable, then? – boro141 Oct 19 '15 at 11:53
  • What you can say based on a single p-value of 0.26 is that the null hypothesis cannot be rejected. However, re-sampling provides an opportunity for a more robust evaluation of the hypothesis. Note that goodness-of-fit tests can be used to reject the null hypothesis with some degree of statistical confidence; however, a p-value above the critical threshold does not mean that the null hypothesis can be automatically accepted. – ikonikon Oct 19 '15 at 13:12
  • Ok thank you. Upon analysis, the distribution of the p-values does not appear to be normal according to a histogram and a further KS test (on the p-values' suitability for a uniform distribution) returning a p-value << 0.05. Thus the null hypothesis that the p-values come from a Uniform distribution can be rejected, thus the previous null hypothesis that the original data come from a Weibull distribution can be rejected. Is this last sentence correct? – boro141 Oct 19 '15 at 13:33
  • I am not sure what is meant by 'the distribution of the p-values does not appear to be normal'... do you mean 'does not appear to be uniform'? If the distribution of p-values does not look like a uniform distribution (which is the case if you get a p-value << 0.05, as you say), then indeed, your last sentence is correct. – ikonikon Oct 19 '15 at 13:46
  • Apologies, it should of course read "uniform" not "normal". Thanks – boro141 Oct 19 '15 at 13:55