I have a question regarding the A-D test, and perhaps goodness-of-fit tests in general. I fitted a long list of distributions to a dataset. According to A-D, a Wakeby distribution provides the closest fit. In Mathematica, I specified the Wakeby distribution with the fitted parameters and then ran all of the available goodness-of-fit tests. This gave me a table of p-values:
$$\begin{array}{l|c} \text{Test} & \text{p-value} \\ \hline \text{Anderson-Darling} & 0.98531 \\ \text{Cramér-von Mises} & 0.98686 \\ \text{Kolmogorov-Smirnov} & 0.98496 \\ \text{Kuiper} & 0.98672 \\ \text{Pearson chi-squared} & 0.99854 \\ \text{Watson U-squared} & 0.99670 \\ \end{array}$$
These are very high p-values across the board, which suggests that the data fit the distribution extremely closely. However, from some reading I understand that there are pitfalls in using the A-D test when the distribution's parameters are estimated from the same data. What I don't understand is why, how to avoid them, and whether they also apply to the other tests in the table.
I saw this question, where two different methods of running the A-D test in R gave vastly different p-values. One of the answers states that the `nortest` result is correct, and that `goftest` isn't compensating for the fact that the parameters were estimated from the data. I ran the same test from that question in Mathematica to see which result I would get, and it gave me the same p-value as `goftest`. Could the p-values I'm getting from my data be flawed in the same way? Am I misusing the test?
Also, the Mathematica documentation for the Anderson-Darling test, under "Possible Issues", notes that testing against a fitted distribution can cause a problem, and that one solution is to run the test with a Monte Carlo simulation. I did so with 100,000 samples, which gave a p-value of 0.98592, very close to the original value. My assumption, then, is that the p-value is reliable. Am I correct to assume that?
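For context, my understanding of what such a Monte Carlo option does is a parametric bootstrap: simulate from the fitted model, re-fit the parameters on every simulated sample, and recompute the statistic, so the reference distribution accounts for estimation. A minimal sketch of that idea in Python/SciPy, using a Gumbel distribution as a stand-in (SciPy has no Wakeby distribution, and the data here are simulated, not mine):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated stand-in data (the real dataset and Wakeby fit are not reproduced here).
data = stats.gumbel_r.rvs(loc=10, scale=2, size=200, random_state=rng)

def ad_stat(x, cdf):
    """Anderson-Darling statistic of sample x against a fully specified CDF."""
    x = np.sort(x)
    n = len(x)
    u = np.clip(cdf(x), 1e-12, 1 - 1e-12)  # guard the logs at the boundaries
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))

# Step 1: fit the candidate family to the observed data.
loc_hat, scale_hat = stats.gumbel_r.fit(data)
ad_obs = ad_stat(data, lambda x: stats.gumbel_r.cdf(x, loc_hat, scale_hat))

# Step 2: simulate from the fitted model, RE-FITTING on every replicate,
# so the null distribution of the statistic reflects parameter estimation.
B = 500
exceed = 0
for _ in range(B):
    sim = stats.gumbel_r.rvs(loc_hat, scale_hat, size=len(data), random_state=rng)
    l_hat, s_hat = stats.gumbel_r.fit(sim)  # the crucial re-fit step
    if ad_stat(sim, lambda x: stats.gumbel_r.cdf(x, l_hat, s_hat)) >= ad_obs:
        exceed += 1

p_boot = (exceed + 1) / (B + 1)  # Monte Carlo p-value
print(p_boot)
```

If the re-fit inside the loop were skipped, the simulated statistics would be too large on average and the resulting p-value inflated, which is exactly the pitfall I'm asking about.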
In general, what is the problem with using a goodness-of-fit test when the distribution parameters are fitted to the data, and how can I avoid it?
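To make the pitfall concrete (this is my own illustration, not from the linked question; it uses a normal/KS pairing rather than Wakeby/A-D since SciPy has neither a Wakeby distribution nor a fitted-parameter A-D table): when the null hypothesis is exactly true, a correctly calibrated test's p-values are Uniform(0,1); plugging sample-estimated parameters into a fixed-parameter test instead piles the p-values up near 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pvals = []
for _ in range(500):
    x = rng.normal(size=100)  # H0 is exactly true
    # Wrong usage: estimate mu and sigma from the very sample being tested,
    # then hand them to the standard (fixed-parameter) KS test.
    mu, sigma = x.mean(), x.std(ddof=1)
    pvals.append(stats.kstest(x, 'norm', args=(mu, sigma)).pvalue)
pvals = np.array(pvals)

# Under correct calibration the p-values would average about 0.5;
# here they cluster near 1, so the test almost never rejects.
print(round(pvals.mean(), 3))
```

The fitted curve is tailored to the sample, so the empirical CDF sits unnaturally close to it, and the nominal p-values overstate the quality of fit, which is why I'm suspicious of my table of values near 0.99.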