
Suppose I have a Cramér-von Mises $p = 0.99$ and a Chi-squared $p = 0.88$ for a distribution being Student's-$t$. What can I say?

What can I say about the underlying distribution's being of Student's-$t$ type? (BTW, $n = 412$.) That is, can I say that it

  • a) may be
  • b) is likely to be
  • c) is very likely
  • d) very likely to be under Cramér-von Mises assumptions and not unlikely under Chi-squared?

In other words, can I say something more than "Student's-$t$ cannot be ruled out"? Somehow, the latter doesn't satisfy, and if I had a result of $p = 0.99$ from flipping coins for some hypothesis, I would not be asking. What spoils the broth is that the high probabilities for some of these tests are likely not as well characterized as the low ends. For example, I know from experience that it is hard to get really high probabilities from a Chi-squared test. Has anyone looked at this? Is it useless information?
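
For concreteness, here is a minimal sketch (Python/SciPy) of how such $p$-values can be computed; the simulated data and the degrees of freedom below are placeholders, not my actual sample:

```python
# Minimal sketch (SciPy): two goodness-of-fit p-values against a fully
# specified Student's-t null. The data and df = 4 are illustrative placeholders.
import numpy as np
from scipy import stats

df, n = 4, 412                                   # hypothesized dof (assumption) and sample size
rng = np.random.default_rng(0)
x = stats.t.rvs(df, size=n, random_state=rng)    # stand-in for the real sample

# Cramér-von Mises test against the t distribution
cvm = stats.cramervonmises(x, 't', args=(df,))
print("Cramér-von Mises p =", cvm.pvalue)

# Chi-squared goodness of fit: bin the data, compare observed vs. expected counts
edges = stats.t.ppf(np.linspace(0.01, 0.99, 11), df)  # 10 bins of equal expected mass
obs, _ = np.histogram(x, bins=edges)
exp = np.diff(stats.t.cdf(edges, df)) * n
exp *= obs.sum() / exp.sum()                     # totals must match for stats.chisquare
print("Chi-squared p =", stats.chisquare(obs, exp).pvalue)
```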


Now that I think about it a bit more, I think the correct method of quantifying how useful a distribution is would be to render it into an error image and calculate the noise and image misregistration between the theoretical distribution and the estimated distribution. The result would then be image misregistration measured in error units and noise in those same units, and the noise should agree with the amount predicted stochastically. Still, and even though I voted for the answer I got, I am not entirely satisfied by it. If I get $p_1 > p_2$ for two models and the same test, I am not sure that means nothing. For example, see link
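
One concrete reading of this idea, sketched below under the assumption that the "error image" is simply the pointwise difference between the empirical and theoretical CDFs, and that the predicted noise is the binomial standard deviation $\sqrt{F(1-F)/n}$ of the empirical CDF (the data and degrees of freedom are again placeholders):

```python
# Sketch of one reading of the "error image": pointwise difference between the
# empirical and theoretical CDFs, compared with its predicted stochastic scale.
# The data and df = 4 are illustrative placeholders.
import numpy as np
from scipy import stats

df, n = 4, 412
x = np.sort(stats.t.rvs(df, size=n, random_state=np.random.default_rng(1)))
F = stats.t.cdf(x, df)                       # theoretical CDF at the data points
ecdf = (np.arange(1, n + 1) - 0.5) / n       # empirical CDF (midpoint convention)

error = ecdf - F                             # the "error image"
predicted = np.sqrt(F * (1 - F) / n)         # binomial std. dev. of the empirical CDF
# If the model fits, the observed RMS error should be of the same order as the
# predicted noise; a systematic offset would indicate "misregistration".
print("observed RMS error  :", np.sqrt(np.mean(error ** 2)))
print("mean predicted noise:", predicted.mean())
```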

Carl

1 Answer


It sounds like you've fallen prey to a common misconception about $p$-values. $p$ does not tell you the probability that the null hypothesis is true. Rather, it is the probability, assuming the null hypothesis is true, of observing a test statistic as extreme as the one you've observed. (And when the null hypothesis is false, it has no clear meaning at all.) So, a high $p$ doesn't tell you anything, except maybe something really weak like "you don't have much reason to believe that the null hypothesis is false according to the criterion this test happens to consider".
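
To illustrate (a small simulation sketch, assuming SciPy; the $t_4$ null and $n = 412$ are placeholder choices): when the null hypothesis is true, $p$-values are approximately uniform on $[0, 1]$, so a $p$ of 0.99 is no stronger evidence for the null than a $p$ of 0.5.

```python
# Simulation sketch: under a true null, goodness-of-fit p-values are ~uniform,
# so large p-values are not special evidence that the null is correct.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
pvals = [stats.cramervonmises(stats.t.rvs(4, size=412, random_state=rng),
                              't', args=(4,)).pvalue
         for _ in range(500)]
# Roughly 50 p-values should land in each decile of [0, 1].
print(np.histogram(pvals, bins=np.linspace(0, 1, 11))[0])
```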

The upshot is that the results of the two tests you conducted here don't have a useful interpretation. The tests simply failed to give you any useful information.

Kodiologist
  • So, although we would maximize those probabilities to find a fit, and the fit looks good, and the counter-thesis that this is not a high-fidelity representation of a tendency in the data seems rather implausible, we get no information from them, and I have fallen prey to some abstract argument that frankly seems far-fetched. OK, you sure about that? – Carl Aug 26 '16 at 00:43
  • "So, although we would maximize those probabilities [by which you presumably mean $p$-values] to find a fit…" — No, that would be a pretty bizarre way to try to fit a model. Some metrics that are commonly used for model-fitting are squared error and likelihood. – Kodiologist Aug 26 '16 at 00:47
  • Nice answer (+1), this is a similar question: http://stats.stackexchange.com/questions/166323/misunderstanding-a-p-value/166327#166327 –  Aug 26 '16 at 10:32
  • Since [confidence intervals](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2850991/) are apparently more useful than $p$-values, can we switch to using them and do equivalent testing, for example for Chi-squared? – Carl Aug 26 '16 at 21:25
  • @Carl I like confidence intervals better than $p$-values, too, but I don't see how they would help you here. – Kodiologist Aug 26 '16 at 23:57
  • Well, if we know the noise function, which we would for example for Poisson counting statistics of nuclear decay, we can make an error image as above, and then calculate confidence intervals from that, no? – Carl Aug 27 '16 at 02:11
  • 1
    @Carl I see neither the appeal of these "error images" over standard model-selection and model-validation techniques, nor what quantity you would calculate a confidence interval for. – Kodiologist Aug 27 '16 at 04:09
  • Model selection and validation are subsets of regression, and the world is a larger place than that. What I suggested is just another trick in the bag of tricks. This relates, for example, to image fusion and Poisson counting statistics from nuclear decay. I proposed a physically related measurement system anchored in the real world. AIC, for example, uses a restricted choice for noise treatment, and its image is on an arbitrary scale and not fusable between data sets. So when we speak of "standard" techniques and do not mention B-splines, derivative fitting, etc., we have left out much. – Carl Aug 27 '16 at 18:05
  • "Model selection and validation are subsets of regression" — Not at all. They are issues that apply to all models, as the names suggest. – Kodiologist Aug 27 '16 at 18:08
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/44574/discussion-between-carl-and-kodiologist). – Carl Aug 27 '16 at 18:35
  • 1
    Actually optimizing a goodness of fit criterion to estimate parameters of a model isn't odd or even particularly uncommon. (Arguably estimation via ML is a version of it as well) – Glen_b Oct 02 '16 at 02:08
  • @Glen_b I agree. I was talking about optimizing $p$-values, not optimizing a goodness-of-fit metric itself. – Kodiologist Oct 02 '16 at 05:41
  • @Kodiologist For a given model class isn't that generally the same thing? – Glen_b Oct 02 '16 at 06:31
  • @Glen_b It should lead to the same minimum, yes, so long as there's a monotonic relationship between the goodness-of-fit value and $p$. But using $p$ instead of the goodness-of-fit value doesn't buy you anything. – Kodiologist Oct 02 '16 at 14:20
  • @Kodiologist Using *p* for a parameter unrelated to goodness-of-fit would get you somewhere. Goodness of fit is overused; for one example among many, AIC would be more useful for model parameter value extraction and BIC for goodness-of-fit. – Carl Jan 30 '17 at 03:46
  • It frightens me when people know JUST ENOUGH statistics to wreak havoc. – bdeonovic Feb 08 '17 at 02:45
  • @bdeonovic I am not done with this. And I am much worse, I know nothing, which is why I test everything. – Carl Mar 24 '17 at 05:10