
On this site it has been confirmed multiple times that, contrary to what is often heard, hypothesis tests don't have any issues with large sample sizes. In fact, the probability of a Type I error when the null hypothesis is true doesn't depend on the sample size (see for example here). However, people are often taught that to perform some inference procedures (ANOVA, inference for linear regression, etc.) they first need to check the validity of the underlying assumptions (for example, that the errors are normally distributed) using a hypothesis test (for example, a normality test on the residuals of the linear regression). The unlucky disciple then tests for normality on a sample of size $10^7$, finds that the test rejects the null, and falls into despair. I think this is the point that generates the confusion.
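To make the disciple's predicament concrete, here is a small sketch (the $t_{30}$ data, the sample size, and the D'Agostino–Pearson test are purely illustrative choices on my part): at a large enough sample size, a normality test rejects even for a distribution that is, for all practical purposes, indistinguishable from normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# t-distributed data with 30 df: excess kurtosis is only 6/26 ~ 0.23,
# visually indistinguishable from a normal sample on a histogram.
x = rng.standard_t(df=30, size=100_000)

# D'Agostino-Pearson K^2 normality test
stat, p = stats.normaltest(x)
print(f"p-value = {p:.2e}")  # expect a very small p-value: normality is rejected
```

The deviation from normality is real but practically negligible; the test detects it anyway simply because $n$ is large.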

How would you assess the validity of the assumptions behind an inference procedure, without a hypothesis test? If this question is too general, let's just consider the two cases I cited (ANOVA and inference for the coefficients of a linear regression model). I've been advised to make Q-Q plots. They are great, but in some cases their interpretation can be a bit subjective. I'd rather have a tool that lets me estimate by how much to inflate the C.I.s for the $\hat{\beta}_i$ if the residuals "don't look normal"...bootstrap, maybe?
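For instance, a pairs (case-resampling) bootstrap for the regression coefficients might look like the following sketch (the data-generating process, with skewed non-normal errors, is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# --- toy data: y = 2 + 3*x + centered, skewed (non-normal) errors ---
n = 500
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.exponential(scale=2, size=n) - 2

X = np.column_stack([np.ones(n), x])

def ols(X, y):
    """Ordinary least squares fit, returns (beta_0, beta_1)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_hat = ols(X, y)

# --- pairs bootstrap: resample (x_i, y_i) rows with replacement ---
B = 2000
boot = np.empty((B, 2))
for b in range(B):
    idx = rng.integers(0, n, n)
    boot[b] = ols(X[idx], y[idx])

# percentile 95% CIs for beta_0 and beta_1
ci = np.percentile(boot, [2.5, 97.5], axis=0)
print("beta_hat:", beta_hat)
print("95% bootstrap CIs (rows = coefficients):", ci.T)
```

These percentile intervals make no normality assumption about the errors, so comparing them to the textbook normal-theory intervals would show directly how much the latter need inflating.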

Also, I have another doubt: if at large sample sizes we say that a hypothesis test is not the right tool to check the validity of the assumptions underlying a certain inference procedure, but we also say that sample size doesn't affect the reliability of NHST, then this would mean that hypothesis tests are never (no matter what the sample size is) appropriate tools to verify inference assumptions...is that correct?

DeltaIV
    One way to understand the "unlucky disciple" is to recognize that many tests do not require the distribution to be Normal (or whatever); they only require that the *sampling distribution of the test statistic* closely follow the assumed distribution. In an ANOVA of a sample of size $10^7$, for instance, unless residual skewness is enormous, the $F$ statistic will follow an $F$ distribution closely enough that we can trust the p-values it reports to us. The point is that the disciple who applies a normality test to data is not evaluating a relevant assumption. – whuber Jun 06 '16 at 23:00
  • @whuber, if I'm understanding you correctly, you're saying that we only require that the residuals be normally distributed because that would make the $F$ statistic (the ANOVA test statistic) have an $F$ distribution even for finite samples. But as the sample size increases, even if the residuals are not really normally distributed, the $F$ statistic is bound to be very closely $F$-distributed, thus we actually don't need to perform a normality test on the residuals. [Continued] – DeltaIV Jun 07 '16 at 11:44
  • [Continued] In the same vein, for a one-sample t-test we require that the sample is drawn from a Normal population just because in that case the distribution of the test statistic is precisely Student's $t$ distribution, even for finite samples. But as the sample size increases, the sample mean becomes more and more normally distributed, even if the original population is not normal, and thus again we don't have to worry about testing the sample for normality. – DeltaIV Jun 07 '16 at 11:48
    ps "unlucky disciple" wasn't meant in a derogatory way. It's just that "unlucky student" sounded a bit weird to me, when referring to people who graduated long ago, and "unlucky employee who is expected to blindly apply Six Sigma best practices" was too long :) probably "unlucky practitioner" would have been the right expression in English. Anyway, to summarize, in both cases if we really wanted to test something, we should test that the test statistic has the expected distribution, not the data. – DeltaIV Jun 07 '16 at 11:50
    That is a very good summary of the situation. It has even been claimed that in these senses the t-test (and, to some degree, ANOVA) are "robust" or even "nonparametric" methods. What the practitioner most needs is the ability to use the data distribution to assess how close to Normal (or $\chi^2$ or $F$ or whatever) the *test statistic* is likely to be. That requires a good mathematical understanding of how the statistic is constructed and of what characteristics of the data distribution might cause the distribution of the statistic to deviate from what is expected. – whuber Jun 07 '16 at 14:04
  • @whuber, a good mathematical understanding, or, if at least the expression of the test statistic as a function of the data sample is known, the bootstrap :) Since we are dealing with large data sets, it should work especially well. Surely, knowledge of mathematical statistics theory is superior, but not all practitioners are versed in that. Even knowledge of the test statistic function cannot be taken for granted. Sometimes performing a test means "clicking button A in MiniTab". Of course such a situation is dangerous, at best. – DeltaIV Jun 13 '16 at 09:32
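The approach discussed in the comments — checking the distribution of the *test statistic* itself rather than testing the data — can be sketched by simulation (a toy example; the exponential errors, sample size, and nominal level are my own illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate the one-sample t statistic when the data are exponential
# (decidedly non-normal) and the null mean equals the true mean (1.0).
n, reps = 50, 10_000
t_stats = np.empty(reps)
for i in range(reps):
    x = rng.exponential(scale=1.0, size=n)
    t_stats[i] = (x.mean() - 1.0) / (x.std(ddof=1) / np.sqrt(n))

# Compare against the Student t(n-1) reference by the actual
# two-sided rejection rate at a nominal 5% level.
crit = stats.t.ppf(0.975, df=n - 1)
rejection_rate = np.mean(np.abs(t_stats) > crit)
print(f"two-sided rejection rate at nominal 5%: {rejection_rate:.3f}")
```

If the simulated rejection rate is close to the nominal 5%, the t-test's distributional assumption is harmless for this data-generating process and sample size, regardless of what a normality test on the raw data would say.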

0 Answers