I'm reading Elements of Statistical Learning and came across this paragraph right before section 3.3.3:
Other more traditional packages base the selection on F-statistics, adding "significant" terms, and dropping "non-significant" terms. These are out of fashion, since they do not take proper account of the multiple testing issues. It is also tempting after a model search to print out a summary of the chosen model, such as in Table 3.2; however, the standard errors are not valid, since they do not account for the search process. The bootstrap (Section 8.2) can be useful in such settings.
I don't quite understand why the methods based on F-statistics are out of fashion. Also, why are the standard errors of the chosen model not valid in this case? I thought this was one of the standard use cases for the F-test.
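For concreteness, here is a small simulation I put together (my own sketch, not from the book) of the multiple-testing issue: even when the response is pure noise, picking the "most significant" predictor out of several candidates produces a nominally significant t-statistic far more often than the 5% level would suggest, which is why the printed standard errors and p-values of a searched-for model can't be taken at face value.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sims = 50, 10, 2000  # 50 observations, 10 candidate predictors
hits = 0

for _ in range(sims):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # pure noise: no predictor has a real effect

    # Ordinary least squares fit
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = np.abs(beta / se)

    # "Model search": keep only the most significant-looking predictor
    if t.max() > 1.96:  # nominal two-sided 5% cutoff
        hits += 1

frac = hits / sims
print(f"Fraction of null runs where the selected term looks 'significant': {frac:.2f}")
```

With 10 roughly independent tests, the chance that at least one clears the 5% cutoff is about 1 - 0.95^10 ≈ 0.40, not 0.05. The per-coefficient standard error is correct for a prespecified model, but conditioning on "this term survived the search" invalidates it.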