I'm reading Elements of Statistical Learning and came across this paragraph right before section 3.3.3:
Other more traditional packages base the selection on F-statistics, adding "significant" terms, and dropping "non-significant" terms. These are out of fashion, since they do not take proper account of the multiple testing issues. It is also tempting after a model search to print out a summary of the chosen model, such as in Table 3.2; however, the standard errors are not valid, since they do not account for the search process. The bootstrap (Section 8.2) can be useful in such settings.
I don't quite understand why the methods based on F-statistics are out of fashion. Also, why are the standard errors of the chosen model not valid in this case? I thought this was one of the standard use cases for the F-test.
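For concreteness, here is a small simulation I put together (my own sketch, not from the book) of the multiple-testing issue: even when the response is pure noise, picking the "most significant" predictor out of several candidates produces a nominally significant t-statistic far more often than the 5% level would suggest, which is why the printed standard errors and p-values of a searched-for model can't be taken at face value.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sims = 50, 10, 2000  # 50 observations, 10 candidate predictors
hits = 0

for _ in range(sims):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # pure noise: no predictor has a real effect

    # Ordinary least squares fit
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    t = np.abs(beta / se)

    # "Model search": keep only the most significant-looking predictor
    if t.max() > 1.96:  # nominal two-sided 5% cutoff
        hits += 1

frac = hits / sims
print(f"Fraction of null runs where the selected term looks 'significant': {frac:.2f}")
```

With 10 roughly independent tests, the chance that at least one clears the 5% cutoff is about 1 - 0.95^10 ≈ 0.40, not 0.05. The per-coefficient standard error is correct for a prespecified model, but conditioning on "this term survived the search" invalidates it.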