
ANOVA is typically used to compare nested models. Is it necessary to control for multiple comparisons in this context? I'd say yes: in practice we're performing multiple F-tests, and the more models we compare, the more likely we are to find a significant difference between two models, even if there is none. Since the models are nested, I don't think the various hypotheses being tested can be considered independent. Thus I guess corrections such as Holm-Bonferroni, which make no assumptions about the dependence structure of the F-statistics (or equivalently the p-values), should be used.

For non-nested models, we typically use AIC or BIC to compare models fitted to the same data. Again, I think a correction for multiple testing is needed if we are comparing many models, using methods which do not rely on a special dependence structure among the hypotheses.
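
For concreteness, here is a minimal sketch of the kind of AIC/BIC comparison I have in mind; the simulated data and the two model formulas are just hypothetical placeholders:

set.seed(1)
dat <- data.frame(x = runif(50, 1, 10))
dat$y <- 3 * log(dat$x) + rnorm(50, sd = 0.5)   # toy data, no claim of realism
m_log  <- lm(y ~ log(x),  data = dat)           # two non-nested candidates:
m_sqrt <- lm(y ~ sqrt(x), data = dat)           # neither is a special case of the other
AIC(m_log, m_sqrt)                              # lower AIC is preferred
BIC(m_log, m_sqrt)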

Is all this correct? In all the textbook examples I've seen where multiple models are compared, the issue of multiple comparisons is never addressed, so I'm wondering whether I'm missing something.

EDIT: consider the following example:

foo <- data.frame(x = c(0.010355057, 0.013228936, 0.016313905, 0.019261687, 0.021710159, 0.023973474, 0.025968176, 0.027767232, 0.029459730, 0.030213807, 0.023582566, 0.008689883, 0.006558429, 0.005144958),
                  y = c(971.3800, 1025.2271, 1104.1505, 1034.2607, 902.6324, 713.9053, 621.4824, 521.7672, 428.9838, 381.4685, 741.7900, 979.7046, 1065.5245, 1118.0616))
Model1 <- lm(y~poly(x,1), data = foo)
Model2 <- lm(y~poly(x,2), data = foo)
Model3 <- lm(y~poly(x,3), data = foo)
Model4 <- lm(y~poly(x,4), data = foo)
Model5 <- lm(y~poly(x,5), data = foo)

anova(Model1, Model2, Model3, Model4, Model5)
Analysis of Variance Table

Model 1: y ~ poly(x, 1)
Model 2: y ~ poly(x, 2)
Model 3: y ~ poly(x, 3)
Model 4: y ~ poly(x, 4)
Model 5: y ~ poly(x, 5)
  Res.Df    RSS Df Sum of Sq        F    Pr(>F)    
1     12 191485                                    
2     11  52318  1    139168 146.6072 2.002e-06 ***
3     10  44601  1      7717   8.1295 0.0214384 *  
4      9   7912  1     36689  38.6500 0.0002547 ***
5      8   7594  1       318   0.3349 0.5787128    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Thus, the differences between the degree-1 and degree-2 models, the degree-2 and degree-3 models, and the degree-3 and degree-4 models are all significant, but the difference between the degree-4 and degree-5 models is not. I could stop here and choose Model4. However, I did perform four significance tests. Shouldn't I correct the alpha levels for the number of tests?
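
For concreteness, the kind of correction I have in mind is something like the following sketch, assuming Holm's method is a sensible choice here:

av <- anova(Model1, Model2, Model3, Model4, Model5)
p_raw  <- av[["Pr(>F)"]][-1]                 # the four F-test p-values
p_holm <- p.adjust(p_raw, method = "holm")   # Holm-Bonferroni adjustment
cbind(p_raw, p_holm)

In this particular case the first three comparisons stay below 0.05 after the adjustment, but with more models that need not be true.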

Note that here I performed relatively few significance tests. But in Data Science/Machine Learning applications, you routinely compare thousands of models. Couldn't one model come out better than the others just by chance?
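
As a toy illustration of what I mean (pure noise, no real signal):

set.seed(42)
n <- 50; k <- 1000                          # 1000 candidate "models"
y <- rnorm(n)                               # the response is pure noise
p_vals <- replicate(k, {
  x <- rnorm(n)                             # each candidate uses an unrelated predictor
  summary(lm(y ~ x))$coefficients[2, 4]     # p-value of the slope
})
min(p_vals)                                 # typically far below 0.05
mean(p_vals < 0.05)                         # about 5% of the models look "significant" by chance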

DeltaIV
    I don't understand why you think alpha error inflation occurs here. Can you explain that in more detail? – Roland Mar 01 '17 at 15:20
  • ok, I will add an example. – DeltaIV Mar 01 '17 at 15:34
  • @Roland done, please have a look and see if it's clearer now – DeltaIV Mar 01 '17 at 16:01
  • So, your actual problem is not model comparison but model selection. There are much better approaches for the latter. Instead of worrying about p-values look into techniques like the LASSO. See previous answers about model selection. – Roland Mar 01 '17 at 17:00
  • @Roland most of the questions I see where AIC or ANOVA are used talk about model *comparison*: http://stats.stackexchange.com/questions/105003/if-summarizing-stats-from-multiple-models-is-it-meaningful-to-report-a-mean-aic?rq=1, http://stats.stackexchange.com/questions/94718/model-comparison-with-aic-based-on-different-sample-size?rq=1, http://stats.stackexchange.com/questions/100338/comparing-aic-among-models-with-different-amounts-of-data?rq=1 [1/2] – DeltaIV Mar 01 '17 at 17:21
  • [2/2] http://stats.stackexchange.com/questions/172398/aic-or-anova-to-compare-models?rq=1. But surely the terminology may be wrong, so let's talk about selection. LASSO is great for selecting linear models, but these are all nested inside the "omni"-model, where all $\beta_i$ are non-zero. What if I have non-nested, possibly nonlinear, models? I could compute AIC (or cross-validation?) for all of them and select the one with the lowest AIC. Shouldn't I worry about having selected among a multitude of models? – DeltaIV Mar 01 '17 at 17:30
  • I'm not so much talking about terminology as about the *purpose* of the exercise. You should not do model selection from nonlinear models. Use of a nonlinear parametric model is best based on scientific theory. If you don't have such a theoretical base, use a nonparametric model such as a GAM. – Roland Mar 01 '17 at 19:50
  • @Roland, that is very interesting. Could you please elaborate on that in an answer? I'm a bit surprised that model selection shouldn't be done for nonlinear models - I think selecting among different neural networks architectures or various other nonlinear models is very common in machine learning. But maybe I'm misunderstanding something fundamental. If you could write an answer, that would be great. If you prefer, we could move to chat. If you need me to edit the question, let me know. – DeltaIV Mar 02 '17 at 10:11
  • A neural net is not what I'd call a parametric non-linear model. – Roland Mar 02 '17 at 11:12
  • @Roland you are right, my mistake: I didn't notice you specifically wrote "nonlinear *parametric* model". – DeltaIV Mar 02 '17 at 11:27

0 Answers