I would like to use the Sequential Sum of Squares test, but the denominator degrees of freedom are (n - p - 1), where n is the number of samples and p is the number of variables in the full model.

What do I do when p > n?
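
For reference, here is a sketch of the statistic, assuming the test meant is the usual extra-sum-of-squares F test for adding a single variable $x_j$. The symbols $SS_{\text{seq}}$ and $RSS_{\text{full}}$ are notation introduced for this illustration and do not come from the question.

$$F = \frac{SS_{\text{seq}}(x_j)\,/\,1}{RSS_{\text{full}}\,/\,(n - p - 1)}$$

Under that assumption, once $p \ge n - 1$ the denominator degrees of freedom $n - p - 1$ are zero or negative, so the ratio is not defined.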

Tyro
  • In normal circumstances, as soon as p gets as large as n-1 (and possibly before), there's no remaining residual sum of squares at all, which implies you're not simply using multiple regression (as your tags suggest). You need to *clearly* explain what you're doing. – Glen_b Mar 22 '14 at 08:38
  • I'm using stepwise selection, and trying to use Sequential Sum of Squares at each step to test if adding/subtracting a variable is an improvement or not. – Tyro Mar 22 '14 at 08:50
  • You should not use stepwise selection (see here: [Algorithms for automatic model selection](http://stats.stackexchange.com/a/20856/7290)). You'll need to use LASSO / LARS. – gung - Reinstate Monica Mar 22 '14 at 13:59
  • Thank you for your advice. I want to compare forward selection against several other methods, including elastic net, on simulated data sets. So I still need to address my original question of how to handle p > n in the Sequential Sum of Squares test. Anyone? – Tyro Mar 22 '14 at 18:06
  • @Tyro (Leaving aside any question of how to do the fitting at all) when $p>n-2$, what residual sum of squares do you have? If it's zero, is it not the case that any additional terms will always have sum of squares =0? What use then, to ask about sums of squares in that situation? Isn't the answer simply 0 every time? – Glen_b Mar 22 '14 at 22:21
  • Glen_b: you're right. Your comment reminded me that I'm trying forward selection, not backward, so my question is moot. Thanks for the kick in the head. – Tyro Mar 23 '14 at 02:44
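
Glen_b's point about the residual sum of squares can be checked numerically. The following is a minimal sketch, assuming an ordinary least-squares fit with an intercept on simulated Gaussian noise. The helper `residual_sum_of_squares`, the sample size, and the order in which variables are added are all hypothetical choices made for illustration; none of them come from the thread.

```python
import numpy as np

def residual_sum_of_squares(X, y):
    """RSS of a least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    # lstsq also handles the rank-deficient / underdetermined case (p >= n)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

rng = np.random.default_rng(0)
n = 10                              # number of samples (assumed for the demo)
X = rng.normal(size=(n, n + 2))     # more candidate variables than samples
y = rng.normal(size=n)              # pure-noise response

# RSS after adding variables one at a time, in column order
rss_path = [residual_sum_of_squares(X[:, :p], y) for p in range(n + 3)]

for p, rss in enumerate(rss_path):
    print(f"p = {p:2d}   df = {n - p - 1:3d}   RSS = {rss:.3e}")
```

With an intercept in the model, the fit is saturated once p reaches n - 1, so the RSS printed from that row onward is zero up to floating-point error and every further sequential sum of squares is zero as well.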
