
Is cross-validation an appropriate technique for variable selection and regression coefficient shrinkage?

A former colleague of mine used 10-fold CV to compare the regression coefficients from the 10 training models. Variables were dropped if their coefficients changed sign across the folds. The average coefficient values for the remaining variables were then used as the coefficients of the final model.

Does this homebrewed methodology sound like a valid technique?

– RobertF
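
For concreteness, here is a minimal sketch of the described procedure in Python (scikit-learn and synthetic data are stand-ins of my choosing; the original data and variables are not given):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic stand-in data; the original dataset is not specified.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Fit one OLS model per CV training fold and collect its coefficients.
coefs = []
for train_idx, _ in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
    coefs.append(fold_model.coef_)
coefs = np.array(coefs)  # shape: (10 folds, n_features)

# Drop any variable whose coefficient changes sign across the 10 folds...
stable = np.all(coefs > 0, axis=0) | np.all(coefs < 0, axis=0)

# ...and average the surviving coefficients to form the "final" model.
final_coefs = coefs[:, stable].mean(axis=0)
print("kept variables:", np.where(stable)[0])
print("averaged coefficients:", final_coefs)
```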
  • You may want to take a look at the [elastic net](http://en.wikipedia.org/wiki/Elastic_net_regularization) method (see also this [detailed handout](http://www.stanford.edu/~hastie/TALKS/enet_talk.pdf) or Tibshirani and colleagues' [JSS paper](http://www.jstatsoft.org/v33/i01/) for one implementation in R). There's also [When to use regularization methods for regression?](http://stats.stackexchange.com/q/4272/930) among our related discussions on this site. – chl Nov 07 '13 at 17:52
  • Great, thank you chl. I agree the elastic net seems more appropriate in this case. – RobertF Nov 07 '13 at 20:16
  • I asked a very related question earlier this year (see: http://stats.stackexchange.com/questions/52274/choosing-a-predictive-model-after-k-fold-cross-validation). The answers from that thread suggest that the best approach is to use the 10-fold CV to choose the size of the regularization penalty (i.e., the C_1, C_2 in the elastic net), and then train your model using this penalty on *all* the data. – Berk U. Nov 07 '13 at 23:59
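
Following the suggestion in the last comment, here is a minimal sketch of that workflow, using scikit-learn's ElasticNetCV as one possible implementation (the references above describe an implementation in R):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Same kind of synthetic stand-in data as above.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# 10-fold CV chooses the penalty strength (alpha) and the L1/L2 mix
# (l1_ratio); the final model is then refit on *all* the data using
# the selected penalty.
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=10, random_state=0).fit(X, y)

print("selected alpha:", enet.alpha_)
print("selected l1_ratio:", enet.l1_ratio_)
print("coefficients (exact zeros are dropped variables):", enet.coef_)
```

With this approach the L1 part of the penalty does the variable selection, driving the coefficients of weak or unstable variables to exactly zero, so no separate sign-stability filter is needed.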

0 Answers