
I have a dataset of 1931 observations and I intend to predict a binary outcome from it. There are 128 predictors (both binary and continuous). First I fitted a logistic regression model using all predictors and obtained a highly significant model (AUC = 0.84). Suspecting that the high AUC was due to overfitting from using too many predictors, I ran stepwise selection to find the effective predictors:

mylogit <- glm(outcome ~ ., data = temp, family = "binomial")
step <- step(mylogit, direction = "both")

Now I am not sure whether I should have done cross-validation before or after the stepwise modeling.

See also [here](http://stats.stackexchange.com/questions/64991/) & [here](http://stats.stackexchange.com/questions/5918/). Any kind of outcome-based model selection has to be repeated as part of each training fold to get a fair estimate of the out-of-sample performance of the whole procedure. – Scortchi - Reinstate Monica Apr 22 '15 at 08:53
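
A minimal sketch of what repeating the selection inside each fold could look like in R, assuming `temp` is the data frame from the question with a numeric 0/1 column named `outcome`; the 10-fold split and the rank-based AUC helper `auc_hat` are illustrative choices, not from the original post:

    set.seed(1)
    k <- 10
    folds <- sample(rep(1:k, length.out = nrow(temp)))

    # Rank-based (Mann-Whitney) estimate of AUC
    auc_hat <- function(y, p) {
      n1 <- sum(y == 1); n0 <- sum(y == 0)
      (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
    }

    cv_auc <- sapply(1:k, function(i) {
      train <- temp[folds != i, ]
      test  <- temp[folds == i, ]
      # Repeat the whole procedure on the training fold only
      full <- glm(outcome ~ ., data = train, family = "binomial")
      sel  <- step(full, direction = "both", trace = 0)
      # Evaluate the selected model on the held-out fold
      auc_hat(test$outcome, predict(sel, newdata = test, type = "response"))
    })

    mean(cv_auc)

The mean of `cv_auc` then estimates the out-of-sample AUC of the whole select-then-fit procedure, rather than the apparent AUC of one model already selected on the full data.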

0 Answers