Gold standard to select predictors for logistic regression

Question

I have a data set with 50 predictors (a mix of categorical and numerical variables) and one dichotomous outcome. I'd like to fit a logistic regression model and validate it with k-fold cross-validation.

However, I am stuck on deciding which predictors to include in my model. I started with initial hypothesis making, where I try to find some reasonable physical entity. However, my model doesn't produce any good AUC (0.74).

Then I tried stepwise regression (backward and backward/forward), using both AIC and BIC, to let the computer guess which variables are better for the outcome. I still can't get an AUC above 0.75.

Therefore, I would like to ask whether there is a gold-standard method for this situation that would help me work out which predictors are best, so that I can optimize the predictive power of the model.

I use R for my modeling.

You need to read the **extensive** discussions of this topic on this site. You started with a false premise. — Frank Harrell, Dec 10 '16 at 16:07
@Harrell As someone who follows this site regularly, I've read various discussions about variable importance etc. I think it would be useful to the OP if you pointed him to your favorite. He could then use "Related" to do more research. — meh, Dec 10 '16 at 16:17
http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection#20856 http://stats.stackexchange.com/questions/24752/52-variables-after-backward-variable-selection-on-logistic-regression-on-160-var http://stats.stackexchange.com/questions/215154/variable-selection-for-predictive-modeling-really-needed-in-2016 http://stats.stackexchange.com/search?q=logistic+variable+selection+model-selection+harrell — kjetil b halvorsen, Dec 10 '16 at 16:32
What do you mean when you say that an AUC of 0.74 isn't any good? What are you comparing that to? — Matthew Drury, Dec 10 '16 at 16:37
Can you clarify what you meant by 'try to find some reasonable physical entity'? Did you have a scientific hypothesis about the variables? — mdewey, Dec 10 '16 at 16:55
@FrankHarrell You say the OP starts with a wrong premise. Maybe to be clear you can tell the OP what the false premise is. — Michael R. Chernick, Dec 10 '16 at 19:33
I've never used AUC to judge whether a logistic regression model is any good. Then again, I'm not using logistic regression models for purposes of classification. My point: is this even a classification problem? — The Laconic, Apr 07 '17 at 03:03

score 4 · Accepted Answer · answered Dec 10 '16 at 16:12

Not sure about a gold standard, but have you looked at regularization methods such as LASSO? They are used when one is trying to fit a regression with a large number of predictors. LASSO in particular can double as a variable selection tool, because it shrinks some coefficients exactly to zero. The R packages gamlr and glmnet should both allow you to easily run a cross-validated LASSO with logistic regression.
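
As a rough illustration of what a cross-validated LASSO logistic regression could look like with glmnet, here is a minimal sketch. The data are simulated and every object name (`x`, `y`, `cv_fit`, `selected`, `cv_auc`) is made up for the example, so it is not the asker's actual setup. It also assumes all predictors are already numeric. With categorical predictors, one common approach is to expand them into dummy columns first, for example with `model.matrix(~ ., data = df)[, -1]`, where `df` is a hypothetical data frame of predictors.

```r
# Sketch only: simulated data stand in for the real data set.
library(glmnet)

set.seed(1)
n <- 500   # observations
p <- 50    # predictors, as in the question
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("x", seq_len(p))

# In this toy example only the first five predictors carry signal.
eta <- drop(x[, 1:5] %*% c(1, -1, 0.8, -0.8, 0.5))
y <- rbinom(n, size = 1, prob = plogis(eta))

# alpha = 1 selects the LASSO penalty, nfolds = 10 gives 10-fold CV,
# and type.measure = "auc" scores each lambda by cross-validated AUC.
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1,
                    nfolds = 10, type.measure = "auc")

# Coefficients at lambda.1se, the more heavily penalized of the two
# lambdas that cv.glmnet reports.
beta <- as.matrix(coef(cv_fit, s = "lambda.1se"))
nonzero <- rownames(beta)[beta[, 1] != 0]
selected <- setdiff(nonzero, "(Intercept)")
print(selected)

# Cross-validated AUC at that same lambda.
cv_auc <- cv_fit$cvm[which(cv_fit$lambda == cv_fit$lambda.1se)]
print(cv_auc)
```

Using `s = "lambda.min"` instead keeps the lambda with the best cross-validated AUC and usually retains more predictors. Which of the two is preferable depends on how much sparsity matters relative to predictive performance.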

RA334

Yes, I would agree with RA334. I think LASSO is the gold standard. glmnet can also be used to fit ridge regression. See the following references for more detailed info: – Alejandro Ochoa Jan 13 '17 at 16:22
1) Zou, Hui, and Trevor Hastie. "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67.2 (2005): 301-320. Link: http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2005.00503.x/full 2) The Elements of Statistical Learning (Hastie, Tibshirani, Friedman) – Alejandro Ochoa Jan 13 '17 at 17:27
