So I'm trying to fit a logistic regression model to some binary outcome data. Besides the binary outcome, I have several metrics (numeric, integer, and factor variables) associated with each case. As usual, the idea is to find the model that best describes the data without overfitting.
I'm using R for this, so to try things out and get the data well organized, I use the glm function. I can use it to fit a model with all the variables (not a good one), or I can choose which ones to include. But how does one determine which ones should be used? I know I can compare AIC values to see whether one model is better than another, but I have many metrics to choose from, so that would mean trying out a lot of different models. And I don't think that is the way AIC is meant to be used.
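To make that concrete, here is a minimal sketch of the kind of comparison I mean. The data are simulated and the names (x1, x2, g, y) are placeholders I made up for illustration, not my real variables:

```r
# Simulated stand-in data: one numeric, one integer and one factor predictor.
set.seed(1)
n <- 200
d <- data.frame(
  x1 = rnorm(n),                                           # numeric metric
  x2 = rpois(n, 3),                                        # integer metric
  g  = factor(sample(c("a", "b", "c"), n, replace = TRUE)) # factor metric
)
# Binary outcome; in this simulation it depends on x1 only.
d$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$x1))

# Model using every available variable.
full <- glm(y ~ ., data = d, family = binomial)

# Model using a hand-picked subset.
small <- glm(y ~ x1, data = d, family = binomial)

# Side-by-side AIC values (lower is better).
aic_table <- AIC(full, small)
print(aic_table)
```

With only three predictors this is manageable, but the number of candidate subsets doubles with every extra variable, which is why comparing them all by hand does not seem practical.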
So yeah, what is the basic approach in situations like this? Do I run the glm function on a single variable at a time, see whether it is significant, and then choose from there, or are there other, more effective approaches?