I'm having a go at coding a logistic regression model-building algorithm and I'd appreciate some advice. I've read in several places (including here) that minimizing both AIC and BIC can be an effective model-selection strategy.
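For reference, with $\hat{L}$ the maximized likelihood, $k$ the number of estimated parameters, and $n$ the sample size, the two criteria are

$$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L},$$

so BIC penalizes each additional parameter more heavily than AIC once $n > e^2 \approx 7.4$, which is why the two criteria can disagree about which model is best.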
My algorithm employs the following (undoubtedly simplistic) approach:
Fit a model with the intercept only and record the AIC and BIC of this null model.
Iterate through the list of features, creating test models that each consist of one candidate feature plus the intercept.
Evaluate the test models and select the one that minimizes both the AIC and BIC scores.
Set the AIC and BIC thresholds to the values of the current best model.
Remove the feature selected in the current round from the candidate list.
Repeat, iterating through the remaining features and creating a new set of test models (each consisting of the previously selected features, one candidate feature, and the intercept), until no further reduction in AIC and BIC occurs; then return the model with the current lowest scores (a rough sketch of this loop is given after the list).
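Here's a minimal sketch of that loop in Python, assuming the features live in a pandas DataFrame `X` and the binary outcome in a Series `y` (both names are just for illustration); `statsmodels` fit results expose `.aic` and `.bic` directly:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_select(X: pd.DataFrame, y: pd.Series):
    """Greedy forward selection that only accepts a feature when it
    lowers both AIC and BIC relative to the current best model."""
    selected = []
    remaining = list(X.columns)

    # Intercept-only null model establishes the starting thresholds.
    null_fit = sm.Logit(y, np.ones((len(y), 1))).fit(disp=0)
    best_aic, best_bic = null_fit.aic, null_fit.bic

    while remaining:
        best_feature = None
        for feat in remaining:
            # Test model: previously selected features + candidate + intercept.
            design = sm.add_constant(X[selected + [feat]])
            fit = sm.Logit(y, design).fit(disp=0)
            # Accept only if BOTH criteria improve on the current best.
            if fit.aic < best_aic and fit.bic < best_bic:
                best_aic, best_bic = fit.aic, fit.bic
                best_feature = feat
        if best_feature is None:
            break  # no candidate lowered both scores; stop
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected
```

Note that requiring both criteria to fall makes the stopping rule more conservative than stepping on AIC alone, since BIC's heavier penalty will veto marginal features earlier.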
I'm sure this is a hopelessly naive approach, and I'd appreciate some feedback. I suppose my basic question is whether I'm completely misusing these criteria. Thus far, the algorithm returns a parsimonious model with very low p-values for all features, and examining those features in depth certainly provides some useful insights into the data I'm working with. It's definitely narrowed the field: given that my data set has 40 features (the final model contained only 6), it's obviously preferable to guessing!