I'm interested in building a set of candidate models in R for an analysis using logistic regression. Once I build the set of candidate models, I evaluate their fit to the data using AICc (aicc <- dredge(results, evaluate = TRUE, rank = "AICc") from the MuMIn package).
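For context, this is roughly how I am setting that up (a minimal sketch; the model formula, the response presence, and the data frame mydata are placeholders for illustration):

    library(MuMIn)

    # dredge() requires the global model to fail on missing values rather than drop them
    options(na.action = "na.fail")

    # Global (most complex) logistic regression model
    results <- glm(presence ~ elev + slope + cover,
                   family = binomial, data = mydata)

    # All-subsets model selection, ranked by AICc
    aicc <- dredge(results, evaluate = TRUE, rank = "AICc")
    head(aicc)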
I would then like to use k-fold cross-validation to evaluate the predictive performance of the final model chosen from the analysis. I have a few questions about k-fold cross-validation:
I assume you use your entire data set to initially build your candidate set of models. For example, if I have 20,000 observations, wouldn't I first build the candidate set using all 20,000, and then use AICc to rank the models and select the most parsimonious one?
After you select the final model (or model-averaged model), would you then conduct k-fold cross-validation to evaluate its predictive performance?
What is the easiest way to code a k-fold cross-validation in R?
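The simplest thing I have come across so far is cv.glm() from the boot package, though I am not sure it is the recommended route. A minimal sketch of what I am picturing, assuming the final model is a glm() fit and reusing the placeholder names from above:

    library(boot)

    # Final logistic regression model (placeholder formula and data)
    final.model <- glm(presence ~ elev + slope, family = binomial, data = mydata)

    # Misclassification-rate cost function for a binary response
    cost <- function(r, pi) mean(abs(r - pi) > 0.5)

    # 10-fold cross-validation; delta[1] is the raw CV estimate of prediction error
    cv.err <- cv.glm(mydata, final.model, cost = cost, K = 10)
    cv.err$delta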
Does the k-fold cross-validation code automatically split the entire data set (e.g., all 20,000 observations) into training and validation sets, or do you have to subset the data manually?
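To illustrate what I mean by subsetting manually, this is the kind of hand-rolled loop I would otherwise expect to write (again using the placeholder data frame mydata and a placeholder formula):

    set.seed(123)
    k <- 10

    # Randomly assign each row to one of k folds
    folds <- sample(rep(1:k, length.out = nrow(mydata)))

    cv.error <- numeric(k)
    for (i in 1:k) {
      train <- mydata[folds != i, ]  # training set
      test  <- mydata[folds == i, ]  # held-out validation set

      fit  <- glm(presence ~ elev + slope, family = binomial, data = train)
      prob <- predict(fit, newdata = test, type = "response")

      # Misclassification rate on the held-out fold
      cv.error[i] <- mean((prob > 0.5) != test$presence)
    }
    mean(cv.error)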