I'm using k-fold cross-validation to compare different models.
I split my dataset into 6 chunks and used 4 randomly chosen chunks as the training set and the remaining 2 as the test set.
I then fitted n different models to the training set and calculated the RMSE on both the training and the test sets. From what I understand, the model with the lowest RMSE on the test set should be the preferable one.
For the sake of clarity, by RMSE I mean: RMSE = sqrt( sum( (fitted - observed)^2 ) / n.observations )
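To make the procedure concrete, here is a minimal Python sketch of what I'm doing (this assumes a pandas DataFrame `df` holding the predictors and a numeric response; the function names are just placeholders for my actual workflow):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def rmse(fitted, observed):
    # RMSE = sqrt( sum((fitted - observed)^2) / n.observations )
    return np.sqrt(np.mean((np.asarray(fitted) - np.asarray(observed)) ** 2))

def split_train_test(df, n_chunks=6, n_train_chunks=4):
    # Shuffle the row indices, cut them into 6 roughly equal chunks,
    # take 4 random chunks as training data and the remaining 2 as test data.
    idx = rng.permutation(len(df))
    chunks = np.array_split(idx, n_chunks)
    train_ids = rng.choice(n_chunks, size=n_train_chunks, replace=False)
    train_idx = np.concatenate([chunks[i] for i in train_ids])
    test_idx = np.concatenate([chunks[i] for i in range(n_chunks) if i not in train_ids])
    return df.iloc[train_idx], df.iloc[test_idx]
```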
The models differ from one another in some independent variables, which have different amounts of NA values (in particular, since some variables represent the cumulative effect of others, the number of NAs increases the more variables I accumulate).
So I find myself comparing a first model with, say, n NAs to a second one with 10n NAs. In other words, I'm comparing models that are fitted to different numbers of observations, as the sketch below illustrates.
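Here is a hypothetical illustration of the problem (the predictor lists and the helper name are placeholders, not my actual variables): two candidate models use different predictor sets, and because the "cumulative" predictors contain more NAs, each model is fitted and evaluated on a different number of complete rows.

```python
def complete_case_count(df, predictors, response="y"):
    # Number of observations actually usable by a model with these predictors:
    # rows with no NA in the response or in any of the predictors.
    return df[[response] + predictors].dropna().shape[0]

# model_1_vars and model_2_vars stand in for my actual predictor sets
# n1 = complete_case_count(train, model_1_vars)   # roughly n rows lost to NAs
# n2 = complete_case_count(train, model_2_vars)   # roughly 10n rows lost to NAs
```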
1) Is this an issue when comparing RMSE calculated on the test set?
I know, for example, that if I were comparing models on the training set, AIC would not be meaningful in this case; I'm less sure about R-squared...
2) Since I run each model 10 times on 10 training sets and test on 10 test sets (see the beginning for details), for a given model I have an average RMSE and its standard error on both the training and test sets. How should I interpret differences between the training and test RMSE?
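For reference, this is how I summarise the 10 repeats for one model: the average RMSE and its standard error on each set (`rmse_train` and `rmse_test` are placeholders for the 10 values collected from the repeated splits):

```python
import numpy as np

def summarise(rmse_values):
    # Mean RMSE across the 10 repeats and the standard error of that mean.
    rmse_values = np.asarray(rmse_values, dtype=float)
    mean = rmse_values.mean()
    se = rmse_values.std(ddof=1) / np.sqrt(len(rmse_values))
    return mean, se

# mean_train, se_train = summarise(rmse_train)
# mean_test, se_test = summarise(rmse_test)
```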
Any suggestion appreciated!