I am working on a scoring model whose aim is to predict the probability of default. I have, say, $m$ candidate logistic regression models $M_{1}, \dots, M_{m}$ and I would like to choose the best one for predicting this probability. Assume that the data set is moderately large.
My approach is the following:
1) Randomly split the data set into a training set and a validation set, say in an 80/20 proportion, without replacement.
2) Train each model $M_{1}, \dots, M_{m}$ on the training set and compute the areas under the ROC curve $AUC_{1}, \dots, AUC_{m}$ on the validation set.
3) Re-split the data and compute new values of $AUC_{1}, \dots, AUC_{m}$, repeating this many times. (This is basically Monte Carlo cross-validation.)
Then I am thinking of making boxplots of the resulting $AUC_{1}, \dots, AUC_{m}$ distributions and choosing the model $M_{i}$ that performs best according to the boxplots.
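To make the procedure concrete, here is a minimal sketch of the repeated-split loop using scikit-learn. The data and the candidate models are placeholders (synthetic data, candidates that differ only in the feature subset they use); my real candidates differ in their predictors, but the loop is the same:

```python
# Sketch of the Monte Carlo cross-validation loop described above.
# Data and candidate models are hypothetical placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Hypothetical candidates: same estimator, different feature subsets.
candidates = {
    "M1": [0, 1, 2, 3],
    "M2": [0, 1, 2, 3, 4, 5],
    "M3": list(range(10)),
}

n_splits = 100
aucs = {name: [] for name in candidates}
for seed in range(n_splits):
    # Step 1: random 80/20 split without replacement.
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    for name, cols in candidates.items():
        # Step 2: fit on the training part, score AUC on the validation part.
        model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
        p = model.predict_proba(X_va[:, cols])[:, 1]
        aucs[name].append(roc_auc_score(y_va, p))

# Step 3 repeated n_splits times gives one AUC distribution per model,
# which can then be compared with boxplots, e.g.:
# import matplotlib.pyplot as plt
# plt.boxplot(aucs.values()); plt.show()
```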
Is this a correct way to proceed? Can I perform the same evaluation with the Gini index instead? In my opinion it would make sense, but I haven't seen it in the literature. Also, I am intuitively not satisfied with a single split of the data, because every split gives a noticeably different result.
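For clarity, by Gini index I mean the one commonly used in credit scoring (the accuracy ratio), which is just a linear rescaling of AUC, $Gini = 2 \cdot AUC - 1$, as in:

```python
# Gini as used in credit scoring (accuracy ratio) is a linear rescaling
# of AUC; toy labels and scores here are only for illustration.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1
```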