I am working on a scoring model and aim to predict the probability of default. I have, say, $m$ different candidate Logistic Regression models $M_{1}, \dots, M_{m}$ and I would like to choose the best one for predicting this probability. Assume that the data set is moderately large.

My approach is the following:

1) Randomly split the data set into Train and Validation Sets, say in an 80/20 proportion, without replacement.

2) Train each Logistic Regression model $M_{1}, \dots, M_{m}$ on the Train Set and compute the areas under the ROC curve $AUC_{1}, \dots, AUC_{m}$ on the Validation Set.

3) Re-split the data and compute new values of $AUC_{1}, \dots, AUC_{m}$, repeating this many times. (This is, basically, Monte Carlo Cross-Validation.)

Then I am thinking of making boxplots of $AUC_{1}, \dots, AUC_{m}$ across the splits and choosing the model $M_{i}$ that performs "better" according to the boxplots.
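The procedure above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses scikit-learn, a synthetic data set from `make_classification` in place of the real scoring data, and three hypothetical candidates (`M1`, `M2`, `M3`) that differ only in which feature columns they use. The number of splits (30) is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit

# Synthetic stand-in for the scoring data (assumption, not the real data set).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Hypothetical candidates M_1..M_m: same model class, different feature subsets.
candidates = {
    "M1": [0, 1, 2],
    "M2": [0, 1, 2, 3, 4],
    "M3": list(range(10)),
}

# Monte Carlo cross-validation: repeated random 80/20 splits.
splitter = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)
auc = {name: [] for name in candidates}

for train_idx, val_idx in splitter.split(X):
    # Every candidate is fitted and scored on the same split,
    # so the AUC values are paired across models.
    for name, cols in candidates.items():
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx][:, cols], y[train_idx])
        p_default = model.predict_proba(X[val_idx][:, cols])[:, 1]
        auc[name].append(roc_auc_score(y[val_idx], p_default))

for name, scores in auc.items():
    print(f"{name}: mean AUC = {np.mean(scores):.3f}, sd = {np.std(scores):.3f}")
```

Passing `list(auc.values())` to a boxplot routine such as matplotlib's `boxplot` would give one box per model. Because all candidates share the same splits, per-split differences between two models can also be inspected directly.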

Is this the correct way? Can I perform the same evaluation, but with the Gini index? In my opinion it would make sense, but I haven't seen it in the literature. Also, intuitively I am not satisfied with just one split of the data, because every time we split it we get quite different results.

KimMik
  • You could use a scoring rule such as the [Brier Score](https://stackoverflow.com/questions/25149023/how-to-find-the-brier-score-of-a-logistic-regression-model-in-r) – Robert Long Apr 09 '19 at 14:58
  • @Robert Long, does the approach that I proposed make sense? Should I compute Brier Score on Validation Set only? – KimMik Apr 09 '19 at 15:39
  • No, you would use the Brier Score (a proper scoring rule) in place of AUC-ROC (a semi-proper scoring rule). See [here](https://stats.stackexchange.com/questions/339919/what-does-it-mean-that-auc-is-a-semi-proper-scoring-rule) for more detail – Robert Long Apr 09 '19 at 15:57
  • @Robert Long Sorry, I am confused. Don't we compute AUC-ROC based on the Validation set? – KimMik Apr 09 '19 at 16:09
  • How many observations do you have? Data splitting is only advisable when $n$ is very large, see also https://stats.stackexchange.com/questions/66457/how-to-do-external-validation-of-regression-models – kjetil b halvorsen Apr 10 '19 at 11:10
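
To illustrate the Brier Score suggestion from the comments, here is a minimal sketch that computes it on the held-out part of each random split, in place of AUC. It assumes scikit-learn and synthetic data from `make_classification`; the single model and the 30 splits are illustrative choices, not taken from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import ShuffleSplit

# Synthetic stand-in for the scoring data (assumption).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=1)

splitter = ShuffleSplit(n_splits=30, test_size=0.2, random_state=1)
brier = []

for train_idx, val_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    p_default = model.predict_proba(X[val_idx])[:, 1]
    # Brier score: mean squared difference between the predicted
    # probability and the 0/1 outcome on the held-out part (lower is better).
    brier.append(brier_score_loss(y[val_idx], p_default))

print(f"mean Brier score = {np.mean(brier):.4f}, sd = {np.std(brier):.4f}")
```

The same loop can hold several candidate models, with one list of Brier scores per model, so that they are compared on identical splits.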

0 Answers