My original sample has 350 observations drawn randomly from a population of 60,000 people.
My dependent (target) variable is Default, with 35 observations equal to 1 and the rest equal to 0.
I split my sample into Train (60%) and Test (40%), fit a multivariable logistic regression on Train to predict Default, and validate it on Test.
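A minimal sketch of the single Train/Test step, assuming Python with NumPy only. The real data are not shown, so `make_data` generates hypothetical stand-in data (350 rows, 3 predictors, 35 defaults); `fit_logistic`, `auc` and `holdout_auc` are illustrative helpers, not library functions, and the tiny ridge penalty in `fit_logistic` is an assumption for numerical stability. A ROC curve is usually summarized by the area under it (AUC), which equals the probability that a randomly chosen default gets a higher score than a randomly chosen non-default (ties counted as 1/2); the Gini coefficient is Gini = 2 × AUC − 1.

```python
import numpy as np

def make_data(rng, n=350, n_pos=35):
    """HYPOTHETICAL stand-in data: 3 predictors, exactly n_pos defaults."""
    X = rng.normal(size=(n, 3))
    latent = X @ np.array([1.0, -0.5, 0.25]) + rng.logistic(size=n)
    y = np.zeros(n, dtype=int)
    y[np.argsort(latent)[-n_pos:]] = 1  # the n_pos highest latent values default
    return X, y

def fit_logistic(X, y, l2=1e-3, n_iter=25):
    """Logistic regression via Newton-Raphson; the tiny ridge term l2 is an assumption for stability."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        grad = Xb.T @ (p - y) + l2 * w
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + l2 * np.eye(len(w))
        w -= np.linalg.solve(hess, grad)
    return w

def auc(y, score):
    """Share of (default, non-default) pairs ranked correctly; ties count 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def holdout_auc(X, y, rng, train_frac=0.6):
    """Random 60/40 split, fit on Train, return the AUC on Test."""
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    w = fit_logistic(X[tr], y[tr])
    score = np.column_stack([np.ones(len(te)), X[te]]) @ w  # linear predictor ranks like the probability
    return auc(y[te], score)

rng = np.random.default_rng(0)
X, y = make_data(rng)
test_auc = holdout_auc(X, y, rng)
print(f"Test AUC = {test_auc:.3f}, Gini = {2 * test_auc - 1:.3f}")
```

With scikit-learn available, `sklearn.linear_model.LogisticRegression` and `sklearn.metrics.roc_auc_score` would replace the hand-written helpers.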
Because of the small sample size, I want to estimate a confidence interval for a performance statistic (the area under the ROC curve, AUC) using the bootstrap, in two ways:
(1) Draw 10,000 bootstrap samples (with replacement), each the same size as the original sample (350 observations). For each sample, split it into train and test, fit the model on train, and calculate the AUC on test. Finally, plot the distribution of the AUC values.
However, with this approach the AUC values are widely dispersed (high variance).
(2) Draw 10,000 samples, but with 3,000 observations each (drawn with replacement from the original 350), then repeat the same process. In this case, the AUC and Gini values are very concentrated, which is what I prefer.
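Both procedures can be sketched with one function whose only difference is the size of each resample. This is a minimal sketch, assuming Python with NumPy only; the data come from a hypothetical generator `make_data` (the real data are not shown), and `fit_logistic`, `auc` and `bootstrap_auc` are illustrative helpers rather than library functions. The 95% interval printed is the simple percentile interval, one of several bootstrap interval methods. Skipping draws where train or test contains a single class is an added assumption; the question does not say how such draws are handled.

```python
import numpy as np

def make_data(rng, n=350, n_pos=35):
    """HYPOTHETICAL stand-in data: 3 predictors, exactly n_pos defaults."""
    X = rng.normal(size=(n, 3))
    latent = X @ np.array([1.0, -0.5, 0.25]) + rng.logistic(size=n)
    y = np.zeros(n, dtype=int)
    y[np.argsort(latent)[-n_pos:]] = 1  # the n_pos highest latent values default
    return X, y

def fit_logistic(X, y, l2=1e-3, n_iter=25):
    """Logistic regression via Newton-Raphson; the tiny ridge term l2 is an assumption for stability."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        grad = Xb.T @ (p - y) + l2 * w
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + l2 * np.eye(len(w))
        w -= np.linalg.solve(hess, grad)
    return w

def auc(y, score):
    """Share of (default, non-default) pairs ranked correctly; ties count 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def bootstrap_auc(X, y, rng, sample_size, n_boot=10_000, train_frac=0.6):
    """Each replicate draws `sample_size` rows with replacement from the original data,
    splits them 60/40, fits on train and records the test AUC.
    sample_size=len(y) is method (1); sample_size=3000 is method (2)."""
    n_train = int(round(train_frac * sample_size))
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=sample_size)  # resample with replacement
        Xs, ys = X[idx], y[idx]
        perm = rng.permutation(sample_size)
        tr, te = perm[:n_train], perm[n_train:]
        # AUC is undefined if train or test contains only one class; skip such draws.
        if np.unique(ys[tr]).size < 2 or np.unique(ys[te]).size < 2:
            continue
        w = fit_logistic(Xs[tr], ys[tr])
        score = np.column_stack([np.ones(len(te)), Xs[te]]) @ w  # linear predictor
        aucs.append(auc(ys[te], score))
    return np.array(aucs)

rng = np.random.default_rng(0)
X, y = make_data(rng)
# n_boot is reduced from 10,000 to keep the example quick.
auc_1 = bootstrap_auc(X, y, rng, sample_size=len(y), n_boot=300)  # method (1)
auc_2 = bootstrap_auc(X, y, rng, sample_size=3000, n_boot=300)    # method (2)
for label, a in [("method (1)", auc_1), ("method (2)", auc_2)]:
    lo, hi = np.percentile(a, [2.5, 97.5])
    print(f"{label}: mean AUC {a.mean():.3f}, sd {a.std(ddof=1):.3f}, "
          f"95% percentile interval [{lo:.3f}, {hi:.3f}], mean Gini {2 * a.mean() - 1:.3f}")
```

To plot the two distributions, `auc_1` and `auc_2` can be passed to a histogram function such as `matplotlib.pyplot.hist`.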
My questions are:
In theory, what is the difference between the two methods?
Since (2) will most likely produce a better result, why is (1) much more popular?