I'm training an Elastic Net model on a small dataset with about 100 TRUE outcomes and 15 FALSE outcomes. I've been using AUC to compare models, but I'm worried this metric is unstable because some bootstrapped subsamples have only 4 FALSE outcomes in the test set. Is there another metric that would be more appropriate here?
Edit: My Elastic Net model returns numerical predictions, not class labels.
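For context, a quick simulation along these lines (synthetic scores, not my real data — the score distributions and separation are made up) reproduces the kind of spread I'm seeing. It computes AUC directly via the Mann-Whitney rank statistic so there are no extra dependencies:

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    # AUC equals the probability that a randomly chosen positive is
    # scored above a randomly chosen negative (Mann-Whitney U / (n_pos * n_neg)).
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score against every negative score; ties count 0.5.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
# Hypothetical data mimicking the question: 100 TRUE (coded 1) and
# 15 FALSE (coded 0) outcomes, with moderately separated score distributions.
y = np.array([1] * 100 + [0] * 15)
scores = np.concatenate([rng.normal(1.0, 1.0, 100), rng.normal(0.0, 1.0, 15)])

aucs = []
for _ in range(500):
    idx = rng.choice(len(y), size=len(y), replace=True)  # bootstrap resample
    if y[idx].min() == y[idx].max():
        continue  # skip degenerate resamples with only one class
    aucs.append(auc_mann_whitney(y[idx], scores[idx]))

print(f"AUC spread across bootstraps: sd={np.std(aucs):.3f}, "
      f"min={min(aucs):.2f}, max={max(aucs):.2f}")
```

With so few FALSE outcomes per resample, a single misranked case shifts the AUC by a large amount, which is where the spread comes from.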