I have a dataset of patients with their relevant comorbidities and demographics, along with preoperative characteristics. The idea is to find the predictive factors associated with a patient being discharged to either rehab or home following a certain surgery.
The dataset, like many biomedical datasets, is severely imbalanced: most patients are discharged to home. So I performed undersampling to get a 1:1 ratio of rehab to home and then trained a model.
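For reference, the undersampling step was roughly along these lines (the column name `discharge` and the use of pandas here are illustrative, not my actual schema):

```python
import pandas as pd

# Rough sketch of the undersampling step; "discharge" is a placeholder
# column name holding the class label (e.g. "home" vs "rehab").
def undersample_majority(df: pd.DataFrame, label_col: str, seed: int = 0) -> pd.DataFrame:
    counts = df[label_col].value_counts()
    minority_label = counts.idxmin()
    n_minority = counts.min()

    minority = df[df[label_col] == minority_label]
    # Randomly drop majority-class rows until the classes are 1:1
    majority = df[df[label_col] != minority_label].sample(n=n_minority, random_state=seed)

    # Concatenate and shuffle the balanced dataset
    return pd.concat([minority, majority]).sample(frac=1, random_state=seed)
```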
The model was built using PyTorch: a feed-forward neural network trained on the undersampled dataset.
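The architecture is nothing exotic, roughly something like this (layer widths and dropout are illustrative, not the exact values I used):

```python
import torch
import torch.nn as nn

# Illustrative feed-forward architecture; the real layer sizes differ.
class DischargeNet(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # single logit, trained with BCEWithLogitsLoss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# model = DischargeNet(n_features=X_train.shape[1])
# criterion = nn.BCEWithLogitsLoss()
```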
The model had acceptable accuracy, precision, and F1 scores, with values above 0.70. The AUC score was about 0.65 and the Brier score was about 0.29.
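These metrics were computed roughly as follows (scikit-learn, thresholding the predicted probability at 0.5; paraphrased, not my exact evaluation code):

```python
from sklearn.metrics import (
    accuracy_score, precision_score, f1_score, roc_auc_score, brier_score_loss,
)

# y_true: 0/1 labels, y_prob: predicted probability of the positive (rehab) class
def evaluate(y_true, y_prob, threshold=0.5):
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),
        "brier": brier_score_loss(y_true, y_prob),
    }
```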
But when I take this model and run it on my original (imbalanced) dataset, it performs poorly. So my question is: what is the point of undersampling in the first place if the result is a model that performs poorly on the original dataset?
Also, what are alternative methods for handling an imbalanced dataset?