I have a dataset of 11 million rows with a 1:10 ratio between the minority and majority classes.
To train a model, I have selected all the minority class members and 1/3 of the majority class.
The ratio is now 3:10 (1 million minority rows to about 3.33 million majority rows), and the sample consists of 4.33 million rows.
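The undersampling step above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not my actual pipeline: the helper name `undersample`, the label encoding (1 = minority, 0 = majority) and the scaled-down sizes (1,000 minority and 10,000 majority rows standing in for 1 million and 10 million) are all hypothetical. It keeps every minority row plus a random third of the majority rows and checks the resulting ratio.

```python
import random


def undersample(labels, majority_fraction, seed=0):
    """Hypothetical helper: return indices keeping every minority row
    (assumed label 1) plus a random fraction of majority rows (assumed label 0)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Number of majority rows to keep, truncated to an integer.
    n_keep = int(len(majority) * majority_fraction)
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


# Scaled-down stand-in for the 11 million rows: 1,000 minority, 10,000 majority (1:10).
labels = [1] * 1_000 + [0] * 10_000
idx = undersample(labels, majority_fraction=1 / 3)

n_min = sum(labels[i] for i in idx)
n_maj = len(idx) - n_min
print(len(idx), n_min, n_maj, round(n_min / n_maj, 2))
```

At this scale the sample has 4,333 rows (1,000 minority and 3,333 majority), a ratio of about 0.3, i.e. 3:10, which mirrors the 4.33 million rows of the real sample.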
I have fit an XGBoost model on this undersampled data using cross-validation, and the results on the train, test, and validation sets (all derived from the 4.33 million rows) are 'OK'.
My question now is: should I also train/test the model on the full 11 million rows, or can I proceed with the model I have now?