I am building a predictive model on imbalanced data and want to undersample the majority class. I would like the undersampled data to be a representative sample of the majority class, and I came across R's randomForest package, which has a "sampsize" parameter.
Can someone explain how R's randomForest subsamples the data when "sampsize" is set? Could that approach solve my problem, or is there another method you would suggest?
I have tried computing the centroid of the majority class and undersampling by removing the samples that are farthest from that centroid, but the results were not satisfactory. I have around 50 features and am working in Python.
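
To make the centroid approach concrete, here is a minimal sketch of that kind of undersampling. It is not the exact code from my project: it assumes Euclidean distance and all-numeric features, and the function name `undersample_by_centroid` and the parameter `n_keep` are made up for illustration. It uses only the standard library so it runs as-is; the same steps could be vectorized with NumPy.

```python
import math


def undersample_by_centroid(majority, n_keep):
    """Keep the n_keep majority-class rows closest to the class centroid.

    `majority` is a list of equal-length numeric feature rows.
    `n_keep` is a hypothetical parameter: how many rows to retain.
    """
    if not 0 < n_keep <= len(majority):
        raise ValueError("n_keep must be between 1 and len(majority)")

    n_rows = len(majority)
    n_features = len(majority[0])

    # Centroid = per-feature mean over all majority rows.
    centroid = [sum(row[j] for row in majority) / n_rows for j in range(n_features)]

    # Rank row indices by Euclidean distance to the centroid (assumed metric).
    order = sorted(range(n_rows), key=lambda i: math.dist(majority[i], centroid))

    # Keep the closest rows, restoring their original order.
    kept_indices = sorted(order[:n_keep])
    return [majority[i] for i in kept_indices]


# Toy 2-feature example; real data would have ~50 features per row.
majority = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [10.0, 10.0], [1.1, 0.9]]
kept = undersample_by_centroid(majority, n_keep=3)
print(kept)
```

In this toy data the rows retained are the three clustered around (1, 1), while (0, 0) and (10, 10) are dropped. That shows one likely weakness of the method: it keeps only the core of the class around its mean, so the retained sample is less spread out than the original majority class.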