I am working on a binary classification model (in R, if that matters). I have done some research and reading on how to handle a class imbalance in the dependent variable, with different sampling methods.
I am curious how I could apply a similar methodology to a variable with more than 2 classes in the same model. As an example, I have an independent variable, language, with 4 classes:
English - Count: 50035, Perc: 94.02
Spanish - Count: 1588, Perc: 2.98
Not Collected - Count: 1490, Perc: 2.80
Other - Count: 102, Perc: 0.19
I am comparing the outcomes of three models: stepwise logistic regression, random forest, and xgboost. I believe I need to be researching "weighted classes", and I just want to make sure I am on the right path.
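To make concrete what I mean by "weighted classes", here is a minimal sketch of the arithmetic, written in plain Python only for illustration (nothing here is specific to R or to any of the three models). It assumes inverse-frequency weights of the form total / (number of levels * level count), which is one common heuristic and not necessarily the right choice here. As far as I understand, such weights are normally applied to the classes of the dependent variable, so whether they make sense for an independent variable like language is exactly what I am unsure about. The names `counts`, `percents`, and `weights` are my own.

```python
# Counts for the `language` variable, copied from the table above.
counts = {
    "English": 50035,
    "Spanish": 1588,
    "Not Collected": 1490,
    "Other": 102,
}

total = sum(counts.values())
n_levels = len(counts)

# Share of each level, in percent (matches the Perc column above).
percents = {level: round(100 * n / total, 2) for level, n in counts.items()}

# Assumed heuristic: inverse-frequency weights, total / (n_levels * count).
# Rare levels get large weights, the dominant level gets a weight below 1.
weights = {level: total / (n_levels * n) for level, n in counts.items()}

for level, n in counts.items():
    print(f"{level:<14} n={n:>6}  {percents[level]:>6.2f}%  weight={weights[level]:.3f}")
```

With this scheme the weighted counts are equal across the four levels, which is the sense in which the levels are "balanced".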
Thank you!