
I am working on a binary classification model (in R, if that matters). I have done some reading on how to handle class imbalance in the dependent variable using different sampling methods.

I am curious how I could apply a similar methodology to a variable with more than 2 classes in the same model. As an example, I have an independent variable, language, with 4 classes:

  • English - Count: 50035, Perc: 94.02%
  • Spanish - Count: 1588, Perc: 2.98%
  • Not Collected - Count: 1490, Perc: 2.80%
  • Other - Count: 102, Perc: 0.19%
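For reference, here is a base-R sketch (using the counts above) of one common weighting scheme, inverse-frequency weights normalized to mean 1, so rarer classes get proportionally larger weights:

```r
# Class counts from the question
counts <- c(English = 50035, Spanish = 1588, "Not Collected" = 1490, Other = 102)
props  <- counts / sum(counts)           # class proportions
w      <- (1 / props) / mean(1 / props)  # inverse-frequency weights, mean 1
round(props * 100, 2)                    # reproduces the percentages above
```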

I am comparing the outcomes of three models: stepwise logistic regression, random forest, and xgboost. I believe I need to be researching "class weights". I just wanted to make sure I am on the right path.
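To clarify what I mean by class weights, here is a rough sketch on simulated data (the data frame, column names, and imbalance ratio are made up for illustration; `case.weights` in ranger and `scale_pos_weight` in xgboost are the arguments as I understand them):

```r
library(ranger)   # random forest
library(xgboost)  # gradient boosting

set.seed(42)
n  <- 5000
df <- data.frame(
  y        = factor(rbinom(n, 1, 0.06)),  # imbalanced binary outcome
  language = factor(sample(c("English", "Spanish", "Not Collected", "Other"),
                           n, replace = TRUE, prob = c(.94, .03, .028, .002))),
  x        = rnorm(n)
)

# Per-observation weights: inverse frequency of each row's outcome class
tab <- table(df$y)
w   <- as.numeric(1 / tab[df$y])
w   <- w / mean(w)

# ranger takes per-row weights via case.weights
rf <- ranger(y ~ language + x, data = df,
             case.weights = w, probability = TRUE)

# xgboost: for a binary outcome, scale_pos_weight = (#negatives / #positives)
spw <- sum(df$y == 0) / sum(df$y == 1)
X   <- model.matrix(~ language + x - 1, data = df)
bst <- xgboost(data = X, label = as.numeric(df$y) - 1,
               objective = "binary:logistic", nrounds = 20,
               scale_pos_weight = spw, verbose = 0)
```

(`glm()` also accepts a `weights` argument, though the interpretation for logistic regression is slightly different.)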

Thank you!

  • Unbalanced classes are almost certainly not a problem, and oversampling will not solve a non-problem: [Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?](https://stats.stackexchange.com/q/357466/1352) – Stephan Kolassa Mar 07 '21 at 06:29

0 Answers