I have a response variable that takes one of three values: $A$, $B$, or $C$. It is very sparse (heavily imbalanced): 99% of the sample is $B$, and the remaining 1% is split approximately evenly between $A$ and $C$.

How do I predict this variable with a random forest classifier? I am looking for guidelines:

  • Can I use the standard classification splitting criterion with such a sparse response variable?
  • An out-of-sample misclassification does asymmetric damage: classifying $A$ or $C$ correctly matters most, and classifying $B$ correctly is a lower priority. How do I apply an asymmetric loss function to reflect this?
  • Is there anything else I need to take into account when modelling such a sparse response variable?
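To make the first two bullets concrete, here is a minimal sketch of the kind of thing I have in mind. It is not any library's API. The cost matrix `COST`, the class weights, and the helper names `expected_cost`, `min_cost_class`, and `weighted_gini` are all hypothetical, and the numbers are made up for illustration. The idea is (a) to reweight class counts inside the usual Gini impurity so the rare classes influence the splits, and (b) to predict the class with the lowest expected cost rather than the most probable one.

```python
CLASSES = ["A", "B", "C"]

# Hypothetical cost matrix, COST[true][predicted]. The values are
# illustrative assumptions: missing an A or a C is taken to be ten
# times as costly as mislabelling a B.
COST = {
    "A": {"A": 0.0, "B": 10.0, "C": 10.0},
    "B": {"A": 1.0, "B": 0.0, "C": 1.0},
    "C": {"A": 10.0, "B": 10.0, "C": 0.0},
}


def expected_cost(probs, predicted):
    """Expected cost of predicting `predicted`, given class probabilities."""
    return sum(probs[true] * COST[true][predicted] for true in CLASSES)


def min_cost_class(probs):
    """Return the class with the lowest expected misclassification cost."""
    return min(CLASSES, key=lambda c: expected_cost(probs, c))


def weighted_gini(counts, weights):
    """Gini impurity of a node after multiplying each class count by a weight."""
    weighted = {c: counts[c] * weights[c] for c in CLASSES}
    total = sum(weighted.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((v / total) ** 2 for v in weighted.values())


if __name__ == "__main__":
    # Class probabilities as a forest might report them for one observation.
    probs = {"A": 0.2, "B": 0.75, "C": 0.05}
    print(min_cost_class(probs))

    # A node with the same 99% / 0.5% / 0.5% mix as the full sample.
    counts = {"A": 5, "B": 990, "C": 5}
    print(weighted_gini(counts, {"A": 1, "B": 1, "C": 1}))
    print(weighted_gini(counts, {"A": 198, "B": 1, "C": 198}))
```

With these assumed costs, an observation with probabilities 0.2, 0.75, and 0.05 for $A$, $B$, and $C$ would be labelled $A$ even though $B$ is the most probable class. Whether this kind of post-hoc rule or a reweighted splitting criterion is the better approach is part of what I am asking.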

Related, but not a duplicate: Is there a Random Forest implementation that works well with very sparse data?

Jase

0 Answers