
I have a dataset with a binary outcome, y.

I'd like to run a decision tree on the data, but y == T is very rare, and so every leaf of the tree predicts y == F.

Is there any problem with sampling based on y, e.g., using only the N cases where y == T plus N randomly chosen cases where y == F?
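To make the proposed scheme concrete, here is a minimal sketch of that kind of outcome-based sampling (undersampling the majority class) in plain Python. The function names `undersample_majority` and `correct_probability` are hypothetical. The sketch assumes every y == T case is kept and the y == F cases are sampled uniformly at random. One known consequence of sampling on y is that the class proportions in the training sample no longer match the population. Probabilities estimated on the balanced sample are therefore inflated for y == T. The second function applies the standard prior correction, odds_original = odds_sample × (fraction of negatives kept), which holds under those assumptions.

```python
import random

def undersample_majority(labels, seed=0):
    """Return (indices, neg_keep_rate): all positive (True) cases plus an
    equal-sized random subset of the negative (False) cases.
    Hypothetical helper; assumes labels is a sequence of bools."""
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    rng = random.Random(seed)
    # Draw as many negatives as there are positives (or all, if fewer exist).
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    idx = pos + kept_neg
    rng.shuffle(idx)
    # Fraction of negatives retained; needed later to undo the sampling bias.
    neg_keep_rate = len(kept_neg) / len(neg) if neg else 1.0
    return idx, neg_keep_rate

def correct_probability(p_sample, neg_keep_rate):
    """Map P(y == T) estimated on the balanced sample back to the original
    base rate, assuming all positives were kept and negatives were kept at
    rate neg_keep_rate (so odds_original = odds_sample * neg_keep_rate)."""
    num = p_sample * neg_keep_rate
    return num / (num + (1.0 - p_sample))

# Usage: 10 rare positives among 1,000 cases.
labels = [True] * 10 + [False] * 990
idx, rate = undersample_majority(labels, seed=42)
print(len(idx), sum(labels[i] for i in idx))   # 20 10
print(round(correct_probability(0.5, rate), 6))  # 0.01
```

In the example, a leaf that predicts 50% y == T on the balanced sample maps back to 1%, which is the original base rate. The sampling changes the tree's probability scale but does not necessarily change which splits it finds. Whether that is acceptable depends on whether calibrated probabilities or only a ranking/classification are needed.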

Jeremy
