I am running a decision tree, and to balance the class labels I used SMOTE. The dataset originally consisted of 350k records; after balancing it has 1,400k (1.4 million) records. The resulting decision tree has 10 terminal nodes, so it yields 10 decision rules, one per terminal node.
The problem arises when I apply these 10 rules to the 350k original records: one of the rules does not match any record in the original, imbalanced dataset. In other words, the "problematic" rule was built entirely from synthetic records, so it applies to records in the balanced dataset (1,400k records) but to none in the imbalanced dataset (350k records). That is why I am calling it a "synthetic" rule.
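One way to reproduce the check described above is to count, for each terminal node, how many original (non-synthetic) records fall into it; a leaf with a count of zero corresponds to a "synthetic" rule. The sketch below is a minimal illustration under several assumptions: the data are randomly generated, `simple_smote` is a hypothetical, simplified SMOTE-style oversampler (not the imbalanced-learn implementation), `leaf_coverage` is a hypothetical helper, and scikit-learn's `DecisionTreeClassifier` with `max_leaf_nodes=10` stands in for the 10-terminal-node tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def simple_smote(X_min, n_new, k=5, rng=None):
    """Hypothetical, simplified SMOTE-style oversampling (illustration only).

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(rng)
    # Pairwise squared distances among minority samples (O(m^2) memory).
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def leaf_coverage(tree, X_original):
    """Hypothetical helper: number of ORIGINAL records reaching each leaf (rule)."""
    leaf_ids = np.flatnonzero(tree.tree_.children_left == -1)  # -1 marks a leaf
    hits = np.bincount(tree.apply(X_original), minlength=tree.tree_.node_count)
    return {int(leaf): int(hits[leaf]) for leaf in leaf_ids}


rng = np.random.default_rng(0)
# Randomly generated, imbalanced data: 2,000 majority vs 100 minority records.
X_maj = rng.normal(0.0, 1.0, size=(2000, 2))
X_min = rng.normal(2.0, 0.5, size=(100, 2))
X_syn = simple_smote(X_min, n_new=1900, k=5, rng=1)

# Balanced training set; keep a mask recording which rows are synthetic.
X_bal = np.vstack([X_maj, X_min, X_syn])
y_bal = np.concatenate([np.zeros(2000), np.ones(100 + 1900)])
is_synthetic = np.concatenate([np.zeros(2100, dtype=bool), np.ones(1900, dtype=bool)])

tree = DecisionTreeClassifier(max_leaf_nodes=10, random_state=0).fit(X_bal, y_bal)
coverage = leaf_coverage(tree, X_bal[~is_synthetic])
synthetic_only = [leaf for leaf, n in coverage.items() if n == 0]
print(coverage)
print("rules with no original records:", synthetic_only)
```

The pairwise-distance step is only practical for small examples, not for a 350k-record dataset. If you oversample with a library such as imbalanced-learn instead, check how it orders original and synthetic rows in its output before building a mask like `is_synthetic`.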
So my question is: am I doing something wrong, or is it expected to get a "synthetic" rule?
Best Regards