I am fitting a classification tree (CART) to the olives dataset. The training data has 436 observations (test data: 136). The response, 'Region', has 3 classes, which split the training data into 116 / 74 / 246 observations.
If I plot the variables eicosenoic and linoleic, I can see that the classes are almost perfectly separated.
I used a balanced dataset with 74 observations per class (btw, is that correct, or should I use fewer than 74 observations per class?) and got almost the same predictions on the test data as with the unbalanced dataset.
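
To make the balancing step concrete, here is a minimal sketch in Python (standard library only). It assumes plain random undersampling without replacement down to the size of the smallest class. The helper name `downsample_to_smallest` and the labels "A", "B", "C" are hypothetical placeholders that only reproduce the class counts above; this is not the code that was actually run on the olives data.

```python
import random
from collections import Counter


def downsample_to_smallest(labels, seed=0):
    """Return sorted row indices that keep the same number of rows per class.

    Every class is randomly undersampled (without replacement) to the
    size of the smallest class.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    rows_by_class = {}
    for idx, label in enumerate(labels):
        rows_by_class.setdefault(label, []).append(idx)

    # Size of the smallest class (74 for the counts in the question).
    n_min = min(len(rows) for rows in rows_by_class.values())

    kept = []
    for rows in rows_by_class.values():
        kept.extend(rng.sample(rows, n_min))
    return sorted(kept)


# Placeholder labels: only the class counts (116 / 74 / 246) come from the question.
labels = ["A"] * 116 + ["B"] * 74 + ["C"] * 246

kept = downsample_to_smallest(labels)
balanced_counts = Counter(labels[i] for i in kept)
print(len(kept), dict(balanced_counts))
```

With these counts the balanced training set has 3 × 74 = 222 rows, so 214 of the 436 training observations are discarded. The indices in `kept` would then be used to subset the training data before fitting the tree.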
That is why I am wondering whether a balanced dataset is required in this case. I assume that balancing is not required, but I am not sure and would like to hear other opinions.