Suppose I have a dataset consisting of different fruits:
60 apples, 100 oranges, 120 bananas, 7 grapes, and 900 pears.
I want to train a random forest model on these fruits, but what should I do about such a wide range of class sizes? Suppose I train on 80% of the data and test on the remaining 20%. A randomly selected sample will most likely consist mostly of pears, so the model may end up biased towards pears.
What should I do to overcome this problem?
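
To make the setup concrete, here is a minimal sketch of the plain random 80/20 split I have in mind. It uses only the Python standard library. The label strings, the seed, and the variable names are placeholders I made up for illustration; only the class counts come from my data.

```python
import random
from collections import Counter

# Class sizes from the question; the label strings are placeholders.
counts = {"apple": 60, "orange": 100, "banana": 120, "grape": 7, "pear": 900}

# One label per sample (features are omitted; only class balance matters here).
labels = [fruit for fruit, n in counts.items() for _ in range(n)]

rng = random.Random(0)  # arbitrary seed, only for reproducibility
rng.shuffle(labels)

# Plain (non-stratified) 80/20 split.
split = int(0.8 * len(labels))
train, test = labels[:split], labels[split:]

train_counts = Counter(train)
test_counts = Counter(test)

print("train:", dict(train_counts))
print("test: ", dict(test_counts))
print("pear share in train:", round(train_counts["pear"] / len(train), 3))
print("grapes in test:", test_counts["grape"])
```

Since pears are 900 of the 1187 samples (about 76%), both parts of the split are dominated by pears, and with only 7 grapes in total the test set may contain very few of them, or none.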