I have been trying to classify a set of data into one of four classes. The data has already been generated, and I have set aside 10,000 samples for training and 2,000 for testing. I have also generated a label for each sample. Let's call the classes 0, 1, 2 and 3.
When I look at the classification results, I notice that the training data contains a lot of 0s, so in most cases the classifier just learns to predict 0 no matter what the features are. (I am using random forests for classification.)
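To make the imbalance concrete, here is a minimal sketch of how the label distribution could be checked, and what accuracy a classifier gets by always predicting the most frequent class. The toy `y_train` list and the helper names `class_distribution` and `majority_baseline_accuracy` are hypothetical stand-ins for illustration, not the actual data or code.

```python
from collections import Counter


def class_distribution(labels):
    """Return {class: fraction of samples} for a sequence of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in sorted(counts.items())}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical toy labels (NOT the real data): class 0 dominates.
y_train = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10

print(class_distribution(y_train))
print(majority_baseline_accuracy(y_train))
```

If the baseline accuracy is close to what the random forest achieves, the model is probably doing little more than predicting the majority class.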
Generating the data again to ensure a uniform class distribution takes a lot of time, and I would prefer to avoid that. Is there any way I can still use the data that I have?