I'm working with a CSV which contains approximately 220,000 entries. My aim is to predict one of the attributes (ATT1) using the other 3 (ATT2, ATT3, ATT4).
I've been able to do this using Naive Bayes, but I'm not satisfied with the result. The reason is that ATT1 can take one of 6 values (VAL1-6), and these are not evenly distributed in the dataset. I'm afraid this could lead to imprecise predictions.
How do I select a given number of entries for each value of ATT1 from within RapidMiner?
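To make the goal concrete, here is a rough sketch in plain Python (not RapidMiner) of the kind of per-class sampling I mean. The function name `sample_per_class`, the toy rows, and the target of 10 entries per class are all made up for illustration; only the attribute names come from my data.

```python
import random
from collections import defaultdict

def sample_per_class(rows, label_key, n_per_class, seed=0):
    """Return up to n_per_class randomly chosen rows for each distinct label."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group the rows by the value of the label attribute.
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)

    # Draw the same number of rows from every group; a class with fewer
    # rows than requested contributes all of its rows.
    balanced = []
    for label in sorted(groups):
        members = groups[label]
        k = min(n_per_class, len(members))
        balanced.extend(rng.sample(members, k))
    return balanced

# Hypothetical toy data: VAL1 is heavily over-represented compared to VAL2.
rows = [{"ATT1": "VAL1", "ATT2": i} for i in range(50)]
rows += [{"ATT1": "VAL2", "ATT2": i} for i in range(5)]

balanced = sample_per_class(rows, "ATT1", 10)
```

With this toy input, the result holds 10 rows for VAL1 and all 5 rows for VAL2, since VAL2 has fewer rows than the requested count. I'm looking for the equivalent of this inside a RapidMiner process.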