I have a relatively large dataset (100k items) that I need to split into two groups. So far I've tried k-NN, and the results are not good, mainly because my training data is imbalanced: 90% of the points belong to the first group. The same proportion is expected in the test data.
Is there a way to improve prediction quality with this kind of data? Computational performance is not important; prediction quality is paramount.
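To make the 90/10 imbalance concrete, here is a minimal sketch of why plain accuracy can look fine while the smaller group is missed entirely. It uses made-up stand-in labels rather than the real dataset, and the always-predict-the-majority "classifier" and the `recall` helper are hypothetical, included only for illustration.

```python
# Hypothetical stand-in labels: 90% in group 0, 10% in group 1 (not the real data).
n_items = 100_000
y_true = [0] * 90_000 + [1] * 10_000

# A trivial "classifier" that always predicts the majority group.
y_pred = [0] * n_items


def recall(y_true, y_pred, label):
    """Fraction of items whose true group is `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


# Plain accuracy: share of all items predicted correctly.
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n_items

# Per-group recall and their average (often called balanced accuracy).
recall_majority = recall(y_true, y_pred, 0)
recall_minority = recall(y_true, y_pred, 1)
balanced_accuracy = (recall_majority + recall_minority) / 2

print(accuracy, recall_minority, balanced_accuracy)  # 0.9 0.0 0.5
```

So a model can reach 90% accuracy on this kind of data without ever identifying the second group, which suggests judging "quality" by per-group recall or balanced accuracy rather than accuracy alone.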