Let's say I have a set P of positive examples and a set N of negative examples.
Prior to feeding this dataset to an SVM for training, should I remove duplicates within each set?
Intuitively, I don't think that showing the same example several times adds much information, though I understand it could change the weight attached to that example.
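For concreteness, here is how I would deduplicate each set, assuming the examples are already rows of NumPy feature matrices (the arrays and values below are made up for illustration):

```python
import numpy as np

# Hypothetical feature matrices for the positive and negative sets.
P = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]])  # contains one duplicate row
N = np.array([[5.0, 6.0], [5.0, 6.0]])               # two identical rows

# np.unique with axis=0 collapses duplicate rows within a set.
P_unique = np.unique(P, axis=0)
N_unique = np.unique(N, axis=0)

print(P_unique.shape)  # (2, 2): the duplicate positive example is gone
print(N_unique.shape)  # (1, 2)
```

Note that this discards the multiplicity information entirely; an alternative would be to keep one copy per distinct row and pass the counts as per-sample weights to the trainer, if it supports that.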
A slightly related question: how should I handle examples which are present in both sets?
Because, once objects from the real world are projected into the feature space, some information is lost, so a p from P and an n from N might end up with the same coordinates in feature space (but different labels).
I am thinking about removing such examples from the P set. But maybe they should be removed from both sets. And I could also understand that someone with a small P set would prefer to remove them only from the N set (so as not to lose any precious positive example).
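To make the conflicting-examples case concrete, here is a sketch of how I could detect feature vectors present in both sets and drop them from both (again assuming NumPy row matrices; the data is invented):

```python
import numpy as np

P = np.array([[1.0, 2.0], [3.0, 4.0]])
N = np.array([[3.0, 4.0], [5.0, 6.0]])  # [3.0, 4.0] appears in both sets

# View rows as tuples so they can be compared as set elements.
p_rows = {tuple(row) for row in P}
n_rows = {tuple(row) for row in N}
conflicts = p_rows & n_rows  # feature vectors with contradictory labels

# One option: drop the conflicting rows from both sets.
P_clean = np.array([row for row in P if tuple(row) not in conflicts])
N_clean = np.array([row for row in N if tuple(row) not in conflicts])

print(conflicts)  # {(3.0, 4.0)}
```

Removing them only from N (or only from P) would just mean filtering a single array instead of both.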
Are SVMs smart and robust enough to handle such cases automatically?