I am building a model on a data set that has a high percentage of repeated values.
I am concerned that if I do traditional hold-out or k-fold cross-validation, I will get unreliable results, since the test sets will contain many examples that are basically the same as those in the training sets.
Is there an approach I can use so that my test sets do not include examples that are basically the same as the examples I have in my training set?
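One idea I had (not sure if it is the right direction) is to tag identical rows as belonging to the same group and then use a group-aware splitter such as scikit-learn's `GroupKFold`, so that all copies of a duplicated example end up on the same side of every split. The feature names and the toy data below are just placeholders for my real set; is something like this a reasonable approach?

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Toy data with many repeated rows (stand-in for my real data set)
df = pd.DataFrame({
    "feature_a": [1, 1, 2, 2, 2, 3, 4, 4],
    "feature_b": [5, 5, 6, 6, 6, 7, 8, 8],
    "target":    [0, 0, 1, 1, 1, 0, 1, 1],
})

# Give every set of identical feature rows the same group label
groups = df.groupby(["feature_a", "feature_b"]).ngroup().to_numpy()

X = df[["feature_a", "feature_b"]].to_numpy()
y = df["target"].to_numpy()

# GroupKFold keeps each group entirely in either train or test,
# so duplicates should never appear on both sides of a split
gkf = GroupKFold(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(gkf.split(X, y, groups=groups)):
    print(f"fold {fold}: train groups {sorted(set(groups[train_idx]))}, "
          f"test groups {sorted(set(groups[test_idx]))}")
```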