I would like to understand how duplicate training samples affect the training of a regression CNN, and in particular how duplication relates to overfitting. I have read some other threads, such as thread 1 and thread 2, but they mostly address the class imbalance problem for classification models.
I'm currently dealing with the following situation:
- I'm training a CNN that predicts pixels along traffic lanes.
- My training set has 15,000 samples, of which 7,000 come from a stationary scene, i.e. almost 50% (about 47%) of the training samples are nearly identical to each other.
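To make the concern concrete, here is a minimal sketch of how the stationary scene would enter an averaged loss. It assumes exact duplicates and a per-sample squared-error loss, which may not match the actual setup; the function name `duplicate_weight_demo` and the error values are hypothetical placeholders, not results from the real model. Only the counts (7,000 of 15,000) come from the dataset described above.

```python
import numpy as np

def duplicate_weight_demo(n_unique=8000, n_dup=7000, seed=0):
    """Show that repeating one sample n_dup times equals weighting it by n_dup."""
    rng = np.random.default_rng(seed)

    # Placeholder per-sample squared errors for the non-duplicated samples.
    unique_err = rng.random(n_unique)
    # Placeholder squared error on the single stationary scene.
    dup_err = 0.5

    n_total = n_unique + n_dup

    # Loss averaged over the dataset with the duplicate physically repeated.
    repeated = np.concatenate([unique_err, np.full(n_dup, dup_err)])
    loss_repeated = float(repeated.mean())

    # The same loss written with one copy of the duplicate, weighted by n_dup.
    loss_weighted = float((unique_err.sum() + n_dup * dup_err) / n_total)

    # Fraction of the averaged loss controlled by the stationary scene.
    dup_share = n_dup / n_total
    return loss_repeated, loss_weighted, dup_share

if __name__ == "__main__":
    loss_repeated, loss_weighted, dup_share = duplicate_weight_demo()
    print(loss_repeated, loss_weighted, dup_share)
```

Under these assumptions the two losses agree up to floating-point rounding, and the stationary scene carries a weight of about 0.47 in the average. Whether that weighting actually causes overfitting on real, only near-identical frames is exactly what the question below asks.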
Hence, I would like to ask:
- What is the impact of duplicate samples on the training of a regression CNN?
- Is there any connection between overfitting and having so many duplicate samples in the training set?
Thanks, mvish