I would like to understand the effect of duplicate training samples on the training of a regression CNN. I have read other threads, such as thread 1 and thread 2, but they mostly address the class imbalance problem for classification models.

In particular, I want to understand how duplicated training samples relate to overfitting when training a regression CNN.

I'm currently dealing with the following situation:

  • I'm training a CNN that predicts the pixels along traffic lanes.
  • My training dataset has 15,000 samples, of which 7,000 come from a stationary scene, i.e. almost 50% of the training samples are nearly identical to each other.
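
To make the concern concrete, here is a minimal sketch of how I currently think about it (toy numbers, not my real data or model). It assumes a plain mean-squared-error loss and exact duplicates; my stationary frames are only near-identical, so this would hold approximately at best. Under those assumptions, repeating a sample k times gives the same average loss as keeping one copy with weight k. The helper names `mse` and `weighted_mse` are hypothetical and only used for illustration.

```python
import numpy as np

def mse(pred, target):
    """Plain mean squared error over all samples."""
    return float(np.mean((pred - target) ** 2))

def weighted_mse(pred, target, weights):
    """Mean squared error where each sample carries its own weight."""
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))

rng = np.random.default_rng(0)

# Hypothetical toy data: 3 distinct samples with scalar predictions and targets.
pred = rng.normal(size=3)
target = rng.normal(size=3)

# Assumption: the third sample appears 5 times in the training set.
counts = np.array([1, 1, 5])

# Build the dataset that physically contains the duplicates.
dup_pred = np.repeat(pred, counts)
dup_target = np.repeat(target, counts)

loss_dup = mse(dup_pred, dup_target)                # loss on the duplicated set
loss_weighted = weighted_mse(pred, target, counts)  # loss on unique samples, weighted

# Share of the total loss weight that the stationary scene would receive
# with the numbers from my dataset (7,000 of 15,000 samples).
stationary_share = 7000 / 15000

print(loss_dup, loss_weighted, round(stationary_share, 3))
```

If this view is right, the stationary scene effectively carries a little under half of the total loss weight, which is why I suspect the model may fit that one scene too closely.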

Hence, I would like to ask,

  1. What is the impact of duplicate samples on the training of a regression CNN?
  2. Is there any connection between overfitting and having so many duplicate samples in the training set?

Thanks, mvish

Vishal