I am running a simple experiment. For simplicity, say I have 100k pictures of cats with ground-truth segmentations of their ears, and the network has to predict the segmentation. I split the dataset into train/validation 80:20, completely at random after shuffling, so there is no reason to believe the samples in either split are harder or easier than in the other. At the end of epoch 1 (!) I already see the error curves diverging, with the validation result looking much worse. This cannot be overfitting, right? What else could it be? The only thing I can think of is the batchnorms. Any other ideas?
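To illustrate what I mean by suspecting the batchnorms: a minimal sketch (assuming PyTorch; my real model and data are different) showing that, early in training, a batchnorm layer's running statistics can lag far behind the batch statistics, so the same input gives noticeably different outputs in train vs. eval mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model with a batchnorm layer; running stats start at
# their defaults (mean 0, var 1) and update slowly (momentum 0.1).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Inputs whose statistics differ from the batchnorm's initial running stats.
x = torch.randn(16, 3, 32, 32) * 5 + 2

model.train()
with torch.no_grad():
    out_train = model(x)  # normalizes with this batch's own statistics

model.eval()
with torch.no_grad():
    out_eval = model(x)   # normalizes with running stats, barely updated so far

# After a single forward pass the running stats still lag the data,
# so the two outputs differ substantially.
diff = (out_train - out_eval).abs().mean().item()
print(diff)
```

If validation loss is computed in eval mode while the train curve uses batch statistics, this mismatch alone can make the two curves diverge in epoch 1 without any overfitting.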