While training ResNet18 and ResNet50 on an extremely tiny subset of ImageNet, I noticed a curious phenomenon: the shallower model performed better. Obviously, the original ResNet paper, other benchmarks, and conventional wisdom all indicate that deeper models outperform shallower ones on the full ImageNet, but the key assumption there is that ImageNet is a gigantic, complex dataset with a large number of classes.
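For reference, my setup looks roughly like the sketch below. The directory path, class count, subset size, and hyperparameters are illustrative placeholders rather than my exact values; the point is just that both depths get an identical training loop on the same tiny dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Illustrative placeholders -- not my exact values.
NUM_CLASSES = 10   # small handful of ImageNet classes
BATCH_SIZE = 64
EPOCHS = 30

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical directory that already contains only the tiny subset,
# laid out in the usual ImageFolder class-per-subdirectory structure.
train_set = datasets.ImageFolder("imagenet_tiny/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE,
                          shuffle=True, num_workers=4)

def train(model):
    """Train a model from scratch on the tiny subset."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    for _ in range(EPOCHS):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

# Only the architecture changes between the two runs.
train(models.resnet18(num_classes=NUM_CLASSES))
train(models.resnet50(num_classes=NUM_CLASSES))
```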
Is the significant reduction in dataset complexity causing ResNet50 to overfit and therefore perform slightly worse than ResNet18? Or is something else happening?
And to ask a more general question, what is the relationship between dataset complexity and model complexity?