I am experimenting with the CIFAR-10 dataset. With my model, I found that the larger the batch size, the better it learns the dataset. From what I see online, the typical batch size is 32 to 128, but my optimal size is 512-1024. Is that OK? Or are there things I should look at to improve the model? Which indicators should I use to debug it?
P.S. It seems that the gradient is too noisy, and a larger batch size reduces the noise.
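To sanity-check this intuition (this is a standalone toy simulation, not my actual training code), here is a sketch showing why larger batches give a less noisy gradient estimate: if each per-sample gradient is the true gradient plus independent noise, the standard deviation of the mini-batch average should shrink roughly like 1/sqrt(batch_size).

```python
import random
import statistics

def batch_gradient_noise(batch_size, trials=2000, seed=0):
    """Estimate the std of a mini-batch gradient estimate, assuming each
    per-sample gradient equals the true gradient (1.0) plus unit Gaussian
    noise. The std should scale roughly as 1/sqrt(batch_size)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Average the noisy per-sample gradients over one mini-batch.
        grads = [1.0 + rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        estimates.append(sum(grads) / batch_size)
    return statistics.stdev(estimates)

for b in (32, 128, 512):
    print(b, round(batch_gradient_noise(b), 3))
```

Running this, the noise at batch size 512 comes out roughly a quarter of the noise at 32, matching the 1/sqrt(B) scaling (sqrt(512/32) = 4). That said, lower gradient noise does not automatically mean better generalization, which is part of what I am asking about.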