Maybe this question will sound a bit like a newbie one, but I'd like some clarification.
I'm using a VGG16-like convnet, pre-trained with the VGG16 weights, with the top layers edited to fit my classification problem; specifically, I removed the three fully connected layers and replaced them with flatten --> fully connected --> dropout(0.5) --> fully connected --> softmax layers.
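In case it helps, here is roughly what my setup looks like in Keras (simplified sketch; the 256 units and the number of classes are just placeholders, not my actual values):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

num_classes = 10  # placeholder for my actual number of classes

# VGG16 convolutional base with ImageNet weights, original fully connected top removed
conv_base = VGG16(weights='imagenet', include_top=False,
                  input_shape=(224, 224, 3))

# New top: flatten -> fully connected -> dropout(0.5) -> fully connected -> softmax
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),   # 256 units is just an example size
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```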
Since the VGG16 weights were trained on square images of 224x224 pixels, can I still use the same pre-trained weights with images of size 320x320, for example?
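As far as I can tell, the convolutional base itself does load with a larger input (quick check below, same assumptions as the sketch above):

```python
from tensorflow.keras.applications import VGG16

# Same ImageNet weights, but a 320x320 input volume
conv_base_320 = VGG16(weights='imagenet', include_top=False,
                      input_shape=(320, 320, 3))

# The conv/pool output grows from (None, 7, 7, 512) to (None, 10, 10, 512),
# so the new fully connected layers see a larger flattened vector
print(conv_base_320.output_shape)
```

Since my fully connected layers are newly initialized and trained from scratch anyway, the larger Flatten output by itself doesn't seem to be a problem for loading the weights, but I'm not sure whether it affects how well the pre-trained conv features transfer.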
This is what I read on the CS231n Transfer Learning page:
However, some changes are straight-forward: Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”).
How should I interpret "as long as the strides 'fit'"?
Furthermore, I ran these two tests using the same dataset:
1- ConvNet with 320x320 input images fine-tuned for just 100 epochs (because of resource restrictions).
2- Same ConvNet but with 224x224 input images fine-tuned for 250 epochs.
In the first test I get better classification results than in the second one, even though it was fine-tuned for fewer epochs. Is there any reason that explains this behavior?
Thank you so much!