Maybe this question will sound a bit like a newbie one, but I'd like some clarification.
I'm using a VGG16-like convnet, pre-trained with the VGG16 weights, with the top layers edited to fit my classification problem; specifically, I removed the three fully connected layers and replaced them with flatten --> fully connected --> dropout(0.5) --> fully connected --> softmax layers.
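In case it helps, here is roughly what my setup looks like in Keras (simplified sketch; the 256 units and the number of classes are just placeholders, not my actual values):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

num_classes = 10  # placeholder for my actual number of classes

# VGG16 convolutional base with ImageNet weights, original fully connected top removed
conv_base = VGG16(weights='imagenet', include_top=False,
                  input_shape=(224, 224, 3))

# New top: flatten -> fully connected -> dropout(0.5) -> fully connected -> softmax
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),   # 256 units is just an example size
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```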
Since the VGG16 weights were trained on square images of 224x224 pixels, can I still use the same pre-trained weights with images of size 320x320, for example?
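As far as I can tell, the convolutional base itself does load with a larger input (quick check below, same assumptions as the sketch above):

```python
from tensorflow.keras.applications import VGG16

# Same ImageNet weights, but a 320x320 input volume
conv_base_320 = VGG16(weights='imagenet', include_top=False,
                      input_shape=(320, 320, 3))

# The conv/pool output grows from (None, 7, 7, 512) to (None, 10, 10, 512),
# so the new fully connected layers see a larger flattened vector
print(conv_base_320.output_shape)
```

Since my fully connected layers are newly initialized and trained from scratch anyway, the larger Flatten output by itself doesn't seem to be a problem for loading the weights, but I'm not sure whether it affects how well the pre-trained conv features transfer.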
This is what I read on the CS231n Transfer Learning page:
However, some changes are straight-forward: Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”).
How should I interpret "as long as the strides 'fit'"?
Furthermore, I ran these two tests using the same dataset:
1- ConvNet with 320x320 input images fine-tuned for just 100 epochs (because of resource restrictions).
2- Same ConvNet but with 224x224 input images fine-tuned for 250 epochs.
In the first test I get better classification results than in the second one, even though it was fine-tuned for fewer epochs. Is there any reason that explains this behavior?
Thank you so much!