
I have 2 questions about GoogLeNet architecture, appreciate any help:

  1. Even after reading this, I still don't understand where the parameter reduction in the second convolutional layer comes from: GoogLeNet second conv layer

    Specifically, if the number of input/output channels to/from the 1X1 conv layer is 64, as described in Table 1, don't we end up with more parameters than if we had only the 3X3 layer?

  2. I don't understand the parameter counts shown in the "params" column of Table 1. For example, it reads 2.7K params for the first layer, but with 3 input channels (RGB), 64 output channels and 7X7 filters, isn't it supposed to be 3 * 64 * 7 * 7 + 64 = 9472 parameters?
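For reference, the count in the second question follows the standard formula for a convolutional layer; a minimal sketch (`conv_params` is a hypothetical helper, not from the paper):

```python
# Standard parameter count for a conv layer: one k x k kernel per
# (input channel, output channel) pair, plus one bias per output channel.
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out

print(conv_params(3, 64, 7))  # 9472, not the 2.7K listed in Table 1
```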

Benny K

1 Answer


I can't see any parameter reduction in the second conv layer in the GoogLeNet paper. There is no 1x1 conv layer there, and it would make no sense to keep the output dimension equal to the input dimension: you wouldn't do 64 filters -> 1x1 conv -> 64 filters. The 1x1 conv layers are used to reduce the number of feature maps.

This is done after the 2nd layer. Take the 3x3 path of the inception 3a module as an example: the input is 28x28x192, gets reduced by 1x1 filters to 28x28x96, and then 128 3x3 filters produce 28x28x128. This 3x3 path has 192*1*1*96 + 96*3*3*128 = 129024 parameters. Without the 1x1 bottleneck layer you would have 192*3*3*128 = 221184 parameters.
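The arithmetic above can be sketched in a few lines (biases are ignored here, as in the counts above; `conv_weights` is a hypothetical helper):

```python
# Weight count for a conv layer: one k x k kernel per
# (input channel, output channel) pair, biases ignored.
def conv_weights(c_in, c_out, k):
    return c_in * c_out * k * k

# 3x3 path of inception (3a): 1x1 reduce 192 -> 96, then 3x3 to 128 maps
bottleneck = conv_weights(192, 96, 1) + conv_weights(96, 128, 3)
# Same path without the bottleneck: 3x3 directly on all 192 channels
direct = conv_weights(192, 128, 3)

print(bottleneck, direct)  # 129024 221184
```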

The 2.7K parameters are confusing, and your equation is correct for a standard CNN layer. Here is the same question at stackoverflow. The table entry is either wrong, or they used some sort of asymmetric layer.

Peter111