
I have read in many places, such as Stanford's CS231n Convolutional Neural Networks course notes (and also here, here, and here), that the pooling layer does not have any trainable parameters!
And yet today I was informed by someone that in a paper (here it is) they say, and I quote:

S1 layer for sub sampling, contains six feature map, each feature map contains 14 x 14 = 196 neurons. the sub sampling window is 2 x 2 matrix, sub sampling step size is 1, so the S1 layer contains 6 x 196 x (2 x 2 + 1) = 5880 connections. Every feature map in the S1 layer contains a weights and bias, so a total of 12 parameters can be trained in S1 layer .

What is this? Can anyone please enlighten me on this?

Hossein
  • I've never personally heard of a pooling layer that has any trainable parameters but it wouldn't be impossible. Maybe linking to where you read this would help. – Frobot May 04 '16 at 21:29
  • @Frobot: I actually linked the article in the post, it's in the parentheses! Anyway, here is the link again: http://arxiv.org/abs/1506.01195 – Hossein May 05 '16 at 06:41
  • Max-pooling layers have no parameters (you cannot "train" them). But average-pooling layers often scale the result and add a bias, so they are "trained". – Ivan Kuckir Nov 28 '19 at 10:59

3 Answers


In the paper you read

a total of 12 parameters can be trained in S1 layer

refers to the number of output planes in the pooling layer, not the number of parameters in a weight matrix. Normally, what we train in a neural network are the parameters of the weight matrices; we don't train parameters in input or output planes. So the students who wrote the paper didn't express themselves clearly, which is why you were confused about what a pooling layer really is.

There are no trainable parameters in a max-pooling layer. In the forward pass, it passes the maximum value within each rectangle on to the next layer. In the backward pass, it propagates the error from the next layer back to the position the maximum value was taken from, because that is where the error comes from.


For example, in the forward pass, you have an image rectangle:

1 2
3 4

and you would get:

4

in the next layer.

And in the backward pass, you have the error:

-0.1

then you propagate the error back to where you got it:

0 0 
0 -0.1

because you took the number 4 from that location in the forward pass.
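If it helps to see the same idea as code, here is a minimal NumPy sketch of that forward/backward behaviour for a single 2 x 2 patch (the helper names are my own, purely for illustration):

import numpy as np

def maxpool_forward(x):
    # Forward: return the largest value in the patch and remember where it was.
    idx = np.unravel_index(np.argmax(x), x.shape)
    return x[idx], idx

def maxpool_backward(grad_out, idx, shape):
    # Backward: route the incoming error only to the position that produced
    # the max; every other position receives zero gradient.
    grad_in = np.zeros(shape)
    grad_in[idx] = grad_out
    return grad_in

patch = np.array([[1., 2.],
                  [3., 4.]])
out, idx = maxpool_forward(patch)                   # out = 4.0, idx = (1, 1)
grad_in = maxpool_backward(-0.1, idx, patch.shape)
print(grad_in)                                      # [[ 0.   0. ]
                                                    #  [ 0.  -0.1]]

Note that nothing in these two functions is learned; the only "state" is the remembered location of the maximum.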

liangfu

There is no fixed standard model in deep learning, which is why there are many different CNN models. Sometimes the pooling layer can play a learning role, as in here. I have seen many papers that apply an activation function or add a bias term in the pooling layer. The pooling can be average pooling, max pooling, L2-norm pooling, or even some other function that reduces the size of the data. The state-of-the-art result on CIFAR-10 here used a novel pooling method called Fractional Max-Pooling.
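As a small illustration of those parameter-free pooling choices, here is how each reduces one 2 x 2 window to a single number (plain NumPy; the variable names are mine):

import numpy as np

window = np.array([[1., 2.],
                   [3., 4.]])

max_pool = window.max()                  # 4.0   - keeps the strongest response
avg_pool = window.mean()                 # 2.5   - often combined with a trained scale and bias
l2_pool  = np.sqrt(np.sum(window ** 2))  # ~5.48 - L2-norm pooling
print(max_pool, avg_pool, l2_pool)

None of these reductions has trainable parameters on its own; any learning comes from extra scale/bias terms or activation functions wrapped around them.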

Jack Smith

If the pooling operation is average pooling (see Scherer, Müller and Behnke, 2010), then it can be learnable, because there are a trainable scalar and a trainable bias term:

takes the average over the inputs, multiplies it with a trainable scalar $\beta$, adds a trainable bias $b$, and passes the result through the non-linearity

But many recent papers mention that it has fallen out of favor compared to max pooling, which has been found to work better in practice.
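To make the quoted description concrete, here is a small sketch of such a subsampling unit, assuming tanh as the non-linearity (the function name and the particular numbers are mine):

import numpy as np

def trainable_avg_pool(window, beta, b):
    # Average the window, scale by the trainable scalar beta, add the
    # trainable bias b, then pass the result through a non-linearity.
    return np.tanh(beta * window.mean() + b)

window = np.array([[1., 2.],
                   [3., 4.]])
beta, b = 0.5, -1.0   # these two numbers are the trainable parameters
print(trainable_avg_pool(window, beta, b))   # tanh(0.5 * 2.5 - 1.0) = tanh(0.25)

With one such (beta, b) pair per feature map, six feature maps give 6 x 2 = 12 trainable parameters, which matches the count quoted in the question.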

References

Scherer, D., Müller, A., and Behnke, S. (2010). "Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition." In Artificial Neural Networks – ICANN 2010, pp. 92–101.

Jack Smith
  • Max pooling tells you whether the feature exists, whereas average pooling tells you whether a feature is characteristic of the region. So average pooling might work well at identifying textures. (?) In any case, max pooling should be less affected by noise - that is, it is less affected by what is going on anywhere else when we have a strong detection. Do these thoughts sound about right? – Barney Szabolcs Aug 06 '21 at 17:38