I have a couple of questions about Convolutional Neural Networks that I'm struggling to answer.

Q1 Let's say I have a $[3 \times 32 \times 32]$ image (three channels) and I apply a first convolutional layer. I can choose the number of output channels; for example, I want the output to be a new volume of size $[12 \times 32 \times 32]$. So I take the image and perform the convolution once by sliding the filter kernel (say $3 \times 3$) over the pixel matrix, and I obtain a new image of dimension $[3 \times 30 \times 30]$. Now, on which object should the operation be repeated to get the final volume of depth $12$? On the original image? On the new $[3 \times 30 \times 30]$ output of the convolution?
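To make the shapes in Q1 concrete, here is a minimal NumPy sketch of how a standard convolutional layer is usually defined (this is the common formulation, not necessarily the one the question has in mind). The helper name `conv2d`, the random weights, and the zero padding of 1 are assumptions made for illustration. In this formulation each of the 12 filters has shape $[3 \times 3 \times 3]$, spans all three input channels, and is applied to the original image; each filter yields one 2D map, and the 12 maps are stacked along the depth axis.

```python
import numpy as np

def conv2d(image, weights, padding=0):
    """Naive multi-channel convolution (cross-correlation, stride 1).

    image:   [C_in, H, W]
    weights: [C_out, C_in, k, k]  (hypothetical layout: one k x k slice
             per input channel, per output filter)
    returns: [C_out, H_out, W_out]
    """
    c_out, c_in, k, _ = weights.shape
    if padding:
        # Zero-pad only the two spatial axes, not the channel axis.
        image = np.pad(image, ((0, 0), (padding, padding), (padding, padding)))
    _, h, w = image.shape
    h_out, w_out = h - k + 1, w - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for o in range(c_out):              # every filter reads the ORIGINAL input
        for i in range(h_out):
            for j in range(w_out):
                patch = image[:, i:i + k, j:j + k]       # [C_in, k, k]
                out[o, i, j] = np.sum(patch * weights[o])  # sum over channels too
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))      # stand-in for an RGB image
weights = rng.standard_normal((12, 3, 3, 3))  # 12 filters, each 3 x 3 x 3

no_pad = conv2d(image, weights)               # no padding: spatial size shrinks
same_pad = conv2d(image, weights, padding=1)  # padding 1: spatial size kept

print(no_pad.shape)    # (12, 30, 30)
print(same_pad.shape)  # (12, 32, 32)
```

Under these assumptions a single $3 \times 3 \times 3$ filter produces a $[1 \times 30 \times 30]$ map rather than $[3 \times 30 \times 30]$, because the products are summed across the input channels. Getting $32 \times 32$ spatially requires padding.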

Q2 I've read over and over again that CNNs are translation equivariant (translating the input translates the output accordingly) and that this is accomplished through parameter sharing. But where exactly are the weights shared? Taking the previous example, I've done a first convolution over the entire RGB image, obtaining a $[3 \times 30 \times 30]$ output. I want to produce a volume of depth $12$, so I do another convolution on the original image... do I use the same $3 \times 3$ kernel?
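A small numerical check may help with Q2. The sketch below assumes the usual meaning of parameter sharing: one filter reuses the same weights at every spatial position, while different output channels use different filters. The helper `conv_single`, the random data, and the 2-pixel shift are illustrative choices, and `np.roll` wraps pixels around the border, so the comparison only looks at columns unaffected by the wrap.

```python
import numpy as np

def conv_single(image, kernel):
    """Apply ONE filter (shape [C_in, k, k]) to an image [C_in, H, W].

    The same kernel weights are reused at every (i, j) position;
    this reuse is what "parameter sharing" refers to here.
    """
    k = kernel.shape[-1]
    _, h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[:, i:i + k, j:j + k] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))
kernel = rng.standard_normal((3, 3, 3))   # one filter spanning all 3 channels

# Translate the input 2 pixels to the right (np.roll wraps at the border).
shifted = np.roll(image, shift=2, axis=2)

out = conv_single(image, kernel)
out_shifted = conv_single(shifted, kernel)

# Away from the wrapped columns, the output is translated by the same 2 pixels.
equivariant = np.allclose(out_shifted[:, 2:], out[:, :-2])
print(equivariant)  # True
```

To get 12 output channels in this setup, one would use 12 different kernels of this shape, each shared across all positions of the same input.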

James Arten
  • You might find my [drawing](https://stats.stackexchange.com/a/409172/247274) helpful. – Dave Nov 09 '21 at 11:19
