If you have a standard CNN architecture with convolutional layers, there are two reasons why the identity of a skip connection can't be added directly to the current output:
1) There was pooling between the identity and the current output.
2) The number of filters increased between the identity and the current output.
If there was pooling, you average-pool the identity; if the number of filters increased, you pad it with empty feature maps.
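For reference, here is a minimal PyTorch sketch of that standard downsampling case, assuming 2x average pooling and zero-padded channels (the helper name `downsample_shortcut` is just illustrative):

```python
import torch
import torch.nn.functional as F

def downsample_shortcut(identity, out_channels):
    """Match a skip connection to a downsampled, wider output:
    average-pool the spatial dims, then zero-pad the extra channels."""
    # 2x2 average pooling to halve the resolution
    identity = F.avg_pool2d(identity, kernel_size=2, stride=2)
    # pad with empty (all-zero) feature maps along the channel dimension
    extra = out_channels - identity.shape[1]
    if extra > 0:
        identity = F.pad(identity, (0, 0, 0, 0, 0, extra))
    return identity

# usage: out = block(x) + downsample_shortcut(x, out.shape[1])
```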
My question is about transposed convolutional layers (sometimes called deconvolutional layers). Each layer increases the resolution and very often decreases the number of filters.
How would one implement skip connections here?
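To make the mismatch concrete, here is a hypothetical decoder step in PyTorch (the channel counts and sizes are just illustrative) where the identity and the output no longer line up:

```python
import torch
import torch.nn as nn

# a typical decoder step: resolution doubles, channel count halves
up = nn.ConvTranspose2d(in_channels=128, out_channels=64,
                        kernel_size=2, stride=2)

x = torch.randn(1, 128, 16, 16)   # identity we would like to add
out = up(x)                       # shape: (1, 64, 32, 32)

# out + x fails: the identity is too small spatially and too wide in channels
print(x.shape, out.shape)
```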
You would have to upsample the identity (by interpolation, or by spacing it out and filling with zeroes). You would also have to decrease the number of filters of the identity, which can be done by slicing or by using "bottleneck layers" (see InceptionNet). I assume that using bottleneck layers might hurt the gradient flow.
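A minimal sketch of that idea, assuming nearest-neighbour interpolation for the upsampling and a 1x1 convolution as the bottleneck (slicing away the surplus feature maps is the other option mentioned above; the class name `UpsampleShortcut` is just illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleShortcut(nn.Module):
    """Match a skip connection to an upsampled, narrower output."""
    def __init__(self, in_channels, out_channels, use_bottleneck=True):
        super().__init__()
        self.use_bottleneck = use_bottleneck
        self.out_channels = out_channels
        if use_bottleneck:
            # 1x1 "bottleneck" convolution to reduce the channel count
            self.reduce = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, identity, target_size):
        # upsample the identity to the spatial size of the transposed-conv output
        identity = F.interpolate(identity, size=target_size, mode='nearest')
        if self.use_bottleneck:
            identity = self.reduce(identity)
        else:
            # alternative: simply slice off the surplus feature maps
            identity = identity[:, :self.out_channels]
        return identity

# usage: out = up(x); out = out + shortcut(x, out.shape[2:])
```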
The only thing I could find were papers where the authors used skip connections from convolutional layers to their corresponding transposed convolutional layers, but that is not what I want to do.