
Both the terms "upsampling" and "transpose convolution" are used when you are doing "deconvolution" (<-- not a good term, but let me use it here). Originally, I thought that they meant the same thing, but after reading these articles, it seems to me that they are different. Can anyone please clarify?

  1. Transpose convolution: it looks like we can use it when we propagate the loss through a convolutional neural network.

    http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/#Backward-Propagation

    https://github.com/vdumoulin/conv_arithmetic

    https://arxiv.org/pdf/1312.6034v2.pdf, section 4 "For the convolutional layer..."

  2. Upsampling: it seems like we use it when we want to upsample from a smaller input to a larger output in a convnet-deconvnet structure.

    https://www.youtube.com/watch?v=ByjaPdWXKJ4&feature=youtu.be&t=22m

RockTheStar

4 Answers


Since there is no detailed and accepted answer, I'll try my best.

Let's first understand where the motivation for such layers comes from: e.g., a convolutional autoencoder. You can use a convolutional autoencoder to extract features of images while training the autoencoder to reconstruct the original image. (It is an unsupervised method.)

Such an autoencoder has two parts: the encoder, which extracts the features from the image, and the decoder, which reconstructs the original image from these features. The architectures of the encoder and decoder are usually mirrored.

In a convolutional autoencoder, the encoder works with convolution and pooling layers. I assume that you know how these work. The decoder tries to mirror the encoder but instead of "making everything smaller" it has the goal of "making everything bigger" to match the original size of the image.

The opposite of the convolutional layers are the transposed convolution layers (also known as deconvolution, although mathematically speaking deconvolution is something different). They work with filters, kernels, and strides just like convolution layers, but instead of mapping from, e.g., 3x3 input pixels to 1 output pixel, they map from 1 input pixel to 3x3 output pixels. Of course, backpropagation also works a little differently.
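
To make the shape arithmetic concrete, here is a minimal sketch, assuming TensorFlow/Keras (the layer name is the real Keras API; the sizes are just for illustration):

```python
import tensorflow as tf

x = tf.random.normal((1, 4, 4, 1))  # a batch of one 4x4 single-channel "image"

# Transposed convolution: each input pixel is projected onto a 3x3 patch of
# the output; with stride 2 the patches overlap and the feature map grows.
tconv = tf.keras.layers.Conv2DTranspose(filters=1, kernel_size=3,
                                        strides=2, padding="same")
print(tconv(x).shape)  # (1, 8, 8, 1) -- spatial size doubled, kernel trainable
```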

The opposite of the pooling layers are the upsampling layers, which in their purest form only resize the image (or copy each pixel as many times as needed). A more advanced technique is unpooling, which reverts max pooling by remembering the location of the maxima in the max pooling layers and, in the unpooling layers, copying the value to exactly this location. To quote from this paper (https://arxiv.org/pdf/1311.2901v3.pdf):

In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables. In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into appropriate locations, preserving the structure of the stimulus.
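
As a minimal sketch of both ideas, assuming TensorFlow/Keras (max_pool_with_argmax and scatter_nd are real TensorFlow ops; the unpooling wiring is my own illustration and assumes a batch size of one):

```python
import tensorflow as tf

x = tf.random.normal((1, 4, 4, 1))

# Upsampling in its purest form: every pixel is simply copied 2x2, no weights.
up = tf.keras.layers.UpSampling2D(size=2)
print(up(x).shape)  # (1, 8, 8, 1)

# Unpooling: max-pool while recording the "switches" (locations of the maxima)...
pooled, argmax = tf.nn.max_pool_with_argmax(x, ksize=2, strides=2, padding="SAME")

# ...then scatter the pooled values back to exactly those locations;
# everything else stays zero, preserving the structure of the stimulus.
flat_shape = tf.reshape(tf.size(x, out_type=tf.int64), [1])
unpooled = tf.scatter_nd(tf.reshape(argmax, [-1, 1]),
                         tf.reshape(pooled, [-1]),
                         flat_shape)
unpooled = tf.reshape(unpooled, tf.shape(x))
```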

For more technical input and context, have a look at this really good, illustrative, and in-depth explanation: http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html

And have a look at https://www.quora.com/What-is-the-difference-between-Deconvolution-Upsampling-Unpooling-and-Convolutional-Sparse-Coding

Maikefer
  • F. Chollet (creator of Keras) would argue that this is a [self-supervised technique](https://blog.keras.io/building-autoencoders-in-keras.html). – hans Jan 11 '19 at 19:52

It may depend on the package you are using.

In Keras they are different. Upsampling is defined here: https://github.com/fchollet/keras/blob/master/keras/layers/convolutional.py. Provided you use the TensorFlow backend, what actually happens is that Keras calls the TensorFlow resize_images function, which is essentially an interpolation and not trainable.

Transposed convolution is more involved. It's defined in the same Python script listed above. It calls the TensorFlow conv2d_transpose function, and it has a kernel and is trainable.
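
As a quick, hedged illustration of that difference using the modern tf.keras API (the layer names and the trainable_weights attribute are the real API; the shapes are arbitrary):

```python
import tensorflow as tf

x = tf.zeros((1, 4, 4, 3))

up = tf.keras.layers.UpSampling2D(size=2)  # wraps a resize, no kernel
tconv = tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=3,
                                        strides=2, padding="same")
up(x), tconv(x)  # call once so the layers build their weights

print(len(up.trainable_weights))     # 0 -- pure interpolation, nothing learned
print(len(tconv.trainable_weights))  # 2 -- kernel and bias, both trainable
```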

Hope this helps.

Jian

Here is a pretty good illustration of the difference between (1) transpose convolution and (2) upsampling + convolution: https://distill.pub/2016/deconv-checkerboard/

While the transpose convolution is more efficient, the article advocates upsampling + convolution, since it does not suffer from checkerboard artifacts.
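
A minimal sketch of that "resize, then convolve" alternative, assuming TensorFlow/Keras (the interpolation argument and layer names are the real API; the sizes are illustrative):

```python
import tensorflow as tf

x = tf.random.normal((1, 4, 4, 8))

# Upsample first (no overlapping kernel footprints, hence no checkerboard)...
y = tf.keras.layers.UpSampling2D(size=2, interpolation="bilinear")(x)
# ...then convolve at the higher resolution with a plain, trainable Conv2D.
y = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding="same")(y)
print(y.shape)  # (1, 8, 8, 8) -- same growth as a strided Conv2DTranspose
```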


Deconvolution in the context of convolutional neural networks is synonymous with transpose convolution. Deconvolution may have other meanings in other fields.

Transpose convolution is one strategy among others for performing upsampling.

Franck Dernoncourt
  • Yes, I agree, but it seems like the way the references explain them is different. Take a look at the video in No. 2 and then look at the references in No. 1. (Personally, I go for the No. 1 explanation.) – RockTheStar Dec 22 '16 at 00:47
  • @RockTheStar Which concept is explained differently? Transpose convolution or upsampling? – Franck Dernoncourt Dec 22 '16 at 00:49
  • The upsampling/deconvolution concept explained in the video in No. 2; it is only a few minutes long. – RockTheStar Dec 22 '16 at 00:50