When to "add" layers and when to "concatenate" in neural networks?

Question

I am using "add" and "concatenate" as it is defined in keras. Basically, from my understanding, add will sum the inputs (which are the layers, in essence tensors). So if the first layer had a particular weight as 0.4 and another layer with the same exact shape had the corresponding weight being 0.5, then after the add the new weight becomes 0.9.

However, with concatenate, let's say the first layer has dimensions 64x128x128 and the second layer had dimensions 32x128x128, then after concatenate, the new dimensions are 96x128128 (assuming you pass in the second layer as the first input into concatenate).

Assuming my above intuition is true, when would I use one over the other? Conceptually, add seems a sharing of information that potentially results in information distortion while concatenate is a sharing of information in the literal sense.

score 23 · Accepted Answer · answered Aug 07 '18 at 00:42

Adding is nice if you want to interpret one of the inputs as a residual "correction" or "delta" to the other input. For example, the residual connections in ResNet are often interpreted as successively refining the feature maps. Concatenating may be more natural if the two inputs aren't very closely related. However, the difference is smaller than you may think.

Note that $W[x,y] = W_1x + W_2y$ where $[\ ]$ denotes concat and $W$ is split horizontally into $W_1$ and $W_2$. Compare this to $W(x+y) = Wx + Wy$. So you can interpret adding as a form of concatenation where the two halves of the weight matrix are constrained to $W_1 = W_2$.

Thank you very much, but what is the purpose of having 2 instead of 1 if the difference is very little please? — Avv, Dec 31 '21 at 07:11

momo668 · Answer 2 · 2020-01-24T10:01:45.653

I am not an expert, but based on my light reading, 'addition' is used for 'identity links' in constructs such as Residue Blocks to preserve information prior to convolution, which as the pros said is useful as the network goes deeper.

Concatenation is quite confusing when it comes to "how does it help?". As you said, it is adding information in a literal sense, which seems to focus on taking a wider shot by just stacking filters arrived from different operations (after splitting the feature maps) together into a block. It seem to be used widely for 'pre-stemming'.

The two sound similar at first, but functionally shouldn't seem to be compared together.

When to "add" layers and when to "concatenate" in neural networks?

2 Answers2