
What is the precise definition of a layer in a neural network? Are things like concatenation functions, activations, batch normalizations, and skip connections considered layers?

Tim
  • I don't think many people have ever seriously tried to formalize the definition of a layer, since there doesn't seem to be any conceptual clarity which would be gained by having such a definition. – shimao Nov 19 '18 at 06:21
  • @shimao I also don't recall seeing any precise definition, that's exactly the rationale behind the question. – Tim Nov 19 '18 at 14:26
  • I have found that some people use the word "layer" to describe a set of operations. But at the same time, there is what is called a hidden layer. Here the hidden layer is the result of some operation, not the operator. So I think the term "layer" is being seriously abused. I hope somebody comes out here and offers a general diagnosis of the current state of the literature... – bombs Apr 30 '20 at 06:47
  • Contrast this question with https://stats.stackexchange.com/questions/362425/what-is-an-artificial-neural-network – Firebug Oct 08 '20 at 19:12

1 Answer


Here is my attempt at a definition.

I think that in common usage, a "layer" refers to a parameterized linear transform, optionally followed by a parameterless nonlinear function.

For example, $\sigma(Wx+b)$ contains a linear transform parameterized by $W$ and $b$ followed by the nonlinear sigmoid activation.
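For concreteness, here is a minimal NumPy sketch of that notion of a layer; the names (`dense_layer`, `W`, `b`, `x`) are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b):
    # One "layer" under this definition: a parameterized linear
    # transform followed by a parameterless nonlinearity.
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # parameters of the linear transform
b = rng.normal(size=4)
x = rng.normal(size=3)
print(dense_layer(x, W, b))   # output of a single layer
```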

I don't think most people would consider an activation function by itself to be a layer, so that's one point for this definition.

The term "softmax layer" usually means $\text{softmax}(Wx+b)$ where $W \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$ where $n$ is the desired number of categories. So it satisfies the definition.

I don't think many consider batch norm by itself to be a layer, although it may be referred to as a layer by certain programming frameworks. According to this definition, a convolution followed by batch norm followed by relu is a single layer. This agrees with how layers are counted in the ResNet-152 architecture.
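As an illustration (written with PyTorch purely for convenience), this is the kind of block that would count as one layer under the definition above:

```python
import torch.nn as nn

# Counted as a single "layer" here: one parameterized linear transform
# (the convolution), with batch norm and ReLU folded into it.
conv_bn_relu = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```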

Skip-connections and concatenations aren't counted as layers, and this definition accounts for that by considering them as part of the linear transform of the next layer they feed into.
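A quick numerical check of that bookkeeping, with made-up shapes: the next layer's weight matrix can be split into blocks acting on each concatenated input, so the concatenation itself contributes no parameters and no layer:

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=3), rng.normal(size=2)
W = rng.normal(size=(4, 5))         # weights of the next layer
W1, W2 = W[:, :3], W[:, 3:]         # blocks acting on x1 and x2

lhs = W @ np.concatenate([x1, x2])  # concatenate, then apply the layer
rhs = W1 @ x1 + W2 @ x2             # equivalently, absorb the split into W
print(np.allclose(lhs, rhs))        # True
```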

This definition correctly rules out counting ResNet "residual blocks" as single layers, since each block contains two separate parameterized linear transforms with a nonlinear ReLU in between.
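For reference, a simplified PyTorch-style sketch of such a block (roughly the ResNet "basic block"; names are illustrative): the two convolutions are the two layers, while the addition on the skip path contributes none:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two layers by the definition above: two parameterized convolutions
    separated by a nonlinearity; the skip connection adds no layer."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # layer 1
        out = self.bn2(self.conv2(out))           # layer 2 (linear part)
        return self.relu(out + x)                 # skip connection: not a layer
```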

This definition also rules out counting an LSTM as a single layer, which I think is fair given its complexity. However, many people probably do think of an LSTM as a single layer.

This definition also unfairly rules out quadratic neural networks, which often compute a layer of the form $y_i = x^\top W_i x$, with one parameter matrix $W_i$ per output unit.
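A short sketch of what such a quadratic layer computes, assuming (my reading of the notation) one parameter matrix $W_i$ per output unit:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4)
Ws = rng.normal(size=(3, 4, 4))    # one W_i per output unit

# Quadratic "layer": y_i = x^T W_i x, a parameterized transform that is
# not linear in x, so the definition above excludes it.
y = np.einsum('i,kij,j->k', x, Ws, x)
print(y)                           # three quadratic outputs
```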

shimao