Wikipedia [1] has the following statement without citation or additional details: "If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of each neuron, then linear algebra shows that any number of layers can be reduced to a two-layer input-output model."

How does the linear algebra for this work? When it says "two-layer input-output model" for the reduced form, does it mean two hidden layers? Are there any other assumptions or details the Wikipedia statement leaves out (e.g. that all layers must have the same number of neurons, or that each layer's weight matrix must be full rank)?
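For what it's worth, the claim does seem to check out numerically in a small experiment I tried (my own sketch with NumPy; the layer sizes are arbitrary and deliberately different, to probe the same-number-of-neurons question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three linear layers y = Wx + b with different, arbitrary widths: 4 -> 5 -> 6 -> 3.
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((6, 5)), rng.standard_normal(6)
W3, b3 = rng.standard_normal((3, 6)), rng.standard_normal(3)

def deep(x):
    """Apply the three linear layers in sequence (no nonlinearity)."""
    h = W1 @ x + b1
    h = W2 @ h + b2
    return W3 @ h + b3

# Collapse by expanding the composition:
#   W3(W2(W1 x + b1) + b2) + b3 = (W3 W2 W1) x + (W3 W2 b1 + W3 b2 + b3)
W = W3 @ W2 @ W1
b = W3 @ W2 @ b1 + W3 @ b2 + b3

x = rng.standard_normal(4)
assert np.allclose(deep(x), W @ x + b)  # single affine map matches the deep network
```

This suggests the layers need not have equal widths, and no rank condition was needed here, but I'd like to see the general argument spelled out.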

[1] https://en.wikipedia.org/wiki/Multilayer_perceptron#Activation_function