Wikipedia [1] has the following statement without citation or additional details: "If a multilayer perceptron has a linear activation function in all neurons, that is, a linear function that maps the weighted inputs to the output of each neuron, then linear algebra shows that any number of layers can be reduced to a two-layer input-output model."

How does the linear algebra for this work? When it says "two-layer input-output model" for the reduced form, does it mean two hidden layers? Are there any other assumptions or details the Wikipedia statement leaves out (e.g. that all layers must have the same number of neurons, or that each layer's weight matrix must be full rank)?
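For what it's worth, the claim does seem to check out numerically in a small experiment I tried (my own sketch with NumPy; the layer sizes are arbitrary and deliberately different, to probe the same-number-of-neurons question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three linear layers y = Wx + b with different, arbitrary widths: 4 -> 5 -> 6 -> 3.
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((6, 5)), rng.standard_normal(6)
W3, b3 = rng.standard_normal((3, 6)), rng.standard_normal(3)

def deep(x):
    """Apply the three linear layers in sequence (no nonlinearity)."""
    h = W1 @ x + b1
    h = W2 @ h + b2
    return W3 @ h + b3

# Collapse by expanding the composition:
#   W3(W2(W1 x + b1) + b2) + b3 = (W3 W2 W1) x + (W3 W2 b1 + W3 b2 + b3)
W = W3 @ W2 @ W1
b = W3 @ W2 @ b1 + W3 @ b2 + b3

x = rng.standard_normal(4)
assert np.allclose(deep(x), W @ x + b)  # single affine map matches the deep network
```

This suggests the layers need not have equal widths, and no rank condition was needed here, but I'd like to see the general argument spelled out.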

[1] https://en.wikipedia.org/wiki/Multilayer_perceptron#Activation_function