
Adding hidden layers increases the number of weights, and therefore the number of terms in the back-propagation algorithm, i.e. more derivatives and hence more computation. Can we say that neural networks learn more slowly with the addition of more hidden layers?
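A minimal sketch of the premise, in pure Python: counting the parameters of a fully connected network as hidden layers are added. The helper name n_weights and the layer sizes are illustrative assumptions, not from any particular library.

```python
# Count weights + biases of an MLP whose hidden layers all share one width;
# each extra hidden layer adds width*width + width parameters, and
# back-propagation must compute a derivative for every one of them.
def n_weights(n_in, n_out, n_hidden_layers, width):
    sizes = [n_in] + [width] * n_hidden_layers + [n_out]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

for k in range(1, 5):
    print(k, "hidden layer(s):", n_weights(10, 1, k, 64), "parameters")
```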

Richard Hardy

2 Answers


It'll have more weights, and with each newly added layer you get more complex functions: in NN terms, your function becomes a deeper composition of functions, and the resulting output is harder to optimise, so its convergence will be slower. I'd say the slower convergence is mainly due to the complexity of the composition; the number of weights by itself will not change the speed very much.
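A minimal sketch of this claim, assuming PyTorch; the toy task, widths, learning rate and step budget are illustrative, and whether the deeper net actually converges more slowly will depend on these choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.rand(256, 1) * 6 - 3          # inputs in [-3, 3]
y = torch.sin(X)                        # toy regression target

def make_mlp(n_hidden_layers, width=32):
    layers = [nn.Linear(1, width), nn.Tanh()]
    for _ in range(n_hidden_layers - 1):
        layers += [nn.Linear(width, width), nn.Tanh()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

# Same step budget for a shallow and a deeper network; compare final loss.
for depth in (1, 4):
    torch.manual_seed(0)
    net = make_mlp(depth)
    opt = torch.optim.SGD(net.parameters(), lr=0.05)
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
    print(f"{depth} hidden layer(s): final MSE = {loss.item():.4f}")
```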

Davi Américo

Let’s say the set of functions that a neural network with k layers can learn is F (this is called your parameterized function space, since it’s all the functions reachable by varying the weights of your network). With k+1 layers, the set of functions you can learn, let’s call it F*, is all the functions in F, plus any new functions the extra layer makes reachable (to recover any function in F, just set the new layer to the identity). So F* is at least as large as F, and typically strictly larger.
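A minimal sketch of the identity argument, assuming PyTorch; the inserted layer here has no activation of its own (with a ReLU after it, the same trick works whenever the activations it passes through are nonnegative).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
small = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

extra = nn.Linear(8, 8)                  # the newly added layer
with torch.no_grad():
    extra.weight.copy_(torch.eye(8))     # identity weight matrix
    extra.bias.zero_()                   # zero bias

# Reuse the small network's layers, with the identity layer spliced in.
big = nn.Sequential(small[0], small[1], extra, small[2])

x = torch.randn(5, 4)
print(torch.allclose(small(x), big(x)))  # True: F* contains F
```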

It might be that you converge more quickly in the larger function landscape by chance, but in expectation (on average), since the landscape is larger, it should take longer to converge to a local minimum of the loss. Thus the larger network will learn more slowly.
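One way to probe the "in expectation" part is to average the number of optimisation steps needed to reach a fixed training loss over several random seeds. A minimal sketch, assuming PyTorch; the task, threshold and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def steps_to_threshold(depth, seed, width=16, threshold=0.05, max_steps=2000):
    torch.manual_seed(seed)
    X = torch.rand(128, 1) * 2 - 1
    y = X ** 2                                      # toy target
    layers = [nn.Linear(1, width), nn.Tanh()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.Tanh()]
    layers.append(nn.Linear(width, 1))
    net = nn.Sequential(*layers)
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for step in range(1, max_steps + 1):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), y)
        loss.backward()
        opt.step()
        if loss.item() < threshold:
            return step
    return max_steps                                # never reached threshold

# Average steps-to-threshold over seeds, for a shallow and a deeper net.
for depth in (1, 4):
    steps = [steps_to_threshold(depth, seed) for seed in range(5)]
    print(f"{depth} hidden layer(s): mean steps = {sum(steps) / len(steps):.0f}")
```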