
Suppose I have trained a neural net to solve a classification problem with $m$ hidden layers, the $i$-th having $n_i$ neurons, with $$ n_1 > n_2 > \dots > n_m , $$ and suppose I keep the total number of connections $N_c$ constant across architectures.
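For concreteness, here is a minimal sketch of how I count $N_c$, assuming fully connected layers and ignoring biases; the example widths are made up purely for illustration:

```python
# Minimal sketch: counting connections N_c for a fully connected
# net, ignoring biases. Widths below are hypothetical.
def n_connections(n_in, hidden, n_out):
    widths = [n_in] + list(hidden) + [n_out]
    return sum(a * b for a, b in zip(widths, widths[1:]))

# Two candidate architectures with roughly the same budget,
# with decreasing hidden widths as assumed above:
print(n_connections(10, [50, 20], 2))      # 500 + 1000 + 40 = 1540
print(n_connections(10, [40, 20, 12], 2))  # 400 + 800 + 240 + 24 = 1464
```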

Now, if I have found (over a few attempts) that networks with $m = M$ hidden layers generally perform worse than networks with $m = M - 1$, does it make sense to try a network with $m = M + 1$, or will it generally perform even worse?

Conversely, what are typical signs that adding a hidden layer, rather than adding neurons to an existing layer, might improve performance?

The answers to this popular question on CV, "How to choose the number of hidden layers and nodes in a feedforward neural network?", suggest going from 0 hidden layers to 1 if the data are not linearly separable, but how can I tell whether that is the case?
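One rough check I can think of (my own sketch, on synthetic data, assuming scikit-learn is available): fit a nearly unregularized linear classifier and look at its *training* accuracy. If it is (near) 1.0, the classes are (close to) linearly separable and zero hidden layers may suffice; an accuracy well below 1.0 hints otherwise.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data purely for illustration: two concentric classes,
# which are NOT linearly separable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

# With very weak regularization (large C), a linear model will fit
# linearly separable training data (almost) perfectly; a training
# accuracy well below 1.0 suggests 0 hidden layers won't suffice.
clf = LogisticRegression(C=1e6, max_iter=10_000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```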

fabiob

1 Answer


You cannot definitively say it won't be better with more depth.

A chain of negations, I know, but that's the most accurate way of putting it.

In an ideal world, adding extra depth does not, in itself, impair an already good shallower network in any way: the extra layers could simply learn to pass the previous layer's outputs through unchanged, i.e. an identity mapping.
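To make that concrete, here is a minimal PyTorch sketch (my own illustration, not from the original answer) showing that a deeper ReLU net can reproduce a shallower one exactly when its extra layer is set to the identity. This works because ReLU outputs are non-negative, so ReLU(Ix + 0) = x:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Shallow net: 4 inputs -> 8 hidden (ReLU) -> 3 outputs.
shallow = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

# Deeper net: same shape, plus one extra 8 -> 8 hidden layer.
deep = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                     nn.Linear(8, 8), nn.ReLU(),
                     nn.Linear(8, 3))

with torch.no_grad():
    # Copy the shallow net's weights into the matching layers.
    deep[0].weight.copy_(shallow[0].weight)
    deep[0].bias.copy_(shallow[0].bias)
    deep[4].weight.copy_(shallow[2].weight)
    deep[4].bias.copy_(shallow[2].bias)
    # Make the extra layer an identity map. Its inputs are ReLU
    # outputs (non-negative), so ReLU(I @ x + 0) == x.
    deep[2].weight.copy_(torch.eye(8))
    deep[2].bias.zero_()

x = torch.randn(5, 4)
print(torch.allclose(shallow(x), deep(x)))  # True
```

Whether gradient descent actually *finds* such a solution is another matter, which is where the practical caveats below come in.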

How many hidden layers are required to see an improvement where a shallow net underperforms is anyone's guess, but in general more depth gives the network room to build more abstract, more general representations.

In practice, though, optimization isn't quite so neat, and adding capacity adds risks of the training process falling into various pitfalls (local minima, overfitting, ...). And of course there's the added computational cost.

I would highly recommend playing around with the TensorFlow Playground (https://playground.tensorflow.org) to build intuition for how extra depth affects the network's behavior.

jkm