
I'm learning about neural networks and trying to understand the implications that architecture selection has for the model. Am I inferring correctly that:

  • Wider neural networks can approximate more interactions between input variables
  • Deeper neural networks can model more complex nonlinearities
nba2020

1 Answer


Roughly.

First, that's assuming all layers are created equal. Imagine you want to interleave two different activation functions across layers, or add dropout - then deeper networks give you more room for those to work their magic.

Second, a very deep, moderately wide net may be able to capture the same interactions as a shallow, absurdly wide one, but not vice versa.
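One way to make the wide-vs-deep trade-off concrete is to compare parameter counts of the two shapes. A minimal sketch (the layer widths here are made up purely for illustration, not taken from any real architecture):

```python
def mlp_param_count(layer_sizes):
    """Total weights + biases of a fully connected net whose
    successive layer widths are given by layer_sizes."""
    return sum((fan_in + 1) * fan_out   # +1 accounts for each unit's bias
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical nets with the same input (10) and output (1) sizes:
wide_shallow = [10, 1024, 1]           # one very wide hidden layer
deep_narrow  = [10] + [32] * 8 + [1]   # eight moderately wide hidden layers

print(mlp_param_count(wide_shallow))   # 12289
print(mlp_param_count(deep_narrow))    # 7777
```

Despite having fewer parameters, the deep net composes eight nonlinearities, while the wide one applies only one; that composition is what lets depth capture interactions that a single hidden layer would need extreme width to match.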

jkm