
In class we discussed that if the weights of an ANN (a standard feedforward NN in a binary classification setting, with classes 0 and 1) are all initialized at zero, the ANN fails to break symmetry, and therefore the units in each layer develop equivalently.

My professor stated at the end that the resulting vector of predictions represents something, and I forget what exactly. That vector, which contains one element per observation, has the same value, 0.425, everywhere. Is it the average conditional probability of being in class 1?

J3lackkyy

1 Answer


I can't comment on the explanation and example that your professor gave, since I haven't heard it, so let me make a general comment. If you initialize all weights at zero, then all the neurons within each layer are equivalent: they receive identical gradients at every step, so they learn the same thing, and the network is more prone to getting stuck in local optima. It follows that with such an initialization you are wasting computation, because you get a result equivalent to having each layer consist of just a single neuron. So the output of such a model represents the output of a much simpler model, if that answers your question.
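To make this concrete, here is a minimal NumPy sketch (my own illustration, not from your class or any particular library): a one-hidden-layer network for binary classification with every weight and bias initialized to zero, trained by plain gradient descent on the cross-entropy loss. The toy data, learning rate, and layer sizes are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]  # toy binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H = 4                                       # number of hidden units
W1, b1 = np.zeros((3, H)), np.zeros(H)      # zero initialization everywhere
W2, b2 = np.zeros((H, 1)), np.zeros(1)

lr = 0.5
for _ in range(2000):
    a1 = sigmoid(X @ W1 + b1)               # hidden activations
    p = sigmoid(a1 @ W2 + b2)               # predicted P(y = 1 | x)
    d_out = (p - y) / len(X)                # cross-entropy gradient wrt output logit
    d_hid = (d_out @ W2.T) * a1 * (1.0 - a1)  # backprop through the hidden layer
    W2 -= lr * (a1.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_hid)
    b1 -= lr * d_hid.sum(axis=0)

# Every column of W1 (one column per hidden unit) is identical after training:
print(np.allclose(W1, W1[:, [0]]))          # True
```

Because every hidden unit starts with the same weights, it receives the same gradient at every step, so the columns of W1 (and the entries of W2) stay identical no matter how long you train. The trained network is therefore computationally equivalent to one with a single hidden unit, which is the "much simpler model" above.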

Tim