
I am slightly confused about counting the number of NN parameters. Let's assume there is a NN with a 4-dim vector as input, followed by a 5-dim hidden layer and then a 6-dim hidden layer. There is a single-neuron output with a sigmoid activation, and negative log-likelihood as the loss function.

The question is: how many parameters should be updated on the first step of gradient descent (including biases)?

My calculation is $4 \cdot 5 + 5$, then $5 \cdot 6 + 6$, and $6$. But I am not sure about the last $6$. I would appreciate any help.

gar

1 Answer


By an X-dim hidden layer, I presume you mean X hidden-layer neurons. In the first layer, each neuron takes $4+1$ inputs; in the second, each neuron takes $5+1$ inputs. So, $5^2+6^2$ as you put it is correct. But for the last layer, the preceding hidden layer has $6$ outputs. These outputs and a bias term are fed into the single neuron in the output layer, which makes $6+1=7$ parameters instead of $6$ as you said. In total, that is $25+36+7=68$ parameters. The number of parameters is independent of the loss function, by the way.
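As a quick sanity check, here is a minimal sketch in PyTorch (an assumption; the question doesn't name a framework, and the hidden activations are assumed to be ReLU since they aren't specified):

```python
import torch.nn as nn

# Hypothetical model matching the question: 4 -> 5 -> 6 -> 1
model = nn.Sequential(
    nn.Linear(4, 5),   # 4*5 weights + 5 biases = 25
    nn.ReLU(),         # assumed hidden activation (not given in the question)
    nn.Linear(5, 6),   # 5*6 weights + 6 biases = 36
    nn.ReLU(),
    nn.Linear(6, 1),   # 6*1 weights + 1 bias   = 7
    nn.Sigmoid(),      # sigmoid on the single output neuron
)

total = sum(p.numel() for p in model.parameters())
print(total)  # 68
```

Since every parameter receives a gradient, all 68 of them are updated on the first gradient-descent step.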

gunes
  • thank you very much! Your answer is very helpful. I am slightly confused by the phrase *"There is a single neuron output with sigmoid"*; it seems like I need one more bias term, right? Also, I am not sure if it is typical to apply a sigmoid to the output. – gar Feb 13 '19 at 08:38
  • unless stated otherwise, you should account for biases in neurons. Sigmoid is just an activation function; the bias is present in general, i.e. $f(w^\top x+b)$, where $f$ can be sigmoid or another function. – gunes Feb 13 '19 at 09:53