I am slightly confused about counting the number of parameters in a neural network (NN). Assume a NN takes a 4-dimensional vector as input, followed by a 5-dimensional hidden layer and then a 6-dimensional hidden layer. The output is a single neuron with a sigmoid activation, and the loss function is the negative log-likelihood.
The question is: how many parameters (including biases) are updated on the first step of gradient descent?
My calculation is 4*5+5 for the first hidden layer, 5*6+6 for the second hidden layer, and 6 for the output neuron. But I am not sure about the last 6. I would appreciate any help.
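To make the arithmetic easy to check, here is a minimal sketch that tallies the parameters layer by layer. It assumes every layer is fully connected (dense) and that each neuron, including the output neuron, has exactly one bias. The function name `count_dense_params` and its `use_bias` flag are hypothetical helpers written for this illustration, not part of any library.

```python
def count_dense_params(layer_sizes, use_bias=True):
    """Return per-layer parameter counts for a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    Assumption: dense layers only, one bias per neuron when use_bias is True.
    """
    counts = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = n_in * n_out             # one weight per input/output pair
        biases = n_out if use_bias else 0  # one bias per neuron in the layer
        counts.append(weights + biases)
    return counts


# Architecture from the question: 4 inputs -> 5 -> 6 -> 1 sigmoid output.
per_layer = count_dense_params([4, 5, 6, 1])
total = sum(per_layer)
print(per_layer, total)  # [25, 36, 7] 68
```

Under these assumptions the output layer contributes 6*1 weights plus 1 bias, so the last term would be 7 rather than 6 (it is 6 only if the output neuron has no bias). The sigmoid and the negative log-likelihood add no trainable parameters, and plain gradient descent computes an update for every one of the counted parameters at each step, including the first.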