I want to update a bias in my neural network using the gradient descent optimization algorithm. Unfortunately, the bias has different dimensions from the derivative of the loss function with respect to the bias. For example, the bias in the first hidden layer has dimensions 1 x hidden_size, and the delta error (i.e., the derivative of the loss function with respect to the bias) has dimensions train_size x hidden_size. So I can't just subtract one from the other.
I have seen here that the author sums the delta error over the columns, but I don't understand why.
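For concreteness, here is a minimal NumPy sketch of the shapes involved (the dimensions are made up, and summing over `axis=0`, i.e. down the columns, is my reading of that code):

```python
import numpy as np

train_size, hidden_size = 4, 3
learning_rate = 0.01

# Bias of the first hidden layer: shape (1, hidden_size)
b = np.zeros((1, hidden_size))

# Delta error (derivative of the loss w.r.t. the bias, one row per
# training example): shape (train_size, hidden_size)
delta = np.random.randn(train_size, hidden_size)

# Direct update fails: (1, hidden_size) vs (train_size, hidden_size)
# b -= learning_rate * delta  # raises a broadcasting error

# What the author seems to do: sum delta down the columns (i.e. over
# the training examples), which gives shape (1, hidden_size) again
db = delta.sum(axis=0, keepdims=True)

b -= learning_rate * db  # now the shapes match
```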
Could someone help me with this?