In the text I read the following:
I'm confused about the dimensions of the bias vector. How can we add an (m, 1) vector to a (1, P) vector? Is w0 shaped correctly? Or should w1 be shaped (n, P) to account for P classes, and then we broadcast w0?
Note: I assume w1 should be (n, P) so that the matrix multiplication yields a row of unnormalized logits, one per class, for each observation. Does it then make sense to add a per-class bias and broadcast it over the number of samples in our data?
I feel foolish for even asking, but walking through the example I couldn't reconcile the shapes...