
I've seen a lot of examples of neural networks where, when the bias is introduced, it's treated as a node that always outputs 1, and each node has an individual weight for it, instead of each node having a constant that it adds to the rest of its inputs before being activated. Is there anything wrong with this form, like maybe losing some of the point of being modeled after cells in the brain?

Marcsine
    The analogy between artificial neural nets and the brain is extremely loose. Neural nets designed for machine learning don't have biological plausibility as a goal in any case. Although some features may be loosely biologically inspired, performance is typically the goal. – user20160 Jul 30 '16 at 15:55

1 Answer


That seems totally reasonable!

If you do the math, you'll find that the underlying operations are exactly the same in both cases. In one case, you multiply each input ($x_i$) by the corresponding weight ($w_i$), then add a separate bias term ($b$), so the input to your nonlinearity is: $$ b + \sum_{i=1}^{n} w_ix_i$$

Now, suppose we append a one to $\mathbf{x}$ to create $\mathbf{x}_{\textrm{new}}=\big[x_1, x_2, \ldots, x_n, 1\big]$ and extend the weight vector to match. The nonlinearity's input becomes: $$\sum_{i=1}^{n+1} w_ix_i$$ The first $n$ elements of $\mathbf{w}$ are the same as before, and the last element becomes $b$. Anything times one is itself, so...the final answer is the same.
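
Here's a quick NumPy sketch of that equivalence (the values are made up, purely for illustration): the explicit-bias form and the append-a-one form give the same pre-activation value.

import numpy as np

# hypothetical example values, just for illustration
x = np.array([0.5, -1.2, 3.0])   # inputs x_1..x_n
w = np.array([0.1, 0.4, -0.7])   # weights w_1..w_n
b = 2.0                          # separate bias term

# form 1: explicit bias added to the weighted sum
z1 = b + np.dot(w, x)

# form 2: append a constant 1 to x and fold b into the weight vector
x_new = np.append(x, 1.0)
w_new = np.append(w, b)
z2 = np.dot(w_new, x_new)

print(z1, z2)  # identical (up to floating-point rounding)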

Furthermore, this lets you do every stage as a single matrix multiplication ($\bf{W} ^{\intercal} \bf{x}$). These can be performed vastly more efficiently than a naive for-loop like:

output = bias
for i = 1 to N
    output = output + (weight[i] * input[i])
end
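
For contrast, here's a vectorized sketch (with made-up sizes, just for illustration) that computes all of a layer's pre-activations in a single matrix multiplication, which can be dispatched to the optimized libraries mentioned below:

import numpy as np

# hypothetical layer: n_in inputs, n_out units
n_in, n_out = 4, 3
rng = np.random.default_rng(0)

W = rng.standard_normal((n_in + 1, n_out))  # last row holds the biases
x = rng.standard_normal(n_in)
x_aug = np.append(x, 1.0)                   # append the constant-1 "bias node"

# one matrix multiplication gives every unit's pre-activation at once
z = W.T @ x_aug   # shape (n_out,), i.e. W^T x from the text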

If you're interested in this, you may want to look into linear algebra packages like BLAS (or ATLAS, MKL, etc.), or some of the GPGPU work like CUDA and OpenCL.

Matt Krause