I know from textbooks that finding the weights of a neural network requires gradient descent, because no closed-form solution is available. However, not understanding exactly why the derivative with respect to the weights cannot simply be set to zero and solved led me to try it myself.
Let's consider the traditional sigmoid MLP, with just one layer and just one datapoint $\langle\mathbf{x},t\rangle$. The gradient vector of the MSE loss function w.r.t. the weights is:
$$\frac{\partial}{\partial\mathbf{w}} \frac{1}{2}\left( t - s(\mathbf{w}\cdot\mathbf{x}) \right)^2$$
which becomes:
$$ = -(t - s(\mathbf{w}\cdot\mathbf{x}))s(\mathbf{w}\cdot\mathbf{x})(1-s(\mathbf{w}\cdot\mathbf{x}))\mathbf{x}$$
with $s(h)$ being the sigmoid function:
$$s(h) = \frac{1}{1+e^{-h}}$$
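As a sanity check on the derivation (see the last question below), here is a minimal numerical sketch comparing the analytic gradient above against central finite differences; the values of $\mathbf{x}$, $t$, and $\mathbf{w}$ are arbitrary placeholders chosen only for the check:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def loss(w, x, t):
    # MSE loss 0.5 * (t - s(w . x))^2 for a single datapoint
    return 0.5 * (t - sigmoid(w @ x)) ** 2

def grad(w, x, t):
    # analytic gradient: -(t - s(w.x)) * s(w.x) * (1 - s(w.x)) * x
    a = sigmoid(w @ x)
    return -(t - a) * a * (1.0 - a) * x

# Arbitrary toy values (assumed, just for the check)
rng = np.random.default_rng(0)
x = rng.normal(size=3)
w = rng.normal(size=3)
t = 0.7

# Central finite differences along each coordinate direction
eps = 1e-6
num = np.array([
    (loss(w + eps * e, x, t) - loss(w - eps * e, x, t)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(num, grad(w, x, t), atol=1e-8))  # expect True
```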
Now, how does one solve for the zeroes of the gradient expression?
$$-(t - s(\mathbf{w}\cdot\mathbf{x}))s(\mathbf{w}\cdot\mathbf{x})(1-s(\mathbf{w}\cdot\mathbf{x}))\mathbf{x} = 0$$
What I could do is analyze the individual factors and see where each of them vanishes. The factor $s(\mathbf{w}\cdot\mathbf{x})$ goes to zero only in the limit $\mathbf{w}\cdot\mathbf{x} \to -\infty$, e.g. by sending some component of $\mathbf{w}$ to $-\infty$; likewise, $1-s(\mathbf{w}\cdot\mathbf{x})$ goes to zero only in the limit $\mathbf{w}\cdot\mathbf{x} \to +\infty$. This is not useful.
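To make that factor analysis concrete, here is a small numerical sketch in the scalar case, with arbitrary toy values for $x$ and $t$ (assumed only for illustration), evaluating the sigmoid factors and the full gradient over a grid of weights:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# Toy scalar case (assumed values, only for illustration)
x, t = 1.0, 0.7

w = np.linspace(-20, 20, 9)           # grid of candidate weights
a = sigmoid(w * x)                    # s(w*x)
grad = -(t - a) * a * (1.0 - a) * x   # the gradient expression from above

for wi, si, gi in zip(w, a, grad):
    print(f"w = {wi:6.1f}   s(wx) = {si:.6f}   1 - s(wx) = {1 - si:.6f}   grad = {gi: .6e}")
# The factors s(wx) and 1 - s(wx) only approach zero as w*x -> -inf or +inf,
# which is the limiting behaviour described above.
```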
A few questions:
- does the gradient expression even have a zero, even if it cannot be found in closed form?
- what would be the canonical procedure to solve for the zeroes of such a function? (I have experience with the linear and polynomial cases, but I am ignorant of more complicated cases such as this one.)
- can it be demonstrated that there is no closed form?
- what about other non-linear activation functions? Might some lead to closed-form zeroes?
- might it be the case that there is a closed form for an individual datapoint but not for an expression that considers multiple datapoints?
- does restricting $\mathbf{x}$ and $\mathbf{w}$ to dimensionality 1 (scalars) make a difference in finding a closed form?
- (are my derivations correct?)
I would be grateful for any answers, even if only tangentially related, and for any corrections to my procedure and terminology.