
Consider a data set in which each target $t_n$ is associated with a weighting factor $r_n > 0$, so that the sum-of-squares error function becomes

$$SE(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2.$$

Find an expression for the solution $w^*$ that minimizes this error function.
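As a sanity check on whatever expression I end up with (this note is my own, not part of the exercise text): setting all $r_n = 1$ should recover the ordinary unweighted sum-of-squares error, and therefore the standard least-squares solution:

$$SE(\mathbf{w})\big|_{r_n = 1} = \frac{1}{2} \sum_{n=1}^N \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2.$$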

What I have understood so far:

  • Maximizing the likelihood function is equivalent to minimizing the error function; this is why we do this.
  • The variable $t_n$ would be my $y$ if I looked at linear regression with two variables.
  • $t_n$ is my target variable, i.e. the dependent variable from my real observations.
  • $\phi(x_n)$ is just a function (linear or non-linear, often called a basis function).
  • $\mathbf{w}^T$ is a vector of independent variables (is that correct?). In Bishop's book all vectors are assumed to be column vectors, so $\mathbf{w}^T$ is a row vector.
  • So if $\mathbf{w}^T$ is a row vector, then my target variable $t_n$ must be a row vector of the same dimension (is that correct?)

So far so good. I now want to take the derivative of $SE$. However, I don't know what the derivative of $\mathbf{w}^T$ is.

This is my approach:

$$\nabla_w \text{SE}(\mathbf{w}) = \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \nabla \mathbf{w}^T \phi(x_n)$$
$$\nabla_w \text{SE}(\mathbf{w}) = \sum_{n=1}^N r_n \mathbf{w} \nabla \mathbf{w}^T \phi(x_n)^2 - r_n t_n \nabla \mathbf{w} \phi(x_n)$$

Setting this to zero and solving for $\mathbf{w}$ yields

$$\frac{\sum_{n=1}^N r_n t_n \nabla \mathbf{w}^T \phi(x_n)}{\sum_{n=1}^N r_n t_n \nabla \mathbf{w}^T \phi(x_n)^2}.$$

Please help me on this. I don't want the solution straight away. This is an assignment for my machine learning class, I just want to understand it.

/edit:

Looking at my approach above, I wasn't too far off. I made a mistake when multiplying out, but apart from not knowing how to get the derivative, it mostly stays the same.

$$\begin{align}
SE(\mathbf{w}) =& \; \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \\
\frac{\partial SE(\mathbf{w})}{\partial \mathbf{w}} =& \; \frac{\partial}{\partial \mathbf{w}} \left[ \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \right] \\
=& \; \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \cdot \phi(x_n) \\
=& \; \sum_{n=1}^N r_n \phi(x_n)^2 \mathbf{w}^T - r_n t_n \phi(x_n) \\
&\text{Find the minimum.} \\
0 =& \; \frac{\partial SE(\mathbf{w})}{\partial \mathbf{w}} \\
0 =& \; \sum_{n=1}^N r_n \phi(x_n)^2 \mathbf{w}^T - r_n t_n \phi(x_n) \\
\mathbf{w}^T =& \; \frac{\sum_{n=1}^N r_n t_n \phi(x_n)}{\sum_{n=1}^N r_n \phi(x_n)^2}
\end{align}$$
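To convince myself that this closed form is plausible, here is a quick numerical sanity check (my own sketch, not part of the assignment; it assumes the simplest possible setup with a single scalar basis function $\phi(x) = x$, so $w$ is a scalar):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: scalar basis phi(x) = x, noisy targets, positive weights r_n
N = 50
x = rng.uniform(-2.0, 2.0, size=N)
phi = x                                    # phi(x_n); here just the identity basis
t = 1.7 * phi + rng.normal(0.0, 0.3, size=N)
r = rng.uniform(0.1, 5.0, size=N)          # weighting factors r_n > 0

# Closed form derived above: w = sum(r_n t_n phi(x_n)) / sum(r_n phi(x_n)^2)
w_closed = np.sum(r * t * phi) / np.sum(r * phi**2)

# Independent check: absorb sqrt(r_n) into the inputs and targets and solve
# the resulting ordinary least-squares problem.
A = (np.sqrt(r) * phi)[:, None]
b = np.sqrt(r) * t
w_lstsq = np.linalg.lstsq(A, b, rcond=None)[0][0]

print(w_closed, w_lstsq)                   # agree up to floating-point error
assert np.isclose(w_closed, w_lstsq)
```

Both numbers agree, which at least rules out a sign or factor-of-two mistake in the scalar case.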

Marcel Braasch

1 Answer


$\mathbf{w}$ is a vector of parameters to be estimated, not of independent variables. $\mathbf{w}^T$ is a row vector, so $\phi(x_n)$ must be a column vector of the same dimension, such that the product $\mathbf{w}^T\phi(x_n)$ is a scalar. This means $t_n$ is a scalar, not a vector; otherwise, the squaring operation doesn't make sense.

You can use the following for the derivation: $\partial(\mathbf{w}^Ty)/\partial\mathbf{w}=y$. Check here for simple matrix calculus.
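Concretely, combining that identity (with $y = \phi(x_n)$) and the chain rule gives the gradient of a single term of the sum:

$$\frac{\partial}{\partial \mathbf{w}} \left[ \frac{1}{2} r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \right] = r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \phi(x_n).$$

Summing over $n$, setting the gradient to zero, and solving for $\mathbf{w}$ then gives the minimizer $\mathbf{w}^*$.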

gunes
  • Thank you! That cheat sheet is awesome. I wasn't aware that something like this or the Matrix Cookbook existed. I'm editing my answer in a second; could you check 1) whether what I did is correct and 2) whether it is possible to simplify the term I'm getting? – Marcel Braasch Nov 17 '19 at 19:24
  • Uploaded it. And I just saw that I can cross out one $\phi(x_n)$, right? – Marcel Braasch Nov 17 '19 at 19:45