Consider a data set in which each target $t_n$ is associated with a weighting factor $r_n > 0$, so that the sum-of-squares error function becomes
$$SE(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2.$$
Find an expression for the solution $w^*$ that minimizes this error function.
What I have understood so far:
- Maximizing the likelihood function (under a Gaussian noise assumption) is equivalent to minimizing the sum-of-squares error function; this is why we minimize it.
- The variable $t_n$ would be my $y$ if I were looking at linear regression with two variables.
- $t_n$ is my target variable, i.e. the dependent variable from my real observations.
- $\phi(x_n)$ is just a function of the input (linear or non-linear), often called a basis function.
- $\mathbf{w}^T$ is a vector of independent variables (is that correct?). In Bishop's book all vectors are assumed to be column vectors, so $\mathbf{w}^T$ is a row vector.
- So if $\mathbf{w}^T$ is a row vector, then my target variable $t_n$ must be a row vector of the same dimension (is that correct? I try to check the shapes below).
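To check my understanding of the shapes (assuming $M$ basis functions $\phi_0, \dots, \phi_{M-1}$, following Bishop's notation; this check is my own addition), I would write
$$\mathbf{w} = \begin{pmatrix} w_0 \\ \vdots \\ w_{M-1} \end{pmatrix} \in \mathbb{R}^M, \qquad \phi(x_n) = \begin{pmatrix} \phi_0(x_n) \\ \vdots \\ \phi_{M-1}(x_n) \end{pmatrix} \in \mathbb{R}^M,$$
so $\mathbf{w}^T \phi(x_n)$ would be a $1 \times M$ row vector times an $M \times 1$ column vector, i.e. a scalar, and then $t_n$ would have to be a scalar as well for the difference $\mathbf{w}^T \phi(x_n) - t_n$ to make sense.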
So far so good. I now want to take the derivative of $SE$. However, I don't know what the derivative of $\mathbf{w}^T \phi(x_n)$ with respect to $\mathbf{w}$ is.
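After some searching (this is a standard vector-calculus identity, e.g. from the Matrix Cookbook; it is my addition, not something given in the assignment), I believe the rule I need is
$$\nabla_{\mathbf{w}} \left(\mathbf{w}^T \mathbf{a}\right) = \mathbf{a} \qquad \text{for a constant vector } \mathbf{a},$$
so with $\mathbf{a} = \phi(x_n)$ the inner derivative would just be $\phi(x_n)$.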
This is my approach:
$$\begin{align} \nabla_{\mathbf{w}} SE(\mathbf{w}) =& \;\sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \nabla_{\mathbf{w}}\, \mathbf{w}^T \phi(x_n) \\ =& \;\sum_{n=1}^N r_n \mathbf{w}\, \nabla_{\mathbf{w}}\, \mathbf{w}^T \phi(x_n)^2 - r_n t_n \nabla_{\mathbf{w}}\, \mathbf{w}^T \phi(x_n) \end{align}$$
Setting this to zero and solving for $\mathbf{w}$ yields
$$\mathbf{w} = \frac{\sum_{n=1}^N r_n t_n \nabla_{\mathbf{w}}\, \mathbf{w}^T \phi(x_n)}{\sum_{n=1}^N r_n \nabla_{\mathbf{w}}\, \mathbf{w}^T \phi(x_n)^2}.$$
Please help me with this. I don't want the solution straight away; this is an assignment for my machine learning class, and I just want to understand it.
/edit:
Looking at my approach above, I wasn't too far off. I made a mistake when multiplying out, but apart from not knowing how to take the derivative, it mostly stays the same.
$$\begin{align} SE(\mathbf{w}) =& \;\frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \\ \frac{\partial SE(\mathbf{w})}{\partial \mathbf{w}} =& \;\frac{\partial}{\partial \mathbf{w}}\, \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \\ =& \;\sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \phi(x_n) \\ =& \;\sum_{n=1}^N r_n \phi(x_n)^2\, \mathbf{w} - r_n t_n \phi(x_n) \\ &\text{Find the minimum.} \\ 0 =& \;\frac{\partial SE(\mathbf{w})}{\partial \mathbf{w}} \\ 0 =& \;\sum_{n=1}^N r_n \phi(x_n)^2\, \mathbf{w} - r_n t_n \phi(x_n) \\ \mathbf{w}^* =& \;\frac{\sum_{n=1}^N r_n t_n \phi(x_n)}{\sum_{n=1}^N r_n \phi(x_n)^2} \end{align}$$
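All of this treats $\phi(x_n)$ and $\mathbf{w}$ as scalars, i.e. the case of a single basis function. As a sanity check I also tried to write the general case in matrix form, using $\Phi$ for the $N \times M$ design matrix whose $n$-th row is $\phi(x_n)^T$ and $R = \operatorname{diag}(r_1, \dots, r_N)$ (my own notation, not from the assignment):
$$\nabla_{\mathbf{w}} SE(\mathbf{w}) = \Phi^T R \left(\Phi \mathbf{w} - \mathbf{t}\right) = \mathbf{0} \quad \Longrightarrow \quad \mathbf{w}^* = \left(\Phi^T R\, \Phi\right)^{-1} \Phi^T R\, \mathbf{t},$$
which reduces to the fraction above for $M = 1$.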
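And a quick numerical check of that matrix form in numpy (my own sketch with made-up random data; the names Phi, r, t, w_star are mine):

```python
import numpy as np

# Numerical sanity check for the weighted least-squares solution.
# Phi, r, t, w_star are my own made-up names and random test data.
rng = np.random.default_rng(0)

N, M = 50, 3                       # N data points, M basis functions
Phi = rng.normal(size=(N, M))      # design matrix: row n is phi(x_n)^T
t = rng.normal(size=N)             # targets t_n
r = rng.uniform(0.1, 2.0, size=N)  # positive weights r_n
R = np.diag(r)

# Closed form w* = (Phi^T R Phi)^{-1} Phi^T R t
w_star = np.linalg.solve(Phi.T @ R @ Phi, Phi.T @ R @ t)

# Gradient sum_n r_n (w^T phi(x_n) - t_n) phi(x_n) should vanish at w*
grad = Phi.T @ (r * (Phi @ w_star - t))
print(np.allclose(grad, 0))        # expect: True
```

The gradient does vanish at the closed-form solution, which at least tells me the matrix form is self-consistent.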