
Consider a data set in which each target $t_n$ is associated with a weighting factor $r_n > 0$, so that the sum-of-squares error function becomes

$$SE(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2.$$

Find an expression for the solution $w^*$ that minimizes this error function.
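As a sanity check on whatever expression I end up with (this note is my own, not part of the exercise text): setting all $r_n = 1$ should recover the ordinary unweighted sum-of-squares error, and therefore the standard least-squares solution:

$$SE(\mathbf{w})\big|_{r_n = 1} = \frac{1}{2} \sum_{n=1}^N \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2.$$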

What I have understood so far:

  • Maximizing the likelihood function is equivalent to minimizing the error function; this is why we do this.
  • The variable $t_n$ would be my $y$ if I looked at linear regression with two variables.
  • $t_n$ is my target variable, i.e. the dependent variable from my real observations.
  • $\phi(x_n)$ is just a function (linear or non-linear, often called a basis function).
  • $\mathbf{w}^T$ is a vector of independent variables (is that correct?). In Bishop's book all vectors are assumed to be column vectors, so $\mathbf{w}^T$ is a row vector.
  • So if $\mathbf{w}^T$ is a row vector, then my target variable $t_n$ must be a row vector of the same dimension (is that correct?)

So far so good. I now want to take the derivative of $SE$. However, I don't know what the derivative of $\mathbf{w}^T$ is.

This is my approach:

$$\nabla_w \text{SE}(\mathbf{w}) = \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \nabla \mathbf{w}^T \phi(x_n)$$
$$\nabla_w \text{SE}(\mathbf{w}) = \sum_{n=1}^N r_n \mathbf{w} \nabla \mathbf{w}^T \phi(x_n)^2 - r_n t_n \nabla \mathbf{w} \phi(x_n)$$

Setting this to zero and solving for $\mathbf{w}$ yields

$$\frac{\sum_{n=1}^N r_n t_n \nabla \mathbf{w}^T \phi(x_n)}{\sum_{n=1}^N r_n t_n \nabla \mathbf{w}^T \phi(x_n)^2}.$$

Please help me on this. I don't want the solution straight away. This is an assignment for my machine learning class, I just want to understand it.

/edit:

Looking at my approach above, I wasn't too far off. I made a mistake when multiplying out, but apart from not knowing how to get the derivative, it mostly stays the same.

$$\begin{align}
SE(\mathbf{w}) =& \; \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \\
\frac{\partial SE(\mathbf{w})}{\partial \mathbf{w}} =& \; \frac{\partial}{\partial \mathbf{w}} \left[ \frac{1}{2} \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \right] \\
=& \; \sum_{n=1}^N r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \cdot \phi(x_n) \\
=& \; \sum_{n=1}^N r_n \phi(x_n)^2 \mathbf{w}^T - r_n t_n \phi(x_n) \\
&\text{Find the minimum.} \\
0 =& \; \frac{\partial SE(\mathbf{w})}{\partial \mathbf{w}} \\
0 =& \; \sum_{n=1}^N r_n \phi(x_n)^2 \mathbf{w}^T - r_n t_n \phi(x_n) \\
\mathbf{w}^T =& \; \frac{\sum_{n=1}^N r_n t_n \phi(x_n)}{\sum_{n=1}^N r_n \phi(x_n)^2}
\end{align}$$
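To convince myself that this closed form is plausible, here is a quick numerical sanity check (my own sketch, not part of the assignment; it assumes the simplest possible setup with a single scalar basis function $\phi(x) = x$, so $w$ is a scalar):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: scalar basis phi(x) = x, noisy targets, positive weights r_n
N = 50
x = rng.uniform(-2.0, 2.0, size=N)
phi = x                                    # phi(x_n); here just the identity basis
t = 1.7 * phi + rng.normal(0.0, 0.3, size=N)
r = rng.uniform(0.1, 5.0, size=N)          # weighting factors r_n > 0

# Closed form derived above: w = sum(r_n t_n phi(x_n)) / sum(r_n phi(x_n)^2)
w_closed = np.sum(r * t * phi) / np.sum(r * phi**2)

# Independent check: absorb sqrt(r_n) into the inputs and targets and solve
# the resulting ordinary least-squares problem.
A = (np.sqrt(r) * phi)[:, None]
b = np.sqrt(r) * t
w_lstsq = np.linalg.lstsq(A, b, rcond=None)[0][0]

print(w_closed, w_lstsq)                   # agree up to floating-point error
assert np.isclose(w_closed, w_lstsq)
```

Both numbers agree, which at least rules out a sign or factor-of-two mistake in the scalar case.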

Marcel Braasch

1 Answer


$\mathbf{w}$ is a vector of parameters to be estimated, not of independent variables. $\mathbf{w}^T$ is a row vector, so $\phi(x_n)$ must be a column vector of the same dimension, such that the product $\mathbf{w}^T\phi(x_n)$ is a scalar. This means $t_n$ is a scalar, not a vector; otherwise, the squaring operation doesn't make sense.

You can use the following for the derivation: $\partial(\mathbf{w}^Ty)/\partial\mathbf{w}=y$. Check here for simple matrix calculus.
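Concretely, combining that identity (with $y = \phi(x_n)$) and the chain rule gives the gradient of a single term of the sum:

$$\frac{\partial}{\partial \mathbf{w}} \left[ \frac{1}{2} r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right)^2 \right] = r_n \left(\mathbf{w}^T \phi(x_n) - t_n\right) \phi(x_n).$$

Summing over $n$, setting the gradient to zero, and solving for $\mathbf{w}$ then gives the minimizer $\mathbf{w}^*$.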

gunes
  • Thank you! That cheat sheet is awesome. I wasn't aware that something like this or the Matrix Cookbook existed. I'm editing my answer in a second; could you check 1) whether what I did is correct and 2) whether it is possible to simplify the term I'm getting? – Marcel Braasch Nov 17 '19 at 19:24
  • Uploaded it. And I just saw that I can cross out one $\phi(x_n)$, right? – Marcel Braasch Nov 17 '19 at 19:45