
We are considering a linear model that is supposed to predict a target $t$ for some new input $x$ given some data $D=\{(x_1,t_1),\ldots,(x_N,t_N)\}$. This model is defined by $$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j\phi_j(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x})$$ (with $\phi_0(\mathbf{x}) = 1$, so that $w_0$ acts as the bias), where $\mathbf{w}$ denotes the weight vector to be determined and $\phi(\mathbf{x})$ denotes the design matrix.

Question 1:

What do the basis functions $\phi_j$ really do?

If I understand correctly, what they do is transform the data into a new form in which it can be approximated by a linear function, i.e. we linearize the data? Is this why we call the model a linear model, even though the underlying data may be far from linear?

Question 2:

How can I picture the design matrix?

As far as I understand, it's just a vertical stack of our (vectors of) basis functions. The component-wise operation makes intuitive sense, but it's hard for me to grasp the matrix notation.

I'm mostly learning from Bishop's PRML book, so I used his notation.

Marcel Braasch
  • Your first question is answered in several threads, including https://stats.stackexchange.com/questions/61747/ and https://stats.stackexchange.com/questions/148638 (both found with [this search](https://stats.stackexchange.com/search?q=linear+nonlinear+regression+score%3A5)). A "matrix" is a rectangular array of numbers, like a range on a spreadsheet. Could you state more precisely, then, what you mean by "picture" a matrix? – whuber Mar 12 '20 at 20:17
  • I like this video: https://www.youtube.com/watch?v=rVviNyIR-fI&list=PLD0F06AA0D2E8FFBA&index=53&t=0s. – Dave Mar 12 '20 at 20:43

1 Answer

  1. When the basis functions are pre-computed, you're only estimating $w$, which is a linear problem. In other words, given a fixed choice of $\phi$ and fixed data $x$, you're minimizing some loss $\min_w \mathcal{L}(t, w^\top \phi(x) )$. If you're finding it confusing to think about $\phi(x)$, we can just denote our design matrix as $\phi(x)=Z$. In the case of a least-squares loss function, we have the usual linear model $\min_w \| t-w^\top Z \|_2^2$.

  2. A design matrix is a matrix just like any other matrix: a rectangular array indexed by rows and columns.
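To make both points concrete, here is a minimal sketch in NumPy. The polynomial basis $\phi_j(x) = x^j$, the noisy sine data, and the helper name `design_matrix` are assumptions chosen for illustration; they are not taken from the question. The matrix is stored with one row per data point (Bishop's convention), so the predictions are `Phi @ w`, which is the transpose of the $w^\top Z$ layout used above. The fit stays linear in $w$ even though the fitted curve is not linear in $x$.

```python
import numpy as np

def design_matrix(x, degree):
    """Hypothetical helper: row n holds [1, x_n, x_n**2, ..., x_n**degree]."""
    return np.vander(x, degree + 1, increasing=True)

# Illustrative data (an assumption): a noisy sine curve, clearly not linear in x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

# Design matrix: N rows (data points) by M columns (basis functions).
Phi = design_matrix(x, degree=3)

# Ordinary least squares in w: minimise ||t - Phi w||^2.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Predictions are a linear function of w, but a cubic function of x.
y = Phi @ w
```

Swapping in a different set of basis functions (Gaussians, sigmoids, and so on) only changes how the columns of `Phi` are computed; the least-squares step for `w` is unchanged.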

Sycorax
  • Thank you! And is it true to say that we try to transform the data into a somewhat linear form in order to approximate it? If not, how do I build a bridge between the underlying data and the choice of good basis functions? – Marcel Braasch Mar 12 '20 at 20:26