I am following Christopher Bishop's book "Pattern Recognition and Machine Learning", specifically the section on Gaussian Processes. As an introduction, a simple model is given with the following properties. The model is defined as a linear combination of $M$ fixed basis functions, $$y(x) = w^T\phi(x),$$ with the prior $p(w)=N(w|0,\alpha^{-1}I)$ placed on the coefficients $w$. Since a Gaussian Process defines the evaluations of $y(x)$ at any set of points $x_1,\dots, x_N$ as jointly Gaussian, we can write $$p(y(x_1),\dots,y(x_N)|x_1,\dots,x_N)=\int p(y(x_1),\dots,y(x_N)|w,x_1,\dots,x_N)\,p(w)\,dw,$$ which is the distribution of the vector $Y=\Phi w$, where $Y=(y(x_1),\dots,y(x_N))^T$ and $\Phi$ is the $N \times M$ design matrix with $\phi(x_n)^T$ as its $n$th row.
The distribution of $Y$ is then Gaussian, since $w$ is Gaussian: it has mean $0$ and covariance matrix $K = \dfrac{1}{\alpha} \Phi \Phi^T$.
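To make the construction concrete, here is a minimal sketch of the model above, assuming a polynomial basis $\phi_j(x) = x^j$ (my own illustrative choice; Bishop leaves the basis functions unspecified):

```python
import numpy as np

alpha = 1.0   # prior precision of w, so w ~ N(0, alpha^{-1} I)
M = 4         # number of basis functions
N = 3         # number of evaluation points (N < M here, so K is full rank)

x = np.linspace(-1.0, 1.0, N)

# N x M design matrix: row n holds phi(x_n) = (1, x_n, x_n^2, ..., x_n^{M-1})
Phi = np.vander(x, M, increasing=True)

# Covariance of Y = Phi w under the prior on w:
K = Phi @ Phi.T / alpha   # N x N Gram matrix

# A draw of (y(x_1), ..., y(x_N)) directly from the joint Gaussian
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(N), K)
print(Y.shape)   # (3,)
```

Note that $K$ is exactly the kernel (Gram) matrix of the induced Gaussian Process evaluated at $x_1,\dots,x_N$.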
I think there is a problem with this example. When the number of data points $N$ exceeds the number of basis functions $M$, the covariance matrix becomes singular and cannot be inverted, hence cannot be used in a Gaussian pdf. Indeed, we can see this from $Y=\Phi w$: if $N > M$, the columns of $\Phi$ span at most an $M$-dimensional subspace of $\mathbb{R}^N$, so some values of $Y$ can never be produced, and $K = \frac{1}{\alpha}\Phi\Phi^T$ has rank at most $M < N$.
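The rank deficiency is easy to verify numerically. The sketch below again assumes the illustrative polynomial basis $\phi_j(x) = x^j$, this time with $N > M$:

```python
import numpy as np

alpha = 1.0
M = 3                                    # number of basis functions
N = 10                                   # number of data points, N > M

x = np.linspace(-1.0, 1.0, N)
Phi = np.vander(x, M, increasing=True)   # N x M design matrix

# K is N x N, but rank(K) = rank(Phi) <= M, so it is singular for N > M
K = Phi @ Phi.T / alpha

print(np.linalg.matrix_rank(K))          # 3, not 10
```

Since $K$ is singular, its determinant is zero and the usual density formula $N(Y|0,K)$ is not defined; the distribution of $Y$ is a degenerate Gaussian supported on the $M$-dimensional column space of $\Phi$.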
Is this an error in the example, is it simply ignored since this is an introduction to the concept, or am I mistaken somewhere?