I am following Christopher Bishop's book "Pattern Recognition and Machine Learning", specifically the section on Gaussian Processes. As an introduction, a simple model is given with the following properties. The model is defined as a linear combination of $M$ fixed basis functions, $$y(x) = w^T\phi(x),$$ with the prior $p(w)=N(w|0,\alpha^{-1}I)$ placed on the coefficients $w$. Since a Gaussian Process defines the evaluations of $y(x)$ at any set of points $x_1,\dots, x_N$ as jointly Gaussian, we can write $$p(y(x_1),\dots,y(x_N)|x_1,\dots,x_N)=\int p(y(x_1),\dots,y(x_N)|w,x_1,\dots,x_N)\,p(w)\,dw,$$ which is the distribution of the vector $Y=\Phi w$, where $Y=(y(x_1),\dots,y(x_N))^T$ and $\Phi$ is the $N \times M$ design matrix with $\phi(x_n)^T$ as its $n$th row.
The distribution of $Y$ is then Gaussian, since $w$ is Gaussian: it has mean $0$ and covariance matrix $K = \dfrac{1}{\alpha} \Phi \Phi^T$.
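To make the construction concrete, here is a minimal sketch of the model above, assuming a polynomial basis $\phi_j(x) = x^j$ (my own illustrative choice; Bishop leaves the basis functions unspecified):

```python
import numpy as np

alpha = 1.0   # prior precision of w, so w ~ N(0, alpha^{-1} I)
M = 4         # number of basis functions
N = 3         # number of evaluation points (N < M here, so K is full rank)

x = np.linspace(-1.0, 1.0, N)

# N x M design matrix: row n holds phi(x_n) = (1, x_n, x_n^2, ..., x_n^{M-1})
Phi = np.vander(x, M, increasing=True)

# Covariance of Y = Phi w under the prior on w:
K = Phi @ Phi.T / alpha   # N x N Gram matrix

# A draw of (y(x_1), ..., y(x_N)) directly from the joint Gaussian
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(N), K)
print(Y.shape)   # (3,)
```

Note that $K$ is exactly the kernel (Gram) matrix of the induced Gaussian Process evaluated at $x_1,\dots,x_N$.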
I think there is a problem with this example. When the number of data points $N$ exceeds the number of basis functions $M$, the covariance matrix becomes singular and cannot be inverted, hence cannot be used in a Gaussian pdf. Indeed, we can see this from $Y=\Phi w$: if $N > M$, the columns of $\Phi$ span at most an $M$-dimensional subspace of $\mathbb{R}^N$, so some values of $Y$ can never be produced, and $K = \frac{1}{\alpha}\Phi\Phi^T$ has rank at most $M < N$.
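The rank deficiency is easy to verify numerically. The sketch below again assumes the illustrative polynomial basis $\phi_j(x) = x^j$, this time with $N > M$:

```python
import numpy as np

alpha = 1.0
M = 3                                    # number of basis functions
N = 10                                   # number of data points, N > M

x = np.linspace(-1.0, 1.0, N)
Phi = np.vander(x, M, increasing=True)   # N x M design matrix

# K is N x N, but rank(K) = rank(Phi) <= M, so it is singular for N > M
K = Phi @ Phi.T / alpha

print(np.linalg.matrix_rank(K))          # 3, not 10
```

Since $K$ is singular, its determinant is zero and the usual density formula $N(Y|0,K)$ is not defined; the distribution of $Y$ is a degenerate Gaussian supported on the $M$-dimensional column space of $\Phi$.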
Is this an error in the example, is it simply ignored since this is an introduction to the concept, or am I mistaken somewhere?