
I'm a little bit confused about the posterior distribution for Gaussian processes. Consider the noise-free case, where we have a prior $f|X \sim N(0, K)$ and are given training data $\{(x_i, f_i): i=1,\ldots, n\}$. The joint distribution of the training outputs $f$ and test outputs $f_*$ under the prior is $$\begin{bmatrix} f \\ f_* \end{bmatrix} \sim N\left(0, \begin{bmatrix} K(X,X) & K(X,X_*) \\ K(X_*,X) & K(X_*,X_*) \end{bmatrix}\right).$$

I think the predictive distribution is: $$f_*|X_*,X,f \sim N\left( K(X_*, X)K(X,X)^{-1}f, K(X_*,X_*)-K(X_*,X)K(X,X)^{-1}K(X,X_*) \right).$$

If this is correct, then what is the posterior distribution? I may be getting confused by the fact that the GP is "parameter-free"... I'm used to the prior being a distribution over the parameters, but in this case there are none... Also, since the prior is over functions $f$, I guess the posterior should be as well?
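For concreteness, the conditioning formulas in the question can be sketched numerically. This is a minimal illustration, assuming a squared-exponential (RBF) kernel and a small jitter term for numerical stability; the kernel choice and the `gp_predict` helper are my own, not from the question.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel: k(a, b) = exp(-(a - b)^2 / (2 * l^2))
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X, f, X_star, length_scale=1.0, jitter=1e-10):
    # Noise-free GP conditioning, matching the predictive distribution above:
    #   mean = K(X*, X) K(X, X)^{-1} f
    #   cov  = K(X*, X*) - K(X*, X) K(X, X)^{-1} K(X, X*)
    K = rbf_kernel(X, X, length_scale) + jitter * np.eye(len(X))
    K_s = rbf_kernel(X_star, X, length_scale)
    K_ss = rbf_kernel(X_star, X_star, length_scale)
    mean = K_s @ np.linalg.solve(K, f)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

# Toy data: three noise-free observations of sin(x)
X = np.array([-1.0, 0.0, 1.0])
f = np.sin(X)
X_star = np.array([-0.5, 0.5])
mean, cov = gp_predict(X, f, X_star)
```

A quick sanity check of the noise-free setting: conditioning at the training inputs themselves reproduces the observed values exactly, with (essentially) zero predictive variance, which is one way to see that this conditional really is the posterior over function values.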

kjetil b halvorsen
theQman
    Where did you read that GP is parameter-free? I suspect they mean the parameters of the linear model which are the coefficients that linearly combine the basis functions. – Seeda Jul 05 '17 at 10:45
  • Some discussion here: https://stats.stackexchange.com/questions/46588/why-are-gaussian-process-models-called-non-parametric – theQman Jul 05 '17 at 13:42
  • 1
    It is called "nonparametric" not "parameter-free". Here I explained what it means: https://stats.stackexchange.com/a/254960/66491 – Seeda Jul 06 '17 at 11:12
