Understanding likelihoods for Gaussian Processes

Question

What does it mean when we talk about the "Gaussian likelihood" for a Gaussian Process? Is it correct to think that the "Gaussian likelihood" only means we add a noise term to the covariance function? i.e.

$$\operatorname{cov}(x,x') = K(x,x') + \sigma^2 \delta_{x,x'}$$

In other words, we only add $\sigma^2$ to the diagonal of the covariance matrix. Is my understanding correct? Furthermore, what happens if the variance of the Gaussian likelihood is zero? Does the likelihood become non-Gaussian and we can't use the standard Gaussian Process inference any more?

bill_e · Answer 1 · 2017-08-04T07:26:11.303


The terminology around that is confusing. On its own, a Gaussian process is a prior distribution over a function $f(x)$, $p(f \mid x)$. The most common (by far!) case you see discussed is when the observations are that function plus IID Gaussian noise. Call the observations $y$, so $y = f + \epsilon$, where $\epsilon \sim N(0, \sigma^2)$. Bayes' theorem here is

$$ p(f, \sigma^2 \mid x, y) = \frac{p(y \mid f, x, \sigma^2) p(f \mid x) p(\sigma^2)}{p(y \mid x)} $$

The likelihood, $p(y \mid f, x, \sigma^2)$, is a product of normals, each with mean $f_i$ and variance $\sigma^2$, and $p(f \mid x)$ is a multivariate normal, the GP prior.

These distributions are conjugate, so $f$ can be integrated out analytically, producing the marginal likelihood $p(y \mid x, \sigma^2)$. This marginal likelihood ends up being a multivariate normal whose covariance matrix is $K + \sigma^2 I$, where $K_{ij} = K(x_i, x_j)$. As you said, $\sigma^2$ is added only to the diagonal.
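As a rough numerical illustration of that marginal likelihood, here is a minimal NumPy sketch. The squared-exponential kernel, its hyperparameters, the input grid and the helper names `rbf_kernel` and `log_marginal_likelihood` are all assumptions made for this example, not anything from the question. It draws $f$ from the prior, adds IID noise, and compares the empirical covariance of $y$ with $K + \sigma^2 I$.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix (an assumed kernel choice)."""
    sqdist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def log_marginal_likelihood(x, y, noise_var):
    """log N(y | 0, K + noise_var * I), computed via a Cholesky factor."""
    n = len(x)
    cov = rbf_kernel(x, x) + noise_var * np.eye(n)  # noise enters on the diagonal only
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y)                # chol^{-1} y
    return (-0.5 * alpha @ alpha
            - np.sum(np.log(np.diag(chol)))         # equals 0.5 * log det(cov)
            - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 6)
noise_var = 0.1
K = rbf_kernel(x, x)

# Draw f ~ N(0, K) from the GP prior, then y = f + eps with eps ~ N(0, noise_var).
f = rng.multivariate_normal(np.zeros(len(x)), K, size=200_000)
y = f + rng.normal(scale=np.sqrt(noise_var), size=f.shape)

# The empirical covariance of y should approach K + noise_var * I.
emp_cov = np.cov(y, rowvar=False)
max_err = np.max(np.abs(emp_cov - (K + noise_var * np.eye(len(x)))))
print("largest deviation from K + sigma^2 I:", max_err)
print("log marginal likelihood of one draw:", log_marginal_likelihood(x, y[0], noise_var))
```

The deviation shrinks as the number of draws grows, which is the sampling view of the same statement: integrating $f$ out leaves a normal with covariance $K + \sigma^2 I$.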

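So your understanding is partially correct: the Gaussian likelihood is the noise model $p(y \mid f, x, \sigma^2)$ itself, and the $\sigma^2$ on the diagonal is what it contributes once $f$ has been integrated out. If the variance of the Gaussian likelihood goes to zero, you can absolutely still use GPs. The observations are then noise-free ($y = f$ at the training inputs), and all that happens is that Bayes' theorem above gets a bit simpler.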
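To make the zero-variance case concrete, here is a small sketch under assumed settings (a squared-exponential kernel with unit lengthscale, four made-up training points, and a hypothetical helper `gp_posterior`). It uses the standard GP regression formulas for the posterior over $f$ and evaluates them at the training inputs, once with $\sigma^2 = 0.1$ and once with $\sigma^2 = 0$.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel matrix (an assumed kernel choice)."""
    sqdist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var):
    """Posterior mean and covariance of f at x_test under a zero-mean GP prior."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

# Illustrative data; the inputs are well separated so K is well conditioned.
x_train = np.array([0.0, 1.0, 2.5, 4.0])
y_train = np.sin(x_train)

# Predict at the training inputs themselves, with and without noise.
mean_noisy, cov_noisy = gp_posterior(x_train, y_train, x_train, noise_var=0.1)
mean_exact, cov_exact = gp_posterior(x_train, y_train, x_train, noise_var=0.0)

print("noise-free: max |mean - y| =", np.max(np.abs(mean_exact - y_train)))
print("noise-free: max posterior variance =", np.max(np.abs(np.diag(cov_exact))))
print("noisy:      max |mean - y| =", np.max(np.abs(mean_noisy - y_train)))
```

With $\sigma^2 = 0$ the posterior mean passes through the data and the posterior variance at the training inputs vanishes up to round-off, so the same machinery still applies. In practice a small "jitter" is often added to the diagonal when $K$ is close to singular, for example with closely spaced inputs.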

This answer may help you also.

edited Aug 04 '17 at 07:26

answered Aug 04 '17 at 07:18

bill_e


Thank you for the answer! How is conjugacy preserved when the white noise term is removed? Particularly, how is the likelihood $p(y|f,x,\sigma^2)$ calculated when the variance of each of the normals becomes zero? Or is it the case that we don't have to worry about the likelihood since integrating $f$ still somehow works out even with the variance being zero? – peco Aug 05 '17 at 14:16
But what does it mean to remove the white noise term? What would the GP prior be conjugate with? Try writing it down. If $p(y | f, x, \sigma^2)$ is non-normal then conjugacy cannot be used and $f$ cannot be integrated out. – bill_e Aug 06 '17 at 16:17
Ah, what I meant by "remove white noise term" is to bring the variance towards zero. I'm not sure if you're suggesting a difference between _setting the "variance" to zero_ and _letting the variance go to zero_? – peco Aug 09 '17 at 00:41
