
As a non-expert in the field, I am trying to relate Gaussian Processes (GP) and Gaussian Markov Random Fields (GMRF).

I might just be confused by the fact that different resources use different formalisms. Here I try to report the main definitions and my thoughts.

A Gaussian Process is a supervised learning method for modelling a latent function $y=f(x)+\epsilon$, given a labelled training set. It is specified by a mean function $m(x) = \mathbb{E}[f(x)]$ and a covariance function $k(x,x')=\mathbb{E}[(f(x)-m(x))(f(x')-m(x'))]$. Predicting with a GP means computing, for a new point $x_*$:

$$ \overline{f}_* = \mathbf{k}_*^T(K+\sigma_n^2I)^{-1}\mathbf{y}$$

$$\mathbb{V}[f_*] = k(x_*, x_*) - \mathbf{k}_*^T(K + \sigma_n^2I)^{-1}\mathbf{k_*}$$

where $\mathbf{k}_*$ is the vector of covariances between the test point $x_*$ and the training points $x_i$, i.e. $(\mathbf{k}_*)_i=k(x_*, x_i)$, $K$ is the $n \times n$ covariance matrix between all the inputs of the training set, and $\mathbf{y}$ is the vector of labels of the training set. In order to evaluate a model, one has to compute the LML (log marginal likelihood):

$$LML = \log p(\mathbf{y}\,|\,K + \sigma_n^2I) = - \frac{1}{2}\log\det[K+\sigma_n^2I] - \frac{1}{2}\mathbf{y}^T(K + \sigma_n^2I)^{-1}\mathbf{y} - \frac{n}{2}\log 2\pi $$

To my understanding (please correct me if I am wrong), the computation of the LML is important during training, because training consists of optimizing the hyper-parameters of the kernel function.
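To fix the notation, here is a naive numpy sketch of the three formulas above (the kernel choice, the function names and the noise level $\sigma_n$ are made-up placeholders, not anything from the references I cite below):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel; a hypothetical choice, any PSD kernel works."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict_and_lml(X, y, x_star, sigma_n=0.1):
    """Direct (non-Cholesky) evaluation of the prediction and LML formulas."""
    n = len(y)
    K = rbf_kernel(X, X)                         # n x n training covariance
    Ky = K + sigma_n**2 * np.eye(n)              # K + sigma_n^2 I
    k_star = rbf_kernel(X, x_star)               # covariances k(x_i, x_*)
    Ky_inv_y = np.linalg.solve(Ky, y)            # (K + sigma_n^2 I)^{-1} y

    mean = k_star.T @ Ky_inv_y                                   # \bar f_*
    var = (rbf_kernel(x_star, x_star)
           - k_star.T @ np.linalg.solve(Ky, k_star))             # V[f_*]
    lml = (-0.5 * np.linalg.slogdet(Ky)[1]
           - 0.5 * y @ Ky_inv_y
           - 0.5 * n * np.log(2 * np.pi))                        # log marginal likelihood
    return mean, var, lml
```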

A Gaussian random field (GRF) is a random field in which the variables have Gaussian probability density functions. Specifically, a random field is a collection $X(s, \omega)$, where $s \in D$ ranges over a set of locations (usually $D \subseteq \mathbb{R}^d$) and $\omega$ is an element of a sample space (which is usually suppressed in the notation). In the Gaussian case, the field is defined by a mean function $\mu(s)=\mathbb{E}[X(s)]$ and a covariance function $C(s,t)=\mathrm{Cov}(X(s), X(t))$, and every finite collection of points is jointly Gaussian: $$ \mathbf{x} \equiv (X(s_1), \cdots, X(s_p))^T \sim N(\mu, \Sigma),$$ where $\Sigma_{ij} = C(s_i, s_j)$.

A Markov random field (MRF) is a synonym for an undirected graphical model: it is a random field with the Markov property. Interpreting each random variable $X(s)$ as a node in an undirected graph $G=(V,E)$, we say the field satisfies the Markov property if it has the:

  • pairwise Markov property (any two non-adjacent variables are conditionally independent given all the other variables)
  • local Markov property (a variable is conditionally independent of all other variables given its neighbours)
  • global Markov property (any two subsets of variables are conditionally independent given a separating subset)

A Gaussian Markov Random Field (GMRF) is a GRF with the Markov property (or, alternatively, a Markov random field with Gaussian random variables). By Definition 2.1 of the monograph associated with the presentation I linked, a GMRF is defined as follows: a random vector $\mathbf{x}=(x_1, \cdots, x_n)^T \in \mathbb{R}^n$ is a GMRF w.r.t. a labelled graph $G=(V,E)$ with mean $\mu$ and precision matrix $Q > 0$, iff its density has the form $$ \pi(\mathbf{x}) = (2\pi)^{-n/2}|Q|^{1/2}\exp\left(-\tfrac{1}{2}(\mathbf{x}-\mu)^TQ(\mathbf{x}-\mu)\right)$$ and $Q_{ij}\neq 0 \Leftrightarrow \{i,j\} \in E$ for all $i \neq j$.
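To make Definition 2.1 concrete, here is a small numpy sketch of a GMRF on a chain graph $1 - 2 - \cdots - n$ (the values of `kappa` and `tau` are made up; the only point is that the zero pattern of $Q$ mirrors the edges of the graph):

```python
import numpy as np

# Hypothetical GMRF on a chain graph: only neighbouring nodes share an edge,
# so Q is tridiagonal and Q_ij != 0 exactly for {i, j} in E (Definition 2.1).
n = 5
kappa, tau = 1.0, 0.5                  # made-up precision parameters
Q = np.zeros((n, n))
for i in range(n):
    Q[i, i] = kappa + (2 if 0 < i < n - 1 else 1) * tau
    if i + 1 < n:
        Q[i, i + 1] = Q[i + 1, i] = -tau

mu = np.zeros(n)

def gmrf_log_density(x, mu, Q):
    """log pi(x) for the Gaussian density parameterised by the precision Q."""
    n = len(x)
    sign, logdetQ = np.linalg.slogdet(Q)
    return (-0.5 * n * np.log(2 * np.pi) + 0.5 * logdetQ
            - 0.5 * (x - mu) @ Q @ (x - mu))

print(np.round(np.linalg.inv(Q), 3))   # the covariance is dense even though Q is sparse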

The algorithm for using a GP consists roughly of computing the Cholesky decomposition $(K + \sigma_n^2I)=LL^T$. With that, it is simple to compute $\alpha = (K+\sigma_n^2I)^{-1}\mathbf{y}$. Finally, we can compute:

$$\overline{f}_* = \mathbf{k}_*^T\alpha$$ $$\mathbb{V}[f_*]= k(x_*, x_*) - (L\setminus \mathbf{k}_*)^T(L\setminus\mathbf{k}_*).$$

For the fitting, we basically use a gradient-based algorithm to find the best hyper-parameterization of the kernel (i.e. of the covariance matrix). For this, we need to repeatedly evaluate the LML until we converge to the best values of the hyperparameters.
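Here is a rough sketch of how I picture this Cholesky-based procedure in numpy/scipy, using a squared-exponential kernel with a lengthscale and a variance as hyperparameters (all names and default values are my own choices; I use a gradient-free optimizer for brevity, whereas real libraries use the analytic gradient of the LML):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular, cho_solve
from scipy.optimize import minimize

def rbf_kernel(A, B, lengthscale, variance):
    # same hypothetical kernel as in the earlier sketch, now with explicit hyperparameters
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def neg_lml(log_theta, X, y, sigma_n):
    """Negative log marginal likelihood as a function of the log hyperparameters."""
    lengthscale, variance = np.exp(log_theta)
    n = len(y)
    Ky = rbf_kernel(X, X, lengthscale, variance) + sigma_n**2 * np.eye(n)
    L = cholesky(Ky, lower=True)                       # K + sigma_n^2 I = L L^T
    alpha = cho_solve((L, True), y)                    # (K + sigma_n^2 I)^{-1} y
    return (np.sum(np.log(np.diag(L)))                 # 1/2 log det via the Cholesky factor
            + 0.5 * y @ alpha
            + 0.5 * n * np.log(2 * np.pi))

def fit_and_predict(X, y, x_star, sigma_n=0.1):
    # noise level treated as fixed for simplicity; gradient-free optimisation for brevity
    res = minimize(neg_lml, x0=np.zeros(2), args=(X, y, sigma_n), method="Nelder-Mead")
    lengthscale, variance = np.exp(res.x)

    n = len(y)
    Ky = rbf_kernel(X, X, lengthscale, variance) + sigma_n**2 * np.eye(n)
    L = cholesky(Ky, lower=True)
    alpha = cho_solve((L, True), y)
    k_star = rbf_kernel(X, x_star, lengthscale, variance)
    v = solve_triangular(L, k_star, lower=True)        # L \ k_*
    mean = k_star.T @ alpha                            # \bar f_* = k_*^T alpha
    var = rbf_kernel(x_star, x_star, lengthscale, variance) - v.T @ v
    return mean, var
```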

In general, I think a GMRF can be seen as a special case of a GP. In a GP, we usually specify the covariance matrix $K$ directly (perhaps after applying a feature map $\phi$ to the data, so that $K=\phi(X)^T\phi(X)$), while in a GMRF we specify the precision matrix $Q = \Sigma^{-1}$, whose sparsity pattern encodes the conditional-independence (Markov) structure of the graph.
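As a sanity check of this statement, here is a small numerical sketch (all values are made up): in one dimension the exponential (Ornstein–Uhlenbeck) kernel gives a Markov process, so on an ordered grid the inverse of its covariance matrix is tridiagonal (up to floating-point noise), i.e. the corresponding finite-dimensional Gaussian is a GMRF on a chain.

```python
import numpy as np

# Hypothetical illustration of the GP <-> GMRF link: the exponential kernel
# k(s, t) = exp(-|s - t| / l) defines a Markov process in 1-D, so on an
# ordered grid its precision matrix K^{-1} is tridiagonal.
s = np.linspace(0.0, 1.0, 6)                      # ordered 1-D locations
l = 0.3                                           # made-up lengthscale
K = np.exp(-np.abs(s[:, None] - s[None, :]) / l)  # dense covariance matrix
Q = np.linalg.inv(K)                              # precision matrix

print(np.round(Q, 3))   # entries beyond the first off-diagonal are ~0
```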

The questions are the following:

  • If I have an algorithm to fit a GP (imagine a black box that computes the LML, $\overline{f}_*$ and the variance), can I use it to fit a GMRF?

  • What other algorithms are relevant for GMRFs or GPs? I think (but I am not sure) that practitioners of GPs/GMRFs usually also face the following problems, which are not solved by the algorithm mentioned above:

    • Sampling from the Gibbs distribution of the process (a direct, Cholesky-based sampler is sketched after this list).

    • Reconstructing the precision matrix $Q = K^{-1}$, which encodes the structure of the underlying undirected graph (Markov network), given access to the whole covariance matrix $K$ of the data.

    • Algorithms for doing classification with GPs.
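Regarding the sampling bullet above: my understanding is that one standard way to draw exact samples from a GMRF (rather than running a Gibbs sampler) is via the Cholesky factor of the precision matrix. A minimal sketch, assuming a dense $Q$ (the function name `sample_gmrf` is mine; for a large sparse $Q$ one would use a sparse Cholesky instead):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_gmrf(mu, Q, n_samples=1, rng=None):
    """Draw exact samples from N(mu, Q^{-1}) using the Cholesky factor of Q.

    With Q = L L^T, solving L^T x = z for z ~ N(0, I) gives
    Cov(x) = (L^T)^{-1} L^{-1} = Q^{-1}, as required.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(mu)
    L = cholesky(Q, lower=True)                  # Q = L L^T
    z = rng.standard_normal((n, n_samples))
    x = solve_triangular(L.T, z, lower=False)    # solve L^T x = z
    return mu[:, None] + x
```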

Searching over the internet, I have found these resources:

  • This presentation, titled "Gaussian Processes and Gaussian Markov Random Fields", where unfortunately I would need the audio of the presentation to properly understand the link between the two topics.
  • This paper where in section two they discuss "From Gaussian Process to GMRF".
  • In the Bible for Gaussian Processes, there is a reference to GMRF in the last equation of the book, in the appendix.
  • In the Bible for GMRF, there are a few references to Gaussian Processes.
  • On Wikipedia it is possible to read that "A one-dimensional GRF is also called a Gaussian process".
  • This presentation, where it is shown how a GMRF can be interpreted as a Markov network (i.e. an undirected graphical model), in which each component of a $d$-dimensional Gaussian becomes a node in the graph.