
The Kullback-Leibler divergence is a measure for comparing two probability density functions, but what measure can be used to compare two GPs $X$ and $Y$?

Zen
pushkar

2 Answers


Note that the distribution of a Gaussian process $\mathcal{X}\to\mathbb{R}$ is the extension of the multivariate Gaussian distribution to a possibly infinite index set $\mathcal{X}$. Thus, you can use the KL divergence between the GP probability distributions, integrating over $\mathbb{R}^\mathcal{X}$:

$$D_{KL}(P\,\|\,Q)=\int_{\mathbb{R}^\mathcal{X}} \log \frac{dP}{dQ}\, dP\,.$$

You can approximate this quantity numerically with Monte Carlo methods over a discretized $\mathcal{X}$, by repeatedly sampling processes according to their GP distributions. I don't know whether the convergence is fast enough in practice, though.
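A minimal sketch of such a Monte Carlo estimator, assuming a discretized index set so that each GP reduces to a multivariate normal (the grid and squared-exponential kernels below are illustrative assumptions, not from the question):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mc_kl(mu1, K1, mu2, K2, n_samples=10_000, seed=0):
    """Monte Carlo estimate of D_KL(P || Q) for P = N(mu1, K1), Q = N(mu2, K2),
    i.e. the two GP distributions restricted to a finite grid."""
    p = multivariate_normal(mu1, K1, allow_singular=True)
    q = multivariate_normal(mu2, K2, allow_singular=True)
    f = p.rvs(size=n_samples, random_state=seed)   # draws from P on the grid
    return np.mean(p.logpdf(f) - q.logpdf(f))      # estimates E_P[log dP/dQ]

# Illustrative grid and squared-exponential kernels with different length scales:
t = np.linspace(0, 1, 50)
def se_kernel(ell):
    # jitter on the diagonal keeps the covariance numerically well-conditioned
    return np.exp(-0.5 * (t[:, None] - t[None, :])**2 / ell**2) + 1e-8 * np.eye(t.size)

print(mc_kl(np.zeros(t.size), se_kernel(0.2), np.zeros(t.size), se_kernel(0.5)))
```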

Note that if $\mathcal{X}$ is finite with $|\mathcal{X}|=n$, then you fall back to the usual KL divergence between multivariate normal distributions: $$D_{KL}\big(\mathcal{GP}(\mu_1,K_1)\,\|\,\mathcal{GP}(\mu_2,K_2)\big) = \frac 1 2 \Big(\operatorname{tr}(K_2^{-1}K_1) + (\mu_2\!-\!\mu_1)^\top K_2^{-1}(\mu_2\!-\!\mu_1)-n+\log\frac{|K_2|}{|K_1|}\Big)$$
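For the finite case, the closed-form expression above translates directly into code; here is a sketch in NumPy (using Cholesky factors for the log-determinants is a numerical-stability choice, not part of the formula):

```python
import numpy as np

def kl_mvn(mu1, K1, mu2, K2):
    """Closed-form D_KL(N(mu1, K1) || N(mu2, K2)) in n dimensions."""
    n = mu1.size
    diff = mu2 - mu1
    trace_term = np.trace(np.linalg.solve(K2, K1))   # tr(K2^{-1} K1)
    maha_term = diff @ np.linalg.solve(K2, diff)     # (mu2-mu1)^T K2^{-1} (mu2-mu1)
    logdet = lambda K: 2 * np.log(np.diag(np.linalg.cholesky(K))).sum()
    return 0.5 * (trace_term + maha_term - n + logdet(K2) - logdet(K1))
```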

Emile
  • How can I calculate the two means ($\mu_1$ and $\mu_2$) you mentioned? Or should I take them equal to zero, as usual for a Gaussian process? – Marat Zakirov Mar 06 '19 at 11:08

Remember that if $X:T\times \Omega\to\mathbb{R}$ is a Gaussian Process with mean function $m$ and covariance function $K$, then, for every $t_1,\dots,t_k\in T$, the random vector $(X(t_1),\dots,X(t_k))$ has a multivariate normal distribution with mean vector $(m(t_1),\dots,m(t_k))$ and covariance matrix $\Sigma=(\sigma_{ij})=(K(t_i,t_j))$, where we have used the common abbreviation $X(t)=X(t,\,\cdot\,)$.

Each realization $X(\,\cdot\,,\omega)$ is a real function whose domain is the index set $T$. Suppose that $T=[0,1]$. Given two Gaussian processes $X$ and $Y$, one common distance between two realizations $X(\,\cdot\,,\omega)$ and $Y(\,\cdot\,,\omega)$ is $\sup_{t\in[0,1]} |X(t,\omega) - Y(t,\omega)|$. Hence, it seems natural to define the distance between the two processes $X$ and $Y$ as $$ d(X,Y) = \mathbb{E}\!\left[\sup_{t\in[0,1]} \left| X(t) - Y(t)\right|\right] \, . \qquad (*) $$

I don't know if there is an analytical expression for this distance, but I believe you can compute a Monte Carlo approximation as follows. Fix some fine grid $0\leq t_1<\dots<t_k\leq 1$, and draw samples $(x_{i1},\dots,x_{ik})$ and $(y_{i1},\dots,y_{ik})$ from the normal random vectors $(X(t_1),\dots,X(t_k))$ and $(Y(t_1),\dots,Y(t_k))$, respectively, for $i=1,\dots,N$. Approximate $d(X,Y)$ by $$ \frac{1}{N} \sum_{i=1}^N \max_{1\leq j\leq k} |x_{ij}-y_{ij}| \, . $$
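A minimal sketch of this Monte Carlo scheme, assuming the two processes are sampled independently on the grid (how $X$ and $Y$ are coupled matters in general; see the comments below):

```python
import numpy as np

def mc_sup_distance(m1, K1, m2, K2, n_samples=10_000, seed=0):
    """Approximate E[sup_t |X(t) - Y(t)|] on a fixed grid by
    (1/N) * sum_i max_j |x_ij - y_ij|, drawing X and Y independently."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(m1, K1, size=n_samples)  # rows: realizations of X
    y = rng.multivariate_normal(m2, K2, size=n_samples)  # rows: realizations of Y
    return np.mean(np.abs(x - y).max(axis=1))
```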

Zen
  • How do you sample from each vector? If you only sample the means in each of the GPs you do not take into account the variances. Otherwise you will have to devise a sampling technique that is consistent. – pushkar May 29 '13 at 04:38
  • This is an excellent resource: http://www.gaussianprocess.org/gpml/chapters/ – Zen May 29 '13 at 13:37
  • You may also read all the answers to this question: http://stats.stackexchange.com/questions/30652/how-to-simulate-functional-data/30722#30722 – Zen May 29 '13 at 13:44
  • Pay attention that this is not a distance, since $d(X,X) \neq 0$. As the KL compares two distributions and not two realisations, Zen's distance between two GPs should be defined as $d(G_1,G_2)=\mathbb{E}_{X\sim G_1, Y\sim G_2}[\sup_t |X(t)-Y(t)|]$, and we have $\mathbb{E}_{X\sim G, Y\sim G} \sup_t |X(t)-Y(t)| > 0$ for a non-degenerate Gaussian process $G$. – Emile Oct 19 '15 at 17:33
  • @Emile: how is it that $d(X,X)\neq 0$ using definition $(*)$? – Zen Oct 21 '15 at 19:16
  • @Zen, because what the OP wants to compare is the pdf, like the KL does, and not the realisations (if I understood correctly). $(*)$ is defined for the realisations and not the pdf. If you modify $(*)$ to work on the pdf of the GP, you get what I said in the previous comment. – Emile Oct 21 '15 at 19:44
  • Hi, Emile. The OP wants to compare two **stochastic processes**. The $d$ defined in $(*)$ is such a metric. We are not comparing realizations; please notice the expectation in the definition of $d$. Hence, $d$ fully takes the distributions of the processes into account. – Zen Oct 22 '15 at 01:38