
I understand the arithmetic derivation of the PDF of a conditional distribution of a multivariate Gaussian, as explained here, for example. Does anyone know of a more conceptual (perhaps co-ordinate-free) proof of the same result, perhaps one that uses characterising properties of the Gaussian?

benjaminwilson

2 Answers


A multivariate Gaussian (or Normal) random variable $X=(X_1,X_2,\ldots,X_n)$ can be defined as an affine transformation of a tuple of independent standard Normal variates $Z=(Z_1,Z_2,\ldots, Z_m)$. This easily implies the desired result, because when we condition $X$, we impose linear constraints among the $Z_j$. (If this is not obvious, please read on through the details.) This merely reduces the number of "free" $Z_j$ contributing to the variation among the $X_i$--but those $X_i$ nevertheless remain affine combinations of independent standard Normals, QED.

We can obtain this result in three steps of increasing generality. First, the distribution of $X$ conditional on its first component is Normal. Second, this implies the distribution of $X$ conditional on some linear constraint $C^\prime X = d$ is Normal. Finally, that implies the distribution of $X$ conditional on any finite set of $r$ such linear constraints is Normal.


Details

By definition,

$$X = \mathbb{A} Z + B$$

for some $n\times m$ matrix $\mathbb{A} = (a_{ij})$ and $n$-vector $B = (b_1, b_2, \ldots, b_n)$. Because one affine transformation followed by another is still affine, any affine transformation of $X$ is therefore also Normal. This fact will be used repeatedly.
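
As a concreteness check on this definition, here is a small numerical sketch (the matrix $\mathbb{A}$, the vector $B$, and the dimensions are made up for illustration): it builds $X = \mathbb{A}Z + B$ from independent standard normals and confirms that, up to sampling error, the mean of $X$ is $B$ and its covariance is $\mathbb{A}\mathbb{A}^\prime$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example with n = 3 components built from m = 2 independent
# standard normal variates via X = A Z + B.
A = np.array([[ 1.0, 0.5],
              [ 0.2, 1.5],
              [-1.0, 0.3]])          # n x m coefficient matrix
B = np.array([2.0, -1.0, 0.5])       # n-vector of offsets (the mean of X)

Z = rng.standard_normal((2, 200_000))   # each column is one draw of Z
X = A @ Z + B[:, None]                  # each column is one draw of X

# The mean of X should be B and its covariance A A', up to Monte Carlo error.
print(np.allclose(X.mean(axis=1), B, atol=0.02))   # should print True
print(np.allclose(np.cov(X), A @ A.T, atol=0.05))  # should print True
```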

Fix a number $x_1$ in order to consider the distribution of $X$ conditional on $X_1=x_1$. Replacing $X_1$ by its definition produces

$$x_1 = X_1 = b_1 + a_{11}Z_1 + a_{12}Z_2 + \cdots + a_{1m}Z_m.$$

When all the $a_{1j}=0$, the two cases $x_1=b_1$ and $x_1\ne b_1$ are easy to dispose of, so let's move on to the alternative where, for at least one index $k$, $a_{1k}\ne 0$. Solving for $Z_k$ exhibits it as an affine combination of the remaining $Z_j,\, j\ne k$:

$$Z_k = \frac{1}{a_{1k}}\left(x_1 - b_1 - (a_{11}Z_1 + \cdots + a_{1,k-1}Z_{k-1} + a_{1,k+1}Z_{k+1} + \cdots + a_{1m}Z_m)\right).$$

Plugging this into $\mathbb{A}Z + B$ produces an affine combination of the remaining $Z_j$, explicitly exhibiting the conditional distribution of $X$ as an affine combination of $m-1$ independent standard normal variates, whence the conditional distribution is Normal.
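
Since this substitution is pure algebra, it can be checked mechanically. The sketch below (reusing the same made-up $\mathbb{A}$ and $B$) computes the coefficients and offset of the remaining $Z_j$ after solving for $Z_k$, and verifies that whenever $X_1 = x_1$ the other components of $X$ equal exactly that affine combination of the free $Z_j$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical A, B as before; impose X_1 = x_1 by solving the first
# equation for Z_k (here k = 0, since a_{11} != 0) and substituting.
A = np.array([[ 1.0, 0.5],
              [ 0.2, 1.5],
              [-1.0, 0.3]])
B = np.array([2.0, -1.0, 0.5])
x1 = 3.0

k = 0
rest = [j for j in range(A.shape[1]) if j != k]

# Coefficients of the remaining Z_j after substituting
#   Z_k = (x1 - b_1 - sum_{j != k} a_{1j} Z_j) / a_{1k}
# into X_i = b_i + sum_j a_{ij} Z_j, for i = 2, ..., n:
A_sub = A[1:, rest] - np.outer(A[1:, k] / A[0, k], A[0, rest])
B_sub = B[1:] + A[1:, k] * (x1 - B[0]) / A[0, k]

# Verify the identity: pick the free Z_j at random, recover Z_k from the
# constraint, and check that X_2, ..., X_n match the affine combination.
Z = np.empty(A.shape[1])
Z[rest] = rng.standard_normal(len(rest))
Z[k] = (x1 - B[0] - A[0, rest] @ Z[rest]) / A[0, k]
X = A @ Z + B
print(np.isclose(X[0], x1), np.allclose(X[1:], A_sub @ Z[rest] + B_sub))  # True True
```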

Now consider any vector $C=(c_1, c_2, \ldots, c_n)$ and another constant $d$. To obtain the conditional distribution of $X$ given $C^\prime X = d$, construct the $n+1$-vector

$$Y = (Y_1,Y_2,\ldots, Y_{n+1})=(C^\prime X, X_1, X_2, \ldots, X_n).$$

It is an affine combination of the same $Z_j$: the matrix $\mathbb{A}$ is row-augmented (at the top) by $C^\prime \mathbb{A}$, producing an $(n+1)\times m$ matrix, and the vector $B$ is augmented at the beginning by the constant $C^\prime B$. Therefore, by definition, $Y$ is multivariate Normal. Applying the preceding result to $Y$ and $d$ immediately shows that $Y$, conditional on $Y_1 = d$, is multivariate Normal. Upon ignoring the first component of $Y$ (which is an affine transformation!), that is precisely the distribution of $X$ conditional on $C^\prime X = d$.
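
A small sketch of the augmentation step, with a made-up constraint vector $C$ and the same illustrative $\mathbb{A}$ and $B$: it checks that $Y = (C^\prime X, X_1, \ldots, X_n)$ really is the affine map of $Z$ given by the row-augmented matrix and offset vector.

```python
import numpy as np

rng = np.random.default_rng(2)

A = np.array([[ 1.0, 0.5],
              [ 0.2, 1.5],
              [-1.0, 0.3]])
B = np.array([2.0, -1.0, 0.5])
C = np.array([1.0, -2.0, 0.5])        # hypothetical constraint vector

# Row-augment A by C'A at the top and prepend C'B to B.
A_aug = np.vstack([C @ A, A])         # (n+1) x m matrix
B_aug = np.concatenate([[C @ B], B])  # (n+1)-vector

# Check: for any Z, Y = (C'X, X) equals A_aug Z + B_aug.
Z = rng.standard_normal(A.shape[1])
X = A @ Z + B
Y = np.concatenate([[C @ X], X])
print(np.allclose(Y, A_aug @ Z + B_aug))  # should print True
```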

The distribution of $X$ conditional on $\mathbb{C}X = D$ for an $r\times n$ matrix $\mathbb{C}$ and an $r$-vector $D$ is obtained inductively by applying the preceding construction one term at a time (working row-by-row through $\mathbb{C}$ and component-by-component through $D$). The conditionals are Normal at every step, whence the final conditional distribution is Normal, too.

whuber

The device used in the answer that you cite will also get you the conditional distribution. Here is a self-contained derivation with a slight change in notation.

Partition the column vector $X:=(X_1, X_2,\ldots, X_n)^T$ into subvectors $X_a$ and $X_b$:
$$ X = \left(\begin{matrix}X_a\\X_b\end{matrix}\right) $$
and correspondingly partition the mean vector $\mu$ and covariance matrix $\Sigma$ of $X$:
$$ \mu = \left(\begin{matrix}\mu_a\\ \mu_b\end{matrix}\right);\qquad \Sigma=\left(\begin{matrix}\Sigma_{a,a}&\Sigma_{a,b}\\\Sigma_{b,a}&\Sigma_{b,b}\end{matrix}\right)$$

The key is to find a matrix $C$ of constants such that
$$Z:=X_a- C X_b\tag1$$
is uncorrelated with $X_b$; since $Z$ and $X_b$ are jointly Gaussian (both are linear transformations of $X$), they are then also independent. For $Z$ and $X_b$ to be uncorrelated we demand
$$ 0= \operatorname{cov} (Z, X_b)=\operatorname{cov} (X_a - CX_b, X_b)=\Sigma_{a,b}-C\Sigma_{b,b}.\tag2 $$
Such a $C$ can always be found: if $\Sigma_{b,b}$ is invertible, then
$$ C:=\Sigma_{a,b}\Sigma_{b,b}^{-1}\tag3$$
will do; otherwise you can take $\Sigma_{b,b}^{-1}$ to be the Moore-Penrose pseudoinverse of $\Sigma_{b,b}$.
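
As a quick numerical sanity check of this decomposition (using a made-up mean vector and covariance matrix, with $X_a$ taken to be the first component), the sketch below forms $C=\Sigma_{a,b}\Sigma_{b,b}^{-1}$ and verifies empirically that $Z = X_a - CX_b$ is uncorrelated with $X_b$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical mean and covariance, partitioned with X_a = X_1 and X_b = (X_2, X_3).
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
a, b = [0], [1, 2]

C = Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)])  # C = Sigma_ab Sigma_bb^{-1}

X = rng.multivariate_normal(mu, Sigma, size=500_000).T   # columns are draws of X
Z = X[a] - C @ X[b]                                      # Z = X_a - C X_b

# cov(Z, X_b) should vanish (up to Monte Carlo error), so Z and X_b are independent.
cross_cov = np.cov(np.vstack([Z, X[b]]))[:len(a), len(a):]
print(np.allclose(cross_cov, 0, atol=0.01))  # should print True
```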

Now the conditional distribution of $X_a$ given $X_b=x_b$ is easily obtained: $$P(X_a\in A\mid X_b=x_b)=P(Z+CX_b\in A\mid X_b=x_b) \stackrel{(*)}=P(Z+Cx_b\in A),\tag4 $$ where in (*) we use the fact that $Z$ and $X_b$ are independent. But $Z+Cx_b$ clearly has a Gaussian distribution, since it's an affine transformation of the original vector $X$... and we're done!


This same device gets you the conditional mean:
$$\begin{align} E(X_a\mid X_b=x_b)&=E(Z + C X_b\mid X_b=x_b)\\ &=E(Z\mid X_b=x_b) + Cx_b\\ &\stackrel{(*)}=E(Z) + Cx_b\\ &= E(X_a)- CE(X_b) + Cx_b\\ &= \mu_a + C(x_b - \mu_b) \end{align} $$
and the conditional variance:
$$\begin{align} \operatorname{var}(X_a\mid X_b=x_b)&=\operatorname{var}(Z + C X_b\mid X_b=x_b)\\ &=\operatorname{var}(Z\mid X_b=x_b)\\ &\stackrel{(*)}=\operatorname{var}(Z)\\ &= \operatorname{cov}(Z, X_a-CX_b)\\ &=\operatorname{cov}(Z, X_a) - \underbrace{\operatorname{cov}(Z, X_b)}_0\, C^T\\ &=\operatorname{cov}(X_a-CX_b, X_a)\\ &=\Sigma_{a,a}-C\Sigma_{b,a} \end{align} $$
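
These formulas can also be sanity-checked by brute force. The sketch below (continuing the made-up $\mu$ and $\Sigma$ from the previous snippet, with a hypothetical conditioning value $x_b$) keeps only the simulated draws whose $X_b$ lands in a small window around $x_b$ and compares the empirical conditional mean and variance of $X_a$ with $\mu_a + C(x_b-\mu_b)$ and $\Sigma_{a,a}-C\Sigma_{b,a}$.

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
a, b = [0], [1, 2]
x_b = np.array([-1.5, 0.0])          # hypothetical conditioning value

C = Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)])
cond_mean = mu[a] + C @ (x_b - mu[b])                      # mu_a + C (x_b - mu_b)
cond_var = Sigma[np.ix_(a, a)] - C @ Sigma[np.ix_(b, a)]   # Sigma_aa - C Sigma_ba

# Crude Monte Carlo check: keep draws with X_b inside a small window around x_b.
X = rng.multivariate_normal(mu, Sigma, size=2_000_000)
near = np.all(np.abs(X[:, b] - x_b) < 0.05, axis=1)
Xa_near = X[near][:, a]

print(cond_mean, Xa_near.mean(axis=0))   # should be close
print(cond_var, Xa_near.var(axis=0))     # should be close
```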

grand_chat