Suppose we have two multivariate random variables $\mathbf{X}$ (of dimension $n_x$) and $\mathbf{Y}$ (of dimension $n_y$). The joint covariance matrix $C_{X,Y}$ can be written in the following block-matrix form: $$ \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \\ \end{bmatrix}, $$ where $\Sigma_{11}$ is the covariance of $\mathbf{X}$, $\Sigma_{22}$ is the covariance of $\mathbf{Y}$, and $\Sigma_{12} = \Sigma_{21}^\text{T}$ is the cross-covariance between $\mathbf{X}$ and $\mathbf{Y}$.

According to here, the conditional covariance matrix $C_{Y|X}$ can be expressed as:

$$ C_{Y|X}=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} $$

My question is: how can this equality be derived?

wdg
  • Repeat of http://stats.stackexchange.com/questions/30588/deriving-the-conditional-distributions-of-a-multivariate-normal-distribution . Also see Section 3.2 of http://www.maths.manchester.ac.uk/~mkt/MT3732%20%28MVA%29/Notes/MVA_Section3.pdf . This is also derived in several books, for example, pp. 33-34 of https://books.google.com/books?id=FtHgBwAAQBAJ&pg=PP3&dq=Tong+The+Multivariate+Normal+Distribution+springer&hl=en&sa=X&ved=0ahUKEwiNrqGtxdbJAhXJLSYKHVM9BW8Q6AEIHDAA#v=onepage&q=Tong%20The%20Multivariate%20Normal%20Distribution%20springer&f=false , which is not in the preview. – Mark L. Stone Dec 12 '15 at 14:34
  • @MarkL.Stone this is not a repeated question. The link you gave is about Normal distribution. But this is not. – Albert Chen Feb 19 '19 at 15:52
  • @Albert Chen. A normality assumption is not required in the derivations of the conditional covariance formula in the answers by Macro and Ben at https://stats.stackexchange.com/questions/30588/deriving-the-conditional-distributions-of-a-multivariate-normal-distribution . – Mark L. Stone Feb 19 '19 at 16:42
  • I know this question is rather old, but if it's still good for anything, I think you'd find Chapter 10 from Johnson and Wichern's (1992) Applied Multivariate Statistical Analysis most useful. The chapter's title is "Canonical Correlation Analysis" – Jxson99 Feb 17 '20 at 14:44
  • Does this answer your question? [Deriving the conditional distributions of a multivariate normal distribution](https://stats.stackexchange.com/questions/30588/deriving-the-conditional-distributions-of-a-multivariate-normal-distribution) – kjetil b halvorsen Feb 20 '20 at 19:56
  • Please see the link https://online.stat.psu.edu/stat505/lesson/6/6.1 which can help you to understand the derivation of the conditional covariance. – 段钰泽 Jun 29 '21 at 03:36

1 Answer

This rule holds when the random variables are jointly normally distributed, but it does not hold in general for other joint distributions. In a related answer here it is shown that the squared Mahalanobis distance can be decomposed as follows:

$$\begin{equation} \begin{aligned} D^2 (\boldsymbol{x}, \boldsymbol{y}) &= \begin{bmatrix} \boldsymbol{x} - \boldsymbol{\mu}_X \\ \boldsymbol{y} - \boldsymbol{\mu}_Y \end{bmatrix}^\text{T} \begin{bmatrix} \boldsymbol{\Sigma}_{XX} & \boldsymbol{\Sigma}_{XY} \\ \boldsymbol{\Sigma}_{YX} & \boldsymbol{\Sigma}_{YY} \end{bmatrix}^{-1} \begin{bmatrix} \boldsymbol{x} - \boldsymbol{\mu}_X \\ \boldsymbol{y} - \boldsymbol{\mu}_Y \end{bmatrix} \\[6pt] &= \underbrace{(\boldsymbol{y} - \boldsymbol{\mu}_{Y|X})^\text{T} \boldsymbol{\Sigma}_{Y|X}^{-1} (\boldsymbol{y} - \boldsymbol{\mu}_{Y|X})}_\text{Conditional Part} + \underbrace{(\boldsymbol{x} - \boldsymbol{\mu}_X)^\text{T} \boldsymbol{\Sigma}_{XX}^{-1} (\boldsymbol{x} - \boldsymbol{\mu}_X)}_\text{Marginal Part}, \\[6pt] \end{aligned} \end{equation}$$

where we use the conditional mean vector and conditional variance matrix:

$$\begin{align} \boldsymbol{\mu}_{Y|X} &\equiv \boldsymbol{\mu}_Y + \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} (\boldsymbol{x} - \boldsymbol{\mu}_X), \\[6pt] \boldsymbol{\Sigma}_{Y|X} \ &\equiv \boldsymbol{\Sigma}_{YY} - \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY}. \\[6pt] \end{align}$$
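As a rough numerical illustration of the two definitions above, the following sketch computes $\boldsymbol{\mu}_{Y|X}$ and $\boldsymbol{\Sigma}_{Y|X}$ with NumPy and then checks the Mahalanobis decomposition at one point. The helper name `conditional_moments`, the ordering of the partition ($\mathbf{X}$ first), and all numbers are assumptions made purely for this example.

```python
import numpy as np

def conditional_moments(mu, Sigma, n_x, x):
    """Return (mu_{Y|X}, Sigma_{Y|X}) for the partition (X, Y), with X first.

    Hypothetical helper for illustration; not taken from any library.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu_x, mu_y = mu[:n_x], mu[n_x:]
    S_xx, S_xy = Sigma[:n_x, :n_x], Sigma[:n_x, n_x:]
    S_yx, S_yy = Sigma[n_x:, :n_x], Sigma[n_x:, n_x:]
    # Solve linear systems instead of forming Sigma_XX^{-1} explicitly.
    mu_cond = mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x)
    S_cond = S_yy - S_yx @ np.linalg.solve(S_xx, S_xy)
    return mu_cond, S_cond

# Bivariate example with made-up numbers: Var(X) = 1, Var(Y) = 2, Cov(X, Y) = 0.5.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
x = np.array([2.0])
mu_cond, S_cond = conditional_moments(mu, Sigma, n_x=1, x=x)
# By hand: mu_cond = 0 + 0.5 * (2 - 0) / 1 = 1.0 and S_cond = 2 - 0.5**2 / 1 = 1.75

# Check the decomposition D^2 = conditional part + marginal part at an arbitrary y.
y = np.array([-0.7])
d = np.concatenate([x - mu[:1], y - mu[1:]])
joint = d @ np.linalg.solve(Sigma, d)
cond_part = (y - mu_cond) @ np.linalg.solve(S_cond, y - mu_cond)
marg_part = (x - mu[:1]) @ np.linalg.solve(Sigma[:1, :1], x - mu[:1])
decomposition_holds = np.isclose(joint, cond_part + marg_part)
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice for numerical stability; the algebra is the same as in the formulas.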

If the random vectors $\mathbf{X}$ and $\mathbf{Y}$ are jointly normally distributed, it follows that the conditional distribution of interest can be written as:

$$\begin{equation} \begin{aligned} p(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) &\overset{\boldsymbol{y}}{\propto} p(\boldsymbol{x} , \boldsymbol{y} | \boldsymbol{\mu}, \boldsymbol{\Sigma}) \\[12pt] &= \text{N}(\boldsymbol{x}, \boldsymbol{y} | \boldsymbol{\mu}, \boldsymbol{\Sigma}) \\[10pt] &\overset{\boldsymbol{y}}{\propto} \exp \Big( - \frac{1}{2} D^2 (\boldsymbol{x}, \boldsymbol{y}) \Big) \\[6pt] &\overset{\boldsymbol{y}}{\propto} \exp \Big( - \frac{1}{2} (\boldsymbol{y} - \boldsymbol{\mu}_{Y|X})^\text{T} \boldsymbol{\Sigma}_{Y|X}^{-1} (\boldsymbol{y} - \boldsymbol{\mu}_{Y|X}) \Big) \\[6pt] &\overset{\boldsymbol{y}}{\propto}\text{N}(\boldsymbol{y} | \boldsymbol{\mu}_{Y|X}, \boldsymbol{\Sigma}_{Y|X}), \\[6pt] \end{aligned} \end{equation}$$

which establishes that $\boldsymbol{\Sigma}_{Y|X}$ is the conditional covariance matrix. Note again that this result depends on the assumption that the random vectors are jointly normally distributed. It can be regarded as a "first-order" approximation to the conditional covariance in other cases.
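A simulation can serve as a sanity check of this result. The sketch below, with an arbitrary covariance matrix, seed, and sample size chosen only for illustration, draws jointly normal samples, subtracts the conditional mean $\boldsymbol{\mu}_{Y|X}$ evaluated at each sampled $\boldsymbol{x}$, and compares the sample covariance of the residuals with $\boldsymbol{\Sigma}_{Y|X}$. The residual covariance equals $\boldsymbol{\Sigma}_{YY} - \boldsymbol{\Sigma}_{YX} \boldsymbol{\Sigma}_{XX}^{-1} \boldsymbol{\Sigma}_{XY}$ whenever second moments exist; it is the normality assumption that makes the residual independent of $\mathbf{X}$, so that this matrix is also the conditional covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_y, N = 2, 2, 200_000

# Illustrative joint covariance: symmetric and strictly diagonally dominant,
# hence positive definite. The numbers carry no special meaning.
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.5, 0.6],
                  [0.1, 0.4, 0.6, 1.5]])
mu = np.array([1.0, -1.0, 0.5, 2.0])

S_xx, S_xy = Sigma[:n_x, :n_x], Sigma[:n_x, n_x:]
S_yx, S_yy = Sigma[n_x:, :n_x], Sigma[n_x:, n_x:]
mu_x, mu_y = mu[:n_x], mu[n_x:]

# Theoretical conditional covariance (Schur complement of Sigma_XX).
W = np.linalg.solve(S_xx, S_xy)          # Sigma_XX^{-1} Sigma_XY, shape (n_x, n_y)
S_cond = S_yy - S_yx @ W

# Draw jointly normal samples and split them into X and Y parts.
Z = rng.multivariate_normal(mu, Sigma, size=N)
X, Y = Z[:, :n_x], Z[:, n_x:]

# Residuals Y - mu_{Y|X}(X), computed row by row via a matrix product.
R = Y - (mu_y + (X - mu_x) @ W)

# Sample covariance of the residuals; should be close to S_cond for large N.
S_resid = np.cov(R, rowvar=False)
max_abs_error = np.max(np.abs(S_resid - S_cond))
```

With this sample size, `max_abs_error` should be small relative to the entries of `S_cond`, though the exact value depends on the seed.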

Ben
  • Thanks for your answer. However, this question is no longer my priority and I need time to review it. – wdg Aug 05 '21 at 02:58