
We have a multivariate normal vector $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$,

$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$, and $\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2 & \dots & \sigma_1\sigma_n \\ \sigma_2\sigma_1 & \sigma_2^2 & \dots & \sigma_2\sigma_n \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_n\sigma_1 & \sigma_n\sigma_2 & \dots & \sigma_n^2 \end{bmatrix}$ is the covariance matrix.

$Y$ is a random scalar of the form $Y = \beta_0 + \boldsymbol{\beta}^T\mathbf{X} + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$.

How can we go about deriving the joint probability density function (pdf) of the random vector $(Y, \boldsymbol{X}^T)$?

The marginal multivariate pdf of $\boldsymbol{X}$ is given by

$pdf(\boldsymbol{x}) = \frac{1}{\sqrt{(2\pi)^n \,\lvert \boldsymbol{\Sigma} \rvert}} \exp\left[ -\frac{1}{2}\boldsymbol{(x-\mu)}^T \boldsymbol{\Sigma}^{-1}\boldsymbol{(x-\mu)} \right]$

The conditional probability density function of $Y \,\mid \, \boldsymbol{X=x}$ is given by

$pdf(y\mid \boldsymbol{X=x}) = \frac{1}{2\pi\sigma_\epsilon} \exp\left[ -\frac{1}{2\sigma_\epsilon^2}(y - \beta_0 - \boldsymbol{\beta}^T\boldsymbol{x})^2 \right]$

To compute the joint probability density function, we can use the fact that

$pdf(y, \boldsymbol{x}) = pdf(y\mid \boldsymbol{X=x}) \cdot pdf(\boldsymbol{x}).$

I am unable to do the algebraic manipulation required to arrive at the result that the joint distribution has the probability density function of a normal distribution. All the questions previously asked seem to address the reverse problem of deriving the conditional density function given the joint probability density function, for example, "Deriving the conditional distributions of a multivariate normal distribution".

Edit:

Multiplying the pdfs, we get $$pdf(y, \boldsymbol{x}) = \frac{1}{\sqrt{(2\pi)^n \,\lvert\boldsymbol{\Sigma}\rvert}} \cdot \frac{1}{2\pi\sigma_\epsilon} \exp\left[ -\frac{1}{2}\boldsymbol{(x-\mu)}^T \boldsymbol{\Sigma}^{-1}\boldsymbol{(x-\mu)} - \frac{1}{2\sigma_\epsilon^2}(y - \beta_0 - \boldsymbol{\beta}^T\boldsymbol{x})^2 \right]$$

Like the comment below mentions, most manipulations dealing with normal pdfs involve completing squares; but since this algebraic expression contains $\boldsymbol{x}$ in both terms inside the exponential, I'm unsure how to go about completing the square.
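
To make the difficulty concrete, here is a small SymPy sketch of the scalar case $n=1$ that I tried (the symbol names and the $n=1$ simplification are mine, just for illustration). Expanding the exponent in centered variables does give a quadratic form in $(x-\mu,\, y-\mu_y)$ with $\mu_y=\beta_0+\beta\mu$, which is presumably the shape the completed square should take in general.

```python
import sympy as sp

# Scalar (n = 1) version of the exponent; all names are illustrative only.
x, y, mu, beta0, beta = sp.symbols("x y mu beta0 beta", real=True)
sigma, sigma_eps = sp.symbols("sigma sigma_eps", positive=True)

# A = -2 * (log of the exponential factor): marginal term + conditional term.
A = (x - mu) ** 2 / sigma ** 2 + (y - beta0 - beta * x) ** 2 / sigma_eps ** 2

# Rewrite in centered variables u = x - mu, v = y - mu_y with mu_y = beta0 + beta*mu.
u, v = sp.symbols("u v", real=True)
mu_y = beta0 + beta * mu
A_uv = sp.expand(A.subs({x: u + mu, y: v + mu_y}))

print(A_uv)
# -> u**2/sigma**2 + beta**2*u**2/sigma_eps**2 - 2*beta*u*v/sigma_eps**2 + v**2/sigma_eps**2
#    (up to term order): a quadratic form in (u, v) whose matrix should be the
#    inverse covariance matrix of (X, Y).
```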

Isaac

2 Answers


If your question is read literally, with the covariance of $X_i$ and $X_j$ equaling the product of the standard deviations of $X_i$ and $X_j$ for all $i$ and $j$, then the $X_k$'s are perfectly correlated random variables of the form $X_k = \sigma_k Z + \mu_k$, where $Z$ is a standard normal random variable. Thus $\mathbf X$ does not have a multivariate normal density: all the probability mass lies on a straight line instead of being spread out over all of $n$-dimensional space as described by your formula

$$pdf(\boldsymbol{x}) = \frac{1}{\sqrt{(2\pi)^n \,\lvert\boldsymbol{\Sigma}\rvert}} \exp\left[ -\frac{1}{2}\boldsymbol{(x-\mu)}^T \boldsymbol{\Sigma}^{-1}\boldsymbol{(x-\mu)} \right],$$

which assumes that $\Sigma$ is an invertible matrix. The literal reading of your $\Sigma$ gives that $$\Sigma = \left[ \begin{matrix} \sigma_1 \\ \sigma_2 \\ \vdots \\ \sigma_n \end{matrix} \right][\sigma_1, \sigma_2\ldots, \sigma_n]$$ which is not invertible.


"But, but, but," you protest vehemently, "I meant to write $\sigma_{i,j}$ for the covariance, not $\sigma_i\sigma_j$!!" Well, in that case, the calculation is a lot simpler. Assuming that $\varepsilon\sim N(0,\sigma^2)$ is independent of the $X_k$'s (something that you have not stated explicitly), then $\left[\begin{matrix}\varepsilon\\ \mathbf X\end{matrix}\right]$ is a column vector of $n+1$ jointly normal random variables with mean vector $\left[\begin{matrix}0\\ \boldsymbol{\mu}\end{matrix}\right]$ and covariance matrix $\Sigma_1 = \left[\begin{matrix}\sigma^2 & \mathbf{0}^T\\ \mathbf 0 & \Sigma\end{matrix}\right]$ where $\mathbf 0$ is a column vector of $n$ zeroes. Then, $$\left[\begin{matrix}Y^\prime\\ \mathbf X\end{matrix}\right] = \left[\begin{matrix}\varepsilon + \boldsymbol{\beta}^T \mathbf X\\ \mathbf X\end{matrix}\right] = \left[\begin{matrix}1 & \boldsymbol{\beta}^T \\ \mathbf 0 & \mathbf I\end{matrix}\right]\left[\begin{matrix}\varepsilon\\ \mathbf X\end{matrix}\right]$$ is the result of a linear transformation of jointly normal random variables and hence has a jointly normal distribution with mean vector $\left[\begin{matrix}\boldsymbol{\beta}^T\boldsymbol{\mu}\\ \boldsymbol{\mu}\end{matrix}\right]$ and covariance matrix $$\Sigma_2 = \left[\begin{matrix}1 & \boldsymbol{\beta}^T \\ \mathbf 0 & \mathbf I\end{matrix}\right]\left[\begin{matrix}\sigma^2 & \mathbf{0}^T\\ \mathbf 0 & \Sigma\end{matrix}\right]\left[\begin{matrix}1 & \boldsymbol{0}^T \\ \boldsymbol{\beta} & \mathbf I\end{matrix}\right] = \left[\begin{matrix}\sigma^2 + \boldsymbol{\beta}^T\Sigma\boldsymbol{\beta} & \boldsymbol{\beta}^T\Sigma\\ \Sigma\boldsymbol{\beta} & \Sigma\end{matrix}\right].$$ Note that there is no requirement that $\Sigma^2$ be invertible and that these $n+1$ jointly normal random variables have a $(n+1)$-variate joint normal density. I will leave it to the OP to add in that $\beta_0$ if he likes. The covariance matrix is that exhibited above, and the mean vector is slightly changed....

I hate column vectors...

Dilip Sarwate

Let's focus on the inside of the exponential term, $\exp\left(-\frac{A}{2}\right)$, because that's what we need to prove joint normality (and note that $\mu_y=\beta_0+\beta^T\mu$):

$$\begin{align}A & = (x-\mu)^T\Sigma^{-1}(x-\mu)+\frac{1}{\sigma_\epsilon^2}\left((y-\mu_y)-\beta^T(x-\mu)\right)^2 \\ & = \frac{1}{\sigma_\epsilon^2}(y-\mu_y)^2 +(x-\mu)^T\left(\Sigma^{-1}+\frac{1}{\sigma_\epsilon^2}\beta\beta^T\right)(x-\mu) -\frac{2}{\sigma_\epsilon^2}(y-\mu_y)\beta^T(x-\mu) \\ & =\begin{bmatrix}(x-\mu)^T &y-\mu_y\end{bmatrix}\begin{bmatrix}\Sigma^{-1}+\beta\beta^T/\sigma_\epsilon^2 & - \beta/\sigma_\epsilon^2 \\ -\beta^T/\sigma_\epsilon^2 & 1/\sigma_\epsilon^2 \end{bmatrix}\begin{bmatrix}x-\mu \\ y-\mu_y\end{bmatrix}\end{align}$$ And we're done: the inner matrix is the inverse of the covariance matrix of $(\boldsymbol{x}, y)$.

Using block-matrix inversion, $\Sigma_{xy}$ turns out to be $\begin{bmatrix}\Sigma & \Sigma\beta \\ \beta^T\Sigma & \sigma_\epsilon^2+\beta^T\Sigma\beta\end{bmatrix}$, which seems plausible.
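
As a numerical sanity check of that claim (my addition; the particular $\Sigma$, $\boldsymbol{\beta}$ and $\sigma_\epsilon^2$ below are arbitrary), inverting the block matrix from the quadratic form above should reproduce $\Sigma_{xy}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary made-up parameters (illustration only).
n = 3
beta = rng.normal(size=n)
sigma_eps2 = 0.7
B = rng.normal(size=(n, n))
Sigma = B @ B.T + np.eye(n)
Sigma_inv = np.linalg.inv(Sigma)

# Matrix of the quadratic form inside exp(-A/2), blocks ordered (x, y).
precision = np.block([
    [Sigma_inv + np.outer(beta, beta) / sigma_eps2, -beta[:, None] / sigma_eps2],
    [-beta[None, :] / sigma_eps2,                   np.array([[1 / sigma_eps2]])],
])

# Claimed covariance matrix Sigma_xy of (X, Y).
Sigma_xy = np.block([
    [Sigma,                   (Sigma @ beta)[:, None]],
    [(beta @ Sigma)[None, :], np.array([[sigma_eps2 + beta @ Sigma @ beta]])],
])

print(np.allclose(np.linalg.inv(precision), Sigma_xy))  # True
```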

P.S. The $2\pi$ in the denominator of $p(y \mid \boldsymbol{x})$ should be $\sqrt{2\pi}$, and the covariance matrix of $\boldsymbol{x}$ should contain off-diagonal entries $\sigma_{ij}$ instead of $\sigma_i\sigma_j$, I guess; otherwise the elements of $\boldsymbol{x}$ will have a linear relationship and the covariance matrix will be singular.

gunes