
We have a multivariate normal vector $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}$,

$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix}$, and $\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2 & \dots & \sigma_1\sigma_n \\ \sigma_2\sigma_1 & \sigma_2^2 & \dots & \sigma_2\sigma_n \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_n\sigma_1 & \sigma_n\sigma_2 & \dots & \sigma_n^2 \end{bmatrix}$ is the covariance matrix.

$Y$ is a random scalar of the form $Y = \beta_0 + \boldsymbol{\beta}^T\mathbf{X} + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$.

How can we go about deriving the joint probability density function (pdf) of the random vector $(Y, \boldsymbol{X}^T)$?

The marginal multivariate pdf of $\boldsymbol{X}$ is given by

$pdf(\boldsymbol{x}) = \frac{1}{\sqrt{(2\pi)^n \,\lvert \boldsymbol{\Sigma} \rvert}} \exp\left[ -\frac{1}{2}\boldsymbol{(x-\mu)}^T \boldsymbol{\Sigma}^{-1}\boldsymbol{(x-\mu)} \right]$

The conditional probability density function of $Y \,\mid \, \boldsymbol{X=x}$ is given by

$pdf(y\mid \boldsymbol{X=x}) = \frac{1}{2\pi\sigma_\epsilon} \exp\left[ -\frac{1}{2\sigma_\epsilon^2}(y - \beta_0 - \boldsymbol{\beta}^T\boldsymbol{x})^2 \right]$

To compute the joint probability density function, we can use the fact that

$pdf(y, \boldsymbol{x}) = pdf(y\mid \boldsymbol{X=x}) \cdot pdf(\boldsymbol{x}).$

I am unable to do the algebraic manipulation required to arrive at the result that the joint distribution has the probability density function of a normal distribution. All the questions previously asked seem to address the reverse problem of deriving the conditional density function given the joint probability density function, for example, "Deriving the conditional distributions of a multivariate normal distribution".

Edit:

Multiplying the pdfs, we get $$pdf(y, \boldsymbol{x}) = \frac{1}{\sqrt{(2\pi)^n \,\lvert\boldsymbol{\Sigma}\rvert}} \cdot \frac{1}{2\pi\sigma_\epsilon} \exp\left[ -\frac{1}{2}\boldsymbol{(x-\mu)}^T \boldsymbol{\Sigma}^{-1}\boldsymbol{(x-\mu)} - \frac{1}{2\sigma_\epsilon^2}(y - \beta_0 - \boldsymbol{\beta}^T\boldsymbol{x})^2 \right]$$

Like the comment below mentions, most manipulations dealing with normal pdfs involve completing squares; but since this algebraic expression contains $\boldsymbol{x}$ in both terms inside the exponential, I'm unsure how to go about completing the square.
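
To make the difficulty concrete, here is a small SymPy sketch of the scalar case $n=1$ that I tried (the symbol names and the $n=1$ simplification are mine, just for illustration). Expanding the exponent in centered variables does give a quadratic form in $(x-\mu,\, y-\mu_y)$ with $\mu_y=\beta_0+\beta\mu$, which is presumably the shape the completed square should take in general.

```python
import sympy as sp

# Scalar (n = 1) version of the exponent; all names are illustrative only.
x, y, mu, beta0, beta = sp.symbols("x y mu beta0 beta", real=True)
sigma, sigma_eps = sp.symbols("sigma sigma_eps", positive=True)

# A = -2 * (log of the exponential factor): marginal term + conditional term.
A = (x - mu) ** 2 / sigma ** 2 + (y - beta0 - beta * x) ** 2 / sigma_eps ** 2

# Rewrite in centered variables u = x - mu, v = y - mu_y with mu_y = beta0 + beta*mu.
u, v = sp.symbols("u v", real=True)
mu_y = beta0 + beta * mu
A_uv = sp.expand(A.subs({x: u + mu, y: v + mu_y}))

print(A_uv)
# -> u**2/sigma**2 + beta**2*u**2/sigma_eps**2 - 2*beta*u*v/sigma_eps**2 + v**2/sigma_eps**2
#    (up to term order): a quadratic form in (u, v) whose matrix should be the
#    inverse covariance matrix of (X, Y).
```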

Isaac

2 Answers


If your question is read literally, with the covariance of $X_i$ and $X_j$ equaling the product of the standard deviations of $X_i$ and $X_j$ for all $i$ and $j$, then the $X_k$'s are perfectly correlated random variables of the form $X_k = \sigma_k Z + \mu_k$, where $Z$ is a standard normal random variable. Thus $\mathbf X$ does not have a multivariate normal density: all the probability mass lies on a straight line instead of being spread out over all of $n$-dimensional space as described by your formula

$$pdf(\boldsymbol{x}) = \frac{1}{\sqrt{(2\pi)^n \,\lvert\boldsymbol{\Sigma}\rvert}} \exp\left[ -\frac{1}{2}\boldsymbol{(x-\mu)}^T \boldsymbol{\Sigma}^{-1}\boldsymbol{(x-\mu)} \right],$$

which assumes that $\Sigma$ is an invertible matrix. The literal reading of your $\Sigma$ gives that $$\Sigma = \left[ \begin{matrix} \sigma_1 \\ \sigma_2 \\ \vdots \\ \sigma_n \end{matrix} \right][\sigma_1, \sigma_2\ldots, \sigma_n]$$ which is not invertible.


"But, but, but," you protest vehemently, "I meant to write $\sigma_{i,j}$ for the covariance, not $\sigma_i\sigma_j$!!" Well, in that case, the calculation is a lot simpler. Assuming that $\varepsilon\sim N(0,\sigma^2)$ is independent of the $X_k$'s (something that you have not stated explicitly), then $\left[\begin{matrix}\varepsilon\\ \mathbf X\end{matrix}\right]$ is a column vector of $n+1$ jointly normal random variables with mean vector $\left[\begin{matrix}0\\ \boldsymbol{\mu}\end{matrix}\right]$ and covariance matrix $\Sigma_1 = \left[\begin{matrix}\sigma^2 & \mathbf{0}^T\\ \mathbf 0 & \Sigma\end{matrix}\right]$ where $\mathbf 0$ is a column vector of $n$ zeroes. Then, $$\left[\begin{matrix}Y^\prime\\ \mathbf X\end{matrix}\right] = \left[\begin{matrix}\varepsilon + \boldsymbol{\beta}^T \mathbf X\\ \mathbf X\end{matrix}\right] = \left[\begin{matrix}1 & \boldsymbol{\beta}^T \\ \mathbf 0 & \mathbf I\end{matrix}\right]\left[\begin{matrix}\varepsilon\\ \mathbf X\end{matrix}\right]$$ is the result of a linear transformation of jointly normal random variables and hence has a jointly normal distribution with mean vector $\left[\begin{matrix}\boldsymbol{\beta}^T\boldsymbol{\mu}\\ \boldsymbol{\mu}\end{matrix}\right]$ and covariance matrix $$\Sigma_2 = \left[\begin{matrix}1 & \boldsymbol{\beta}^T \\ \mathbf 0 & \mathbf I\end{matrix}\right]\left[\begin{matrix}\sigma^2 & \mathbf{0}^T\\ \mathbf 0 & \Sigma\end{matrix}\right]\left[\begin{matrix}1 & \boldsymbol{0}^T \\ \boldsymbol{\beta} & \mathbf I\end{matrix}\right] = \left[\begin{matrix}\sigma^2 + \boldsymbol{\beta}^T\Sigma\boldsymbol{\beta} & \boldsymbol{\beta}^T\Sigma\\ \Sigma\boldsymbol{\beta} & \Sigma\end{matrix}\right].$$ Note that there is no requirement that $\Sigma^2$ be invertible and that these $n+1$ jointly normal random variables have a $(n+1)$-variate joint normal density. I will leave it to the OP to add in that $\beta_0$ if he likes. The covariance matrix is that exhibited above, and the mean vector is slightly changed....

I hate column vectors...

Dilip Sarwate

Let's focus on the inside of the exponential term, $\exp\left(-\frac{A}{2}\right)$, because that's what we need to prove joint normality (and note that $\mu_y=\beta_0+\beta^T\mu$):

$$\begin{align}A & = (x-\mu)^T\Sigma^{-1}(x-\mu)+\frac{1}{\sigma_\epsilon^2}\left((y-\mu_y)-\beta^T(x-\mu)\right)^2 \\ & = \frac{1}{\sigma_\epsilon^2}(y-\mu_y)^2 +(x-\mu)^T\left(\Sigma^{-1}+\frac{1}{\sigma_\epsilon^2}\beta\beta^T\right)(x-\mu) -\frac{2}{\sigma_\epsilon^2}(y-\mu_y)\beta^T(x-\mu) \\ & =\begin{bmatrix}(x-\mu)^T &y-\mu_y\end{bmatrix}\begin{bmatrix}\Sigma^{-1}+\beta\beta^T/\sigma_\epsilon^2 & - \beta/\sigma_\epsilon^2 \\ -\beta^T/\sigma_\epsilon^2 & 1/\sigma_\epsilon^2 \end{bmatrix}\begin{bmatrix}x-\mu \\ y-\mu_y\end{bmatrix}\end{align}$$ And we're done: the inner matrix is the inverse of the covariance matrix of $(\boldsymbol{x}, y)$.

Using block-matrix inversion, $\Sigma_{xy}$ turns out to be $\begin{bmatrix}\Sigma & \Sigma\beta \\ \beta^T\Sigma & \sigma_\epsilon^2+\beta^T\Sigma\beta\end{bmatrix}$, which seems plausible.
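
As a numerical sanity check of that claim (my addition; the particular $\Sigma$, $\boldsymbol{\beta}$ and $\sigma_\epsilon^2$ below are arbitrary), inverting the block matrix from the quadratic form above should reproduce $\Sigma_{xy}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary made-up parameters (illustration only).
n = 3
beta = rng.normal(size=n)
sigma_eps2 = 0.7
B = rng.normal(size=(n, n))
Sigma = B @ B.T + np.eye(n)
Sigma_inv = np.linalg.inv(Sigma)

# Matrix of the quadratic form inside exp(-A/2), blocks ordered (x, y).
precision = np.block([
    [Sigma_inv + np.outer(beta, beta) / sigma_eps2, -beta[:, None] / sigma_eps2],
    [-beta[None, :] / sigma_eps2,                   np.array([[1 / sigma_eps2]])],
])

# Claimed covariance matrix Sigma_xy of (X, Y).
Sigma_xy = np.block([
    [Sigma,                   (Sigma @ beta)[:, None]],
    [(beta @ Sigma)[None, :], np.array([[sigma_eps2 + beta @ Sigma @ beta]])],
])

print(np.allclose(np.linalg.inv(precision), Sigma_xy))  # True
```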

P.S. The $2\pi$ in the denominator of $p(y \mid \boldsymbol{x})$ should be $\sqrt{2\pi}$, and the covariance matrix of $\boldsymbol{x}$ should contain off-diagonal entries $\sigma_{ij}$ instead of $\sigma_i\sigma_j$, I guess; otherwise the elements of $\boldsymbol{x}$ will have a linear relationship and the covariance matrix will be singular.

gunes