Let's say we have a random vector $\vec{X} \in \mathbb{R}^n$, drawn from a distribution with probability density function $f_{\vec{X}}(\vec{x})$. If we linearly transform it by a full-rank $n \times n$ matrix $A$ to get $\vec{Y} = A\vec{X}$, then the density of $\vec{Y}$ is given by $$ f_{\vec{Y}}(\vec{y}) = \frac{1}{\left|\det A\right|}f_{\vec{X}}(A^{-1}\vec{y}). $$
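For concreteness, here is a quick numerical sanity check of this formula (just a sketch: I take $\vec{X}$ standard normal, so $\vec{Y} \sim \mathcal{N}(0, AA^T)$, and the particular full-rank matrix $A$ is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.stats import multivariate_normal

# With X ~ N(0, I_2), Y = A X ~ N(0, A A^T). The change-of-variables
# formula f_Y(y) = f_X(A^{-1} y) / |det A| should match that density exactly.
rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0], [0.5, 3.0]])  # arbitrary full-rank 2x2 matrix

f_X = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).pdf
f_Y_true = multivariate_normal(mean=np.zeros(2), cov=A @ A.T).pdf

y = rng.standard_normal(2)  # an arbitrary test point
f_Y_formula = f_X(np.linalg.solve(A, y)) / abs(np.linalg.det(A))
print(np.isclose(f_Y_formula, f_Y_true(y)))  # True
```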
Now say we transform $\vec{X}$ instead by an $m \times n$ matrix $B$, with $m > n$, giving $\vec{Z} = B\vec{X}$. Clearly $\vec{Z} \in \mathbb{R}^m$, but it "lives on" an $n$-dimensional subspace $G \subset \mathbb{R}^m$, namely the column space of $B$ (assuming $B$ has full column rank). What is the conditional density of $\vec{Z}$, given that we know it lies in $G$?
My first instinct was to use the pseudo-inverse of $B$. If $B = U S V^T$ is the singular value decomposition of $B$, then $B^+ = V S^+ U^T$ is the pseudo-inverse, where $S^+$ is formed by transposing the diagonal matrix $S$ and inverting its non-zero entries. I guessed that this would give $$ f_{\vec{Z}}(\vec{z}) = \frac{1}{\left|\det^+ S\right|} f_{\vec{X}}(B^+ \vec{z}), $$ where by $\det^+ S$ I mean the product of the non-zero singular values.
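In code, the guess looks like this (a sketch with numpy/scipy; the tall matrix $B$ and the standard normal choice for $f_{\vec{X}}$ are just for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Proposed formula: f_Z(z) = f_X(B^+ z) / det^+ S, where det^+ S is the
# product of the non-zero singular values of B.
B = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # m=3, n=2, full column rank

U, s, Vt = np.linalg.svd(B, full_matrices=False)  # thin SVD
B_pinv = Vt.T @ np.diag(1.0 / s) @ U.T            # same as np.linalg.pinv(B)
det_plus = np.prod(s)                             # product of non-zero singular values

f_X = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).pdf

def f_Z_proposed(z):
    """Candidate density, meaningful only for z in the column space of B."""
    return f_X(B_pinv @ z) / det_plus

x0 = np.array([0.3, -1.2])
print(f_Z_proposed(B @ x0))  # evaluate at a point that lies in col(B)
```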
This reasoning agrees with the density for a singular normal (conditioned on knowledge that the variable lives on the appropriate subspace) given here and mentioned also here and in this CrossValidated post.
But it isn't right! The normalization constant is off. A (trivial) counterexample: with $X \sim \mathcal{N}(0, 1)$, let $$ \vec{Y} = \begin{pmatrix}1 \\ 1\end{pmatrix} X = \begin{pmatrix}X \\ X\end{pmatrix}. $$ Here the matrix $B$ from above is just the ones vector. Its pseudo-inverse is $$ B^+ = \begin{pmatrix}1/2 & 1/2\end{pmatrix}, $$ and the only non-zero singular value is $\sqrt{2}$, so $\det^+ S = \sqrt{2}$. The reasoning from above would suggest $$ f_{\vec{Y}}(\vec{y}) = \frac{1}{\sqrt{2\pi}\sqrt{2}}\exp\left(-\frac{1}{2}\vec{y}^T (B^+)^T B^+ \vec{y}\right), $$ but this in fact integrates to $\frac{1}{\sqrt{2}}$, not $1$, when the line $y_1 = y_2$ is parametrized as $(t, t)$ and the integral is taken over $t$ (a numerical check is sketched below).

I realize that in this case you could just drop one of the entries of $\vec{Y}$ and be done, but when $B$ is much larger, identifying the set of entries to drop is annoying. Why doesn't the pseudo-inverse reasoning work? Is there a general formula for the density function of a linear transformation of a set of random variables by a "tall" matrix? Any references would be greatly appreciated as well.
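For reference, here is the numerical check behind the $\frac{1}{\sqrt{2}}$ figure (parametrizing the line as $(t, t)$ and integrating over $t$):

```python
import numpy as np
from scipy.integrate import quad

# Parametrize the line y1 = y2 as (t, t) and integrate the proposed density
# over t; the result is 1/sqrt(2), not 1, so the normalization is off.
B_pinv = np.array([[0.5, 0.5]])

def f_Y_proposed(t):
    x = B_pinv @ np.array([t, t])  # B^+ y = t
    return np.exp(-0.5 * x @ x) / (np.sqrt(2 * np.pi) * np.sqrt(2))

val, _ = quad(f_Y_proposed, -np.inf, np.inf)
print(val, 1 / np.sqrt(2))  # both approximately 0.7071
```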