
The Cholesky decomposition can be used to obtain $A$ from $X = AA^{T}$ (lower triangular version) but also $B$ from $Y = B^{T}B$ (upper triangular version). The SVD can be used to do something similar to the lower triangular Cholesky decomposition, as described here; e.g. obtaining $\mathbb{V}\mathbb{D}$ from $\mathbb{C} = \mathbb{V}\mathbb{D}^{2}\mathbb{V}'$. But how can it be adapted to something similar to the upper triangular Cholesky decomposition?

baf84b4c
  • So the matrices $V$ and $VD$ aren't going to be either lower- or upper-triangular matrices, so it's not 100% clear to me what you mean. Cholesky and SVD are two ways to take a matrix and decompose it in a way that might be useful, and they are used in different circumstances. For example, SVD is commonly used in principal component analysis while Cholesky is often used for solving numerical systems. You're right that the Cholesky decomposition has upper-triangular and lower-triangular versions, but there is only one version of SVD. Maybe a little more about what you're trying to do? – measure_theory Oct 07 '16 at 13:28
  • @measure_theory See this post: http://stats.stackexchange.com/questions/238938/how-to-apply-the-svd-based-approach-to-sample-from-a-matrix-variate-normal-with for exactly where the problem occurs. It seems to matter if one decomposes like $X = AA^{T}$ or $X = A^{T}A$. – baf84b4c Oct 07 '16 at 14:01

1 Answer


Theory:

Let $\mathbf{z}$ be a random vector with mean zero and $\mathrm{Var}\left(\mathbf{z} \right) = I$. Let $\Sigma$ be the desired covariance matrix.

For any matrix $A$ such that $AA' = \Sigma$,

$$\mathrm{Var}\left( A\mathbf{z} \right) = A\,\mathrm{Var}(\mathbf{z})\,A' = AIA' = \Sigma.$$

There are numerous ways to obtain such an $A$. Some commonly used approaches are:

  • Cholesky decomposition
  • Singular value decomposition (useful in case of rank deficient $\Sigma$).

Obtaining $A$ such that $AA'=\Sigma$ using Lower Triangular Cholesky

If $\Sigma$ is positive definite, you can compute the Cholesky decomposition:

L = chol(Sigma,'lower')

Then $LL' = \Sigma$ and we can choose $A = L$.
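
As a minimal MATLAB sketch (the concrete Sigma below is a made-up example for illustration):

% hypothetical positive definite covariance matrix
Sigma = [4 1; 1 9];
L = chol(Sigma, 'lower');   % L*L' reproduces Sigma
z = randn(2, 100000);       % columns are iid standard normal vectors
x = L * z;                  % each column now has covariance Sigma
cov(x')                     % sample covariance, approximately Sigma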

Obtaining $A$ such that $AA'=\Sigma$ using Upper Triangular Cholesky

R = chol(Sigma,'upper')

Then $R'R = \Sigma$ and we can choose $A = R'$.
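
Again as a sketch, with the same hypothetical Sigma as above:

R = chol(Sigma, 'upper');   % upper triangular factor: R'*R == Sigma
A = R';                     % transpose so that A*A' == Sigma
norm(A*A' - Sigma)          % zero up to floating point error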

Obtaining $A$ such that $AA'=\Sigma$ using Singular Value Decomposition

[U, S, V] = svd(Sigma)

A singular value decomposition of a matrix $\Sigma$ yields matrices $U$, $S$, $V$ such that $USV'=\Sigma$ and $S$ is diagonal with nonnegative entries. Furthermore, if $\Sigma$ is symmetric positive semidefinite, we have $U = V$. Choose: $$ A = U \sqrt{S}$$ Then $AA' = U\sqrt{S}\sqrt{S}U'= USU' = USV' = \Sigma$
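
A quick check of this construction (Sigma is again a made-up example; sqrt acts elementwise in MATLAB, which equals the matrix square root here since S is diagonal):

Sigma = [4 1; 1 9];
[U, S, V] = svd(Sigma);     % Sigma == U*S*V'
A = U * sqrt(S);
norm(A*A' - Sigma)          % zero up to floating point error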

Let $n$ be the number of random vectors we wish to generate, let $k$ be the dimension of each vector, and let randn be a function which generates an $n$ by $k$ matrix of $n\cdot k$ iid draws of mean-zero, variance-1 random variables.

A = U*sqrt(S)          % A*A' == Sigma
X = randn(n, k) * A'   % each row is one draw

Then X will be an $n$ by $k$ matrix with cov(X) $\approx \Sigma$, where cov computes the sample covariance matrix taking each row as an observation.
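
Putting the pieces together, a self-contained sketch (the rank-deficient Sigma here is an assumption, chosen to show that the SVD route still works where chol would fail):

Sigma = [1 1; 1 1];         % rank deficient: chol(Sigma) would error here
n = 100000;                 % number of draws
k = size(Sigma, 1);         % dimension of each draw
[U, S, ~] = svd(Sigma);
A = U * sqrt(S);
X = randn(n, k) * A';       % each row is one draw
cov(X)                      % approximately Sigma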

What if I have $B$ such that $B'B=\Sigma$?

This is not a meaningfully different case. Let $A = B'$; then $AA'=\Sigma$.
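
In code, assuming some $B$ with $B'B = \Sigma$ is already in hand (here a chol factor merely stands in for it):

Sigma = [4 1; 1 9];
B = chol(Sigma, 'upper');   % stand-in for a given B with B'*B == Sigma
A = B';                     % then A*A' == Sigma
X = randn(100000, 2) * A';  % rows are draws with covariance approx Sigma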

Matthew Gunn
  • The question is asking for the $L'L$ case and you are describing the $LL'$ case. Have a look at the two posts mentioned at the end of the question. – baf84b4c Oct 07 '16 at 14:04
  • @baf84b4c If $AA'=\Sigma$, choose $B = A'$, then $B'B=\Sigma$. That's not a meaningfully different case. – Matthew Gunn Oct 07 '16 at 14:07
  • But it does seem to matter: $Y = L'L$ and $X = LL'$ yield two different symmetric matrices $X$ and $Y$. If one wants to reverse this process and recover $L$ by a matrix square root operation, e.g. Cholesky, then it does matter which way the initial matrix was constructed, e.g. $X$ or $Y$. So I'm interested in recovering $L$ from $Y$, just that in my case I can't use Cholesky and must use SVD. I might not get the exact same $L$ as in Cholesky, but I should get something that satisfies the same constraints. Maybe it's not possible to do via SVD because it can't distinguish the two cases? – baf84b4c Oct 07 '16 at 14:15
  • 2
    @baf84b4c There are *two* types of Cholesky decompositions, upper triangular and lower triangular. Either may be used. There is only ONE type of singular value decomposition. And the singular value decomposition does NOT give you the Cholesky decomposition. – Matthew Gunn Oct 07 '16 at 14:19
  • But using the SVD doesn't seem to work, and my guess was because of this distinction of $L'L$ and $LL'$, see: http://stats.stackexchange.com/questions/238938/how-to-apply-the-svd-based-approach-to-sample-from-a-matrix-variate-normal-with. All I want to do is sample from the matrix-variate normal according to how it is written on the wiki entry, https://en.wikipedia.org/wiki/Matrix_normal_distribution#Drawing_values_from_the_distribution – baf84b4c Oct 07 '16 at 14:22
  • @baf84b4c Your code is bugged. `X = np.random.standard_normal((100000,4)).dot(np.diag(np.sqrt(sA)).dot(uA.transpose())).transpose()` then `np.cov(X)` will match your `AAt` – Matthew Gunn Oct 07 '16 at 14:48
  • Have a look at the wikipedia entry, https://en.wikipedia.org/wiki/Matrix_normal_distribution#Drawing_values_from_the_distribution. While I'm sampling 10000 times, the dimension of each matrix will be (4, 5) in the code example, since a matrix-variate normal sample is different than a multivariate normal sample. Or am I missing something? How would you even construct something that has a covariance matrix like BtB with dimensions (5, 5), if you have sampled X at (100000,4) to match with AAt at dimensions (4, 4)? – baf84b4c Oct 07 '16 at 15:12
  • @baf84b4c Ahhhh... you're trying to do random matrices. I haven't done random matrix stuff before, but the basic approach is going to be to represent your matrix as a vector and then reshape it at the end. E.g. you can write matrix $\begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix}$ as vector $\mathbf{y} = \begin{bmatrix}x_{11}\\x_{12}\\x_{21}\\x_{22}\end{bmatrix}$. Now you're in the classic random vector case! So the tricky parts would be figuring out what the covariance matrix of $\mathbf{y}$ should be and then translating an array of vectors $\mathbf{y}$ back to an array of matrices. – Matthew Gunn Oct 07 '16 at 15:19
  • In that Wikipedia article, $\mathrm{Vec}\left( \begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22}\end{bmatrix}\right) = \begin{bmatrix}x_{11}\\x_{12}\\x_{21}\\x_{22}\end{bmatrix}$. You can compute the covariance matrix $\Sigma$ for $\mathbf{y}$ using the Kronecker product of $V$ and $U$. Now you're in the random vector case. Do the standard code like I posted in the comments. Then reshape your vectors $\mathbf{y}$ back into matrices at the end. – Matthew Gunn Oct 07 '16 at 15:25
  • I can't use the Kronecker product, since it doesn't scale well to matrices on the order of 10000 x 10000. I'm interested in actually sampling in the way that is suggested in the wiki post, e.g. via Cholesky decomposition or some other matrix square root operation on the covariance matrices. – baf84b4c Oct 07 '16 at 15:30
  • @baf84b4c If your matrices are on the order of $10,000$ by $10,000$, then in the general case you have $10,000^2$ = 100 million covariance terms! You have a covariance term for every entry $x_{ij}$ with every entry $x_{kl}$! The only way you can handle that is if there's massive sparsity and you're extremely careful and efficient about maximally exploiting sparsity. This may be doable for sparse U and V but it doesn't sound super simple. I'd either sit down with paper and really carefully work out the math and/or get some help from someone who really knows matrix math, linear algebra etc.. – Matthew Gunn Oct 07 '16 at 15:45
  • 1
    I'm just simulating the covariance matrices not estimating them. And yes, they are very sparse (block structured). All I'm doing are dot products, apart from the SVD which also only takes on the order of minutes for that size. Memory is only a concern if I want to compute the Kronecker product, which I don't. – baf84b4c Oct 07 '16 at 15:46
  • @baf84b4c AHHHHHH, ok, I finally understand what you're doing now. Yeah, so your basic approach and the gist of your code sound reasonable to me! I'm not a terribly proficient Python programmer and have other work to do (so I don't know what's going wrong... or if anything is actually going wrong at all). – Matthew Gunn Oct 07 '16 at 15:54
  • @baf84b4c wait.... I don't understand why there's a Y1 and a Y2? Following the Wikipedia article, shouldn't it be $Y = AXB$?! blah. sorry, but I gotta move on. – Matthew Gunn Oct 07 '16 at 15:57
  • @MatthewGunn The $Y_{1}$ and $Y_{2}$ are just $X$ after the application of $Y_{1} = AX$ and $Y_{2} = Y_{1}B$, so $AXB$ in two steps. Thanks! – baf84b4c Oct 07 '16 at 15:59
  • @baf84b4c good luck. I think you're close – Matthew Gunn Oct 07 '16 at 16:00