For example, in R
, the MASS::mvrnorm()
function is useful for generating data to demonstrate various things in statistics. It takes a mandatory Sigma
argument which is a symmetric matrix specifying the covariance matrix of the variables. How would I create a symmetric $n\times n$ matrix with arbitrary entries?

- 132,789
- 81
- 357
- 650

- 845
- 2
- 9
- 15
-
5I think this question would benefit from being edited to focus on "how can I create an arbitrary covariance matrix" and less on the coding aspect. There is certainly an on-topic underlying statistical issue here, as demonstrated by the answer. – Silverfish May 31 '16 at 22:23
-
3Related: [How to efficiently generate random positive-semidefinite correlation matrices?](http://stats.stackexchange.com/questions/2746) – amoeba Jun 02 '16 at 13:50
4 Answers
I like to have control over the objects I create, even when they might be arbitrary.
Consider, then, that all possible $n\times n$ covariance matrices $\Sigma$ can be expressed in the form
$$\Sigma= P^\prime\ \text{Diagonal}(\sigma_1,\sigma_2,\ldots, \sigma_n)\ P$$
where $P$ is an orthogonal matrix and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$.
Geometrically this describes a covariance structure with a range of principal components of sizes $\sigma_i$. These components point in the directions of the rows of $P$. See the figures at Making sense of principal component analysis, eigenvectors & eigenvalues for examples with $n=3$. Setting the $\sigma_i$ will set the magnitudes of the covariances and their relative sizes, thereby determining any desired ellipsoidal shape. The rows of $P$ orient the axes of the shape as you prefer.
One algebraic and computing benefit of this approach is that when $\sigma_n \gt 0$, $\Sigma$ is readily inverted (which is a common operation on covariance matrices):
$$\Sigma^{-1} = P^\prime\ \text{Diagonal}(1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_n)\ P.$$
Don't care about the directions, but only about the ranges of sizes of the the $\sigma_i$? That's fine: you can easily generate a random orthogonal matrix. Just wrap $n^2$ iid standard Normal values into a square matrix and then orthogonalize it. It will almost surely work (provided $n$ isn't huge). The QR decomposition will do that, as in this code
n <- 5
p <- qr.Q(qr(matrix(rnorm(n^2), n)))
This works because the $n$-variate multinormal distribution so generated is "elliptical": it is invariant under all rotations and reflections (through the origin). Thus, all orthogonal matrices are generated uniformly, as argued at How to generate uniformly distributed points on the surface of the 3-d unit sphere?.
A quick way to obtain $\Sigma$ from $P$ and the $\sigma_i$, once you have specified or created them, uses crossprod
and exploits R
's re-use of arrays in arithmetic operations, as in this example with $\sigma=(\sigma_1, \ldots, \sigma_5) = (5,4,3,2,1)$:
Sigma <- crossprod(p, p*(5:1))
As a check, the Singular Value decomposition should return both $\sigma$ and $P^\prime$. You may inspect it with the command
svd(Sigma)
The inverse of Sigma
of course is obtained merely by changing the multiplication by $\sigma$ into a division:
Tau <- crossprod(p, p/(5:1))
You may verify this by viewing zapsmall(Sigma %*% Tau)
, which should be the $n\times n$ identity matrix. A generalized inverse (essential for regression calculations) is obtained by replacing any $\sigma_i \ne 0$ by $1/\sigma_i$, exactly as above, but keeping any zeros among the $\sigma_i$ as they were.
-
It might help to demonstrate how to use the rows of $P$ to orient the axes as preferred. – gung - Reinstate Monica May 31 '16 at 23:33
-
1Might be worth mentioning that the singular values in `svd(Sigma)` will be reordered -- that confused me for a minute. – FrankD Mar 28 '18 at 10:45
-
1@whuber could you edit please the line of code to parametrize n: crossprod(p, p*(n:1)). In order to improve reproducibility, good answer! – Cristóbal Alcázar Feb 02 '20 at 15:54
-
@CristóbalAlcázar Thank you for your comment. Because that's a specific example and is not intended for general purposes, I think it's best to keep the values hard-coded as they are. – whuber Feb 02 '20 at 16:37
Create an $n\times n$ matrix $A$ with arbitrary values
and then use $\Sigma = A^T A$ as your covariance matrix.
For example
n <- 4
A <- matrix(runif(n^2)*2-1, ncol=n)
Sigma <- t(A) %*% A

- 30,848
- 1
- 63
- 107
-
-
12@MoazzemHossen: Your suggestion will produce a symmetric matrix, but it may not always be positive semidefinite (e.g. your suggestion could produce a matrix with negative eigenvalues) and so it may not be suitable as a covariance matrix – Henry May 31 '16 at 10:30
-
Yes, I noticed that R returns error in the event my suggested way produced unsuitable matrix. – rsl May 31 '16 at 18:32
-
7Note that if you prefer a correlation matrix for better interpretability, there is the [?cov2cor](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/cor.html) function, which can be applied subsequently. – gung - Reinstate Monica May 31 '16 at 22:58
-
1@B11b: You need your covariance matrix to be positive semi-definite. That would put some limits on the covariance values, not totally obvious ones when $n \gt 2$ – Henry Jan 11 '18 at 09:26
-
thank you for your reply @Henry. Please can you write what you said as a clear command in R? I will be happy. Thank you so much – 1190 Jan 11 '18 at 18:25
-
@B11b try the `matrixcalc` or `sfsmisc` or `corpcor` packages or some of the others mentioned in other answers to see if they have what you are looking for – Henry Jan 11 '18 at 19:19
You can simulate random positive definite matrices from the Wishart distribution using the function "rWishart" from stats (included in base)
n <- 4
rWishart(1,n,diag(n))

- 593
- 4
- 7
There is a package specifically for that, clusterGeneration
(written among other by Harry Joe, a big name in that field).
There are two main functions:
genPositiveDefMat
generate a covariance matrix, 4 different methodsrcorrmatrix
: generate a correlation matrix
Quick example:
library(clusterGeneration)
#> Loading required package: MASS
genPositiveDefMat("unifcorrmat",dim=3)
#> $egvalues
#> [1] 15.408962 5.673916 1.228842
#>
#> $Sigma
#> [,1] [,2] [,3]
#> [1,] 6.714871 1.643449 6.530493
#> [2,] 1.643449 6.568033 2.312455
#> [3,] 6.530493 2.312455 9.028815
genPositiveDefMat("eigen",dim=3)
#> $egvalues
#> [1] 8.409136 4.076442 2.256715
#>
#> $Sigma
#> [,1] [,2] [,3]
#> [1,] 2.3217300 -0.1467812 0.5220522
#> [2,] -0.1467812 4.1126757 0.5049819
#> [3,] 0.5220522 0.5049819 8.3078880
Created on 2019-10-27 by the reprex package (v0.3.0)
Finally, note that an alternative approach is to do a first try from scratch, then use Matrix::nearPD()
to make your matrix positive-definite.

- 2,699
- 14
- 25