Can anyone suggest a method for generating a random correlation matrix in which $90\%$ of the off-diagonal entries lie in $[-0.3, 0.3]$ and the other $10\%$ are larger than $0.3$ or smaller than $-0.3$?
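To make the target concrete, here is a small checker. It is only a sketch: `check_target` is a hypothetical helper name, and the eigenvalue tolerance `tol` is an assumption. It reports whether a matrix is symmetric with unit diagonal and positive semidefinite, and what fraction of its off-diagonal entries fall in $[-0.3, 0.3]$.

```r
## Hypothetical helper (not from the thread): how close does R come to the target?
check_target <- function(R, cut = 0.3, tol = 1e-8) {
  off <- R[upper.tri(R)]                      # off-diagonal entries, counted once
  list(
    symmetric   = isSymmetric(unname(R)),
    unit_diag   = all(abs(diag(R) - 1) < tol),
    ## eigen(symmetric = TRUE) reads only the lower triangle,
    ## so this is meaningful only when `symmetric` is TRUE
    psd         = min(eigen(R, symmetric = TRUE, only.values = TRUE)$values) > -tol,
    frac_inside = mean(abs(off) <= cut)       # the question asks for about 0.9
  )
}

## The identity is a valid correlation matrix with every off-diagonal entry in range.
str(check_target(diag(5)))
```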
- You have to be aware that you can't get *arbitrary* negative correlations between variables, for one thing. – cardinal May 07 '11 at 01:23
- Can you define what you mean by "random"? This seems related to your [previous question](http://stats.stackexchange.com/questions/10121/generate-normally-distributed-correlation-matrix). – cardinal May 07 '11 at 01:29
- Is the 90/10 requirement "hard"? In lower dimensions you might be able to get close by drawing from a Wishart centered at $I$, computing the correlation matrix, and rejecting samples that aren't within some tolerance. Though I suspect this won't scale well at all... – JMS May 07 '11 at 01:52
- @JMS We're starting to get some clarification in new comments to the preceding question linked to by @Cardinal: you might want to check there. – whuber May 07 '11 at 02:19
- @whuber Good, thanks. Might we close this then? It seems "duplicate in intention" as it were and doesn't contain much beyond my foolishness. – JMS May 07 '11 at 03:25
- @cardinal, sorry for my ignorance. I missed part of the thread yesterday and did not answer your question. Regarding "can't get arbitrary negative correlations", I hope my clarification of the previous question answers your question. – Richard May 07 '11 at 09:56
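JMS's Wishart-and-reject suggestion can be sketched as follows, under assumptions the thread does not state: Wishart draws with identity scale matrix, an illustrative `df = 30`, and an acceptance tolerance of 0.05 around the 90% target. `draw_wishart_cor` and `frac_inside` are hypothetical helper names. Larger `df` concentrates the sample correlations near 0, so `df` is the knob that moves the in-band fraction.

```r
set.seed(1)

## fraction of off-diagonal entries with |r| <= cut
frac_inside <- function(R, cut = 0.3) mean(abs(R[upper.tri(R)]) <= cut)

## Hypothetical helper: draw Wishart(df, I) matrices, convert each to a
## correlation matrix, and accept the first whose in-band fraction is
## within `tol` of `target`. Returns NULL if nothing is accepted.
draw_wishart_cor <- function(p, df, target = 0.9, tol = 0.05, max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    S <- rWishart(1, df = df, Sigma = diag(p))[, , 1]  # one p x p draw
    R <- cov2cor(S)                                     # unit diagonal, still PSD
    if (abs(frac_inside(R) - target) <= tol) return(R)  # accept
  }
  NULL
}

R <- draw_wishart_cor(p = 6, df = 30)
if (!is.null(R)) print(round(R, 2))
```

With `p = 6` there are only 15 off-diagonal entries, so the fraction moves in steps of 1/15 and exactly 90% is unattainable; this is one reason a tolerance is needed. As JMS anticipates, the acceptance rate should be expected to drop as the dimension grows.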
3 Answers
Here's a heuristic that I coded up quickly that seems to do quite well:
- Initialize a matrix with 1 on the diagonal.
- Fill in the upper-triangular part according to your distribution (90% uniform on (-.3, .3) and 10% outside that range).
- Make the matrix symmetric.
- Now iterate between:
  - projecting the matrix onto the PSD cone;
  - projecting the matrix onto the set of matrices with unit diagonal.
- Alternating projections between these two convex sets converge to a matrix in their intersection (a valid correlation matrix, though not necessarily the one nearest the starting matrix), so we just hope that the matrix we get out has values distributed the way you want (see the simulation for a check).
```r
## Draw one off-diagonal entry: uniform on (-.3, .3) with probability .9,
## otherwise magnitude uniform on (.3, 1) with a random sign.
pickone <- function(x) {
  if (runif(1) < .9) {
    return(runif(1, -.3, .3))
  } else {
    return(sample(c(-1, 1), 1) * runif(1, .3, 1))
  }
}

generateMat <- function(x) {
  X <- matrix(0, nrow = 10, ncol = 10)
  diag(X) <- rep(1, 10)
  X[upper.tri(X)] <- sapply(1:45, pickone)
  X <- X + t(X) - diag(rep(1, 10))   # symmetrize, keeping the unit diagonal
  Xnew <- X
  for (i in 1:50) {
    eig <- eigen(Xnew)
    ## project onto the PSD cone (clip negative eigenvalues at 0)
    Xnew <- eig$vectors %*% diag(sapply(eig$values, max, 0)) %*% t(eig$vectors)
    ## project onto the set of matrices with diagonal 1
    diag(Xnew) <- rep(1, 10)
  }
  vals <- Xnew[upper.tri(Xnew)]
  ## proportion of off-diagonal entries inside (-.3, .3)
  return(mean(vals < .3 & vals > -.3))
}

summary(sapply(1:100, generateMat))
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.7556  0.8667  0.8889  0.8960  0.9333  0.9778
```
Across 100 simulated matrices, the proportion of off-diagonal entries within (-.3, .3) is mostly close to 90% (median 0.889, mean 0.896).
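A related, better-studied variant swaps in Higham's nearest-correlation-matrix algorithm, which adds Dykstra's correction to the alternating projections so that the iteration converges to the closest correlation matrix in Frobenius norm rather than to just some point in the intersection. The sketch below assumes the `Matrix` package (a recommended package distributed with R) and its `nearPD()` function with `corr = TRUE`; `draw_entry` is a hypothetical helper reproducing the answer's 90/10 draw, and `p = 10` mirrors the answer's code. The in-band fraction it prints is not guaranteed to resemble the answer's simulation.

```r
library(Matrix)  # provides nearPD()

set.seed(3)
p <- 10

## same 90/10 draw as pickone() in the answer
draw_entry <- function() {
  if (runif(1) < 0.9) runif(1, -0.3, 0.3) else sample(c(-1, 1), 1) * runif(1, 0.3, 1)
}

X <- diag(p)
X[upper.tri(X)] <- replicate(p * (p - 1) / 2, draw_entry())
X <- X + t(X) - diag(p)              # symmetric with unit diagonal

fit <- nearPD(X, corr = TRUE)        # Higham (2002) with Dykstra's correction
R <- as.matrix(fit$mat)              # dense nearest correlation matrix

mean(abs(R[upper.tri(R)]) < 0.3)     # fraction of entries inside (-0.3, 0.3)
```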

ncray
- I don't see the point of doing these alternating projections. It seems like a waste. Let $\newcommand{\Xm}{\mathbf{X}}\newcommand{\Dm}{\mathbf{D}}\Xm$ be your initial matrix and $\widetilde{\Xm}$ the first projection onto the PSD cone. Let $\Dm = \mathrm{diag}(\widetilde{\Xm})$, i.e., the diagonal matrix of the diagonal entries of $\widetilde{\Xm}$. Then $\hat{\Xm} = \Dm^{-1/2} \widetilde{\Xm} \Dm^{-1/2}$ is a positive semidefinite correlation matrix. No iteration needed. – cardinal May 12 '11 at 23:14
- Also, a note on `R`. You can use `pmax(eig$values,0)` instead of the more cumbersome `sapply` call. – cardinal May 12 '11 at 23:16
- (+1) For the basic idea. But I don't think the distribution is going to be at all uniform on $[-0.3,0.3]$, especially for larger dimensions. – cardinal May 13 '11 at 01:14
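cardinal's one-shot suggestion can be sketched as follows, assuming the starting matrix is drawn as in the answer (`p = 10`, 90/10 mixture); `draw_entry` is a hypothetical helper. Base R's `cov2cor()` computes exactly $\mathbf{D}^{-1/2} \widetilde{\mathbf{X}} \mathbf{D}^{-1/2}$. Because clipping the negative eigenvalues amounts to adding a positive semidefinite matrix to $\mathbf{X}$, every diagonal entry of $\widetilde{\mathbf{X}}$ is at least 1, so the rescaling never divides by zero.

```r
set.seed(2)
p <- 10

## same 90/10 draw as pickone() in the answer
draw_entry <- function() {
  if (runif(1) < 0.9) runif(1, -0.3, 0.3) else sample(c(-1, 1), 1) * runif(1, 0.3, 1)
}

X <- diag(p)
X[upper.tri(X)] <- replicate(p * (p - 1) / 2, draw_entry())
X <- X + t(X) - diag(p)                   # symmetric with unit diagonal

## single projection onto the PSD cone: clip negative eigenvalues at 0
eig <- eigen(X, symmetric = TRUE)
Xtilde <- eig$vectors %*% diag(pmax(eig$values, 0)) %*% t(eig$vectors)

## rescale by D^{-1/2} on both sides, D = diag(Xtilde)
Xhat <- cov2cor(Xtilde)

mean(abs(Xhat[upper.tri(Xhat)]) < 0.3)    # fraction inside (-0.3, 0.3)
```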
If all you care about is the proportion of entries between $\pm 0.3$, then sure: generate a random correlation matrix, compute the proportion of entries that are greater than $0.3$ in absolute value, and if there are too many, pick some at random and reassign them to random values between $\pm 0.3$. Do the same if there are too few.
Edit: Never you mind, this won't work; see the comments...

JMS
- Thanks for your answer. I have two questions. First, how do I generate a random correlation matrix? Second, after reassigning some entries, the new matrix might not be a correlation matrix any more. How do I make sure it is still a correlation matrix? – Richard May 07 '11 at 00:50
- It is still positive (semi)definite if you do the reassignment in the way I describe, unless I'm terribly mistaken. There are all sorts of ways that you might generate a correlation matrix; see some of the links in the sidebar. A simple way is to draw a random matrix $A$ which is $n\times p$ with $n – JMS May 07 '11 at 01:02
- I am very interested in seeing this. Say I generate a random correlation matrix that happens to have 20% of its entries between -0.3 and 0.3. Then I need to resample 70% of the entries uniformly between -0.3 and 0.3. How do we know the new matrix is still PSD? – Richard May 07 '11 at 01:20
- @JMS, why would the matrix necessarily still be positive semidefinite? I don't believe your statement is true. – cardinal May 07 '11 at 01:21
- @cardinal, thanks for your comment. We had some discussion about the normally distributed random correlation matrix. I don't think I can get the number large enough, so I want to give up on the normal and see if there is a general way to generate such a correlation matrix. – Richard May 07 '11 at 01:29
- @JMS This matrix has a negative eigenvalue: $$\left( \begin{array}{cccccc} 1 & -\frac{1}{5} & 0 & \frac{1}{5} & \frac{3}{10} & \frac{3}{10} \\ -\frac{1}{5} & 1 & \frac{1}{10} & \frac{3}{10} & \frac{3}{10} & \frac{1}{5} \\ 0 & \frac{1}{10} & 1 & \frac{3}{10} & -\frac{1}{5} & -\frac{1}{5} \\ \frac{1}{5} & \frac{3}{10} & \frac{3}{10} & 1 & -\frac{3}{10} & -\frac{3}{10} \\ \frac{3}{10} & \frac{3}{10} & -\frac{1}{5} & -\frac{3}{10} & 1 & -\frac{1}{10} \\ \frac{3}{10} & \frac{1}{5} & -\frac{1}{5} & -\frac{3}{10} & -\frac{1}{10} & 1 \end{array} \right)$$ – whuber May 07 '11 at 01:38
- Thanks for all the comments. I tried to generate the correlation matrix I described. It is not hard to get a small matrix by rejecting the ones that are not PSD, but for a bigger matrix it becomes very difficult to get a PSD matrix with the property I want. – Richard May 07 '11 at 01:45
- If we use `t(A) %*% A` to construct a correlation matrix, drawing the rows of $A$ from a multivariate normal with different correlations, can we control the size of the entries in the resulting correlation matrix? – Richard May 07 '11 at 01:53
- Just *checking* the positive-definiteness of a big matrix after you've fiddled with individual entries is going to be computationally intensive. My comments in your [other question](http://stats.stackexchange.com/questions/10121/generate-normally-distributed-correlation-matrix) give a couple of indications as to why your construction may be difficult to obtain. – cardinal May 07 '11 at 01:54
- @Richard This is a Wishart distribution; see the comment I just added, and http://en.wikipedia.org/wiki/Wishart_distribution – JMS May 07 '11 at 01:55
- @cardinal/whuber I explained further in the other question. Do you think you understand that question better now? – Richard May 07 '11 at 02:14
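whuber's counterexample to the reassignment idea can be checked numerically. The sketch below enters the matrix from that comment, confirms that every off-diagonal entry lies in $[-0.3, 0.3]$, and computes the smallest eigenvalue. The vector `x` is an illustrative choice, not from the thread; it gives a hand-checkable certificate, since $x^\top M x = -0.01 < 0$ rules out positive semidefiniteness.

```r
## whuber's 6 x 6 matrix, entered row by row
M <- matrix(c(
   1.0, -0.2,  0.0,  0.2,  0.3,  0.3,
  -0.2,  1.0,  0.1,  0.3,  0.3,  0.2,
   0.0,  0.1,  1.0,  0.3, -0.2, -0.2,
   0.2,  0.3,  0.3,  1.0, -0.3, -0.3,
   0.3,  0.3, -0.2, -0.3,  1.0, -0.1,
   0.3,  0.2, -0.2, -0.3, -0.1,  1.0
), nrow = 6, byrow = TRUE)

all(abs(M[upper.tri(M)]) <= 0.3)                             # TRUE: all entries in range
min(eigen(M, symmetric = TRUE, only.values = TRUE)$values)   # negative: M is not PSD

## A direct certificate: a vector x with x' M x < 0
x <- c(-1, -1, 0, 1.1, 1, 1)
drop(t(x) %*% M %*% x)                                       # approximately -0.01
```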
Here's an older answer to a similar question on SO. It has some code that you could try/modify:
Some other links: