3

Can anyone suggest a method for generating a random correlation matrix with $90\%$ of the off-diagonal entries in $[-0.3, 0.3]$? The other $10\%$ should be larger than $0.3$ or smaller than $-0.3$.

amoeba
Richard
  • You have to be aware that you can't get *arbitrary* negative correlations between variables, for one thing. – cardinal May 07 '11 at 01:23
  • Can you define what you mean by "random"? This seems related to your [previous question](http://stats.stackexchange.com/questions/10121/generate-normally-distributed-correlation-matrix). – cardinal May 07 '11 at 01:29
  • 1
    Is the 90/10 requirement "hard"? In lower dimensions you might be able to get close by drawing from a Wishart centered at $I$, computing the correlation matrix, and rejecting samples that aren't within some tolerance. Though I suspect this won't scale well at all... – JMS May 07 '11 at 01:52
  • @JMS We're starting to get some clarification in new comments to the preceding question linked to by @Cardinal: you might want to check there. – whuber May 07 '11 at 02:19
  • @whuber Good, thanks. Might we close this then? It seems "duplicate in intention" as it were and doesn't contain much beyond my foolishness. – JMS May 07 '11 at 03:25
  • @cardinal, sorry for my ignorance. I missed part of the thread yesterday and did not answer your question. Regarding "can't get arbitrary negative correlations", I hope my clarification of the previous question answers your question. – Richard May 07 '11 at 09:56
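
JMS's rejection idea from the comments above can be sketched in R as follows. This is only a rough illustration, not anyone's tested method: the helper name `wishart_reject` and the parameters `df`, `tol`, and `max_tries` are hypothetical. The choice `df = 30` is a guess based on a normal approximation: with independent columns the off-diagonal correlations have a standard deviation of roughly $1/\sqrt{30} \approx 0.18$, which puts about 10% of them outside $\pm 0.3$.

```r
# Hypothetical helper: rejection sampling of correlation matrices built from
# an identity-centered Wishart draw. All defaults are illustrative assumptions.
wishart_reject <- function(p = 10, df = 30, target = 0.9, tol = 0.02,
                           max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    A <- matrix(rnorm(df * p), nrow = df, ncol = p)  # rows are N(0, I_p)
    R <- cov2cor(crossprod(A))                       # t(A) %*% A, rescaled to unit diagonal
    vals <- R[upper.tri(R)]
    frac <- mean(abs(vals) <= 0.3)                   # share of entries inside [-0.3, 0.3]
    if (abs(frac - target) <= tol) {
      return(list(R = R, frac = frac, tries = i))
    }
  }
  NULL  # no draw met the tolerance within max_tries
}

set.seed(1)
res <- wishart_reject()
```

As JMS suspects, the acceptance rate is likely to drop as the dimension grows, and the entries outside $\pm 0.3$ will tend to sit only just outside it rather than spread out toward $\pm 1$.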

3 Answers

1

Here's a heuristic that I coded up quickly that seems to do quite well:

  1. Initialize a matrix with 1 on the diagonals.
  2. Fill out the upper triangular sub-matrix according to your distribution (90% are uniform on (-.3,.3) and 10% outside that).
  3. Make the matrix symmetric.
  4. Now iterate between
    • Project the matrix onto the PSD cone.
    • Project the matrix onto the set of matrices with diagonal 1.
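  5. Alternating projections converge, so we just hope that the resulting matrix has entries distributed roughly as you specified (the simulation that follows checks this).

pickone <- function(x){
  ## with probability 0.9 draw uniformly from (-.3,.3); otherwise draw
  ## a magnitude from (.3,1) and give it a random sign
  if(runif(1)<.9){
    return(runif(1,-.3,.3))
  } else {
    return(sample(c(-1,1),1)*runif(1,.3,1))
  }
}

generateMat <- function(x){
  ## 10 x 10 matrix with unit diagonal; 45 = 10*9/2 upper-triangular entries
  X <- matrix(0,nrow=10,ncol=10)
  diag(X) <- rep(1,10)
  X[upper.tri(X)] <- sapply(1:45,pickone)
  ## symmetrize (X + t(X) counts the diagonal twice, so subtract it once)
  X <- X + t(X)-diag(rep(1,10))
  Xnew <- X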

  for(i in 1:50){
    eig <- eigen(Xnew)
    ##project onto the PSD cone
    Xnew <- eig$vectors%*%diag(sapply(eig$values,max,0))%*%t(eig$vectors)
    ##project onto the set of matrices with diagonal 1
    diag(Xnew) <- rep(1,10)
  }

  vals <- Xnew[upper.tri(Xnew)]
  return(mean(vals < .3 & vals > -.3))
}

summary(sapply(1:100,generateMat))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.7556  0.8667  0.8889  0.8960  0.9333  0.9778

Across 100 simulated matrices, the fraction of off-diagonal entries within (-.3,.3) stays close to 90% (mean 0.896, ranging from about 0.76 to 0.98).

ncray
  • 2
    I don't see the point of doing these alternating projections. It seems like a waste. Let $\newcommand{\Xm}{\mathbf{X}}\newcommand{\Dm}{\mathbf{D}}\Xm$ be your initial matrix and $\widetilde{\Xm}$ the first projection onto the PSD cone. Let $\Dm = \mathrm{diag}(\widetilde{\Xm})$, i.e., the diagonal matrix of the diagonal entries of $\widetilde{\Xm}$. Then $\hat{\Xm} = \Dm^{-1/2} \widetilde{\Xm} \Dm^{-1/2}$ is a positive semidefinite correlation matrix. No iteration needed. – cardinal May 12 '11 at 23:14
  • 1
    Also, a note on `R`. You can use `pmax(eig$values,0)` instead of the more cumbersome `sapply` call. – cardinal May 12 '11 at 23:16
  • @cardinal Right on both counts. Thanks for the suggestions. – ncray May 13 '11 at 00:17
  • (+1) For the basic idea. But I don't think the distribution is going to be at all uniform on $[-0.3,0.3]$, especially for larger dimensions. – cardinal May 13 '11 at 01:14
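
cardinal's two suggestions in the comments above (a single PSD projection followed by rescaling with $\mathbf{D}^{-1/2}$, and `pmax` in place of `sapply`) might look like this in R. This is a minimal sketch: the names `pick_one` and `one_shot_cor` and the dimension argument `p` are hypothetical and not part of the original answer.

```r
# Draw one off-diagonal entry: 90% uniform on (-0.3, 0.3), 10% outside it.
pick_one <- function() {
  if (runif(1) < 0.9) {
    runif(1, -0.3, 0.3)
  } else {
    sample(c(-1, 1), 1) * runif(1, 0.3, 1)
  }
}

# Hypothetical helper: one PSD projection, then rescale to unit diagonal.
one_shot_cor <- function(p = 10) {
  X <- diag(p)
  X[upper.tri(X)] <- replicate(p * (p - 1) / 2, pick_one())
  X[lower.tri(X)] <- t(X)[lower.tri(X)]          # mirror the upper triangle

  eig <- eigen(X, symmetric = TRUE)
  # Project onto the PSD cone by zeroing the negative eigenvalues.
  Xpsd <- eig$vectors %*% diag(pmax(eig$values, 0)) %*% t(eig$vectors)

  # Rescale: D^{-1/2} Xpsd D^{-1/2} with D = diag(Xpsd).
  d <- 1 / sqrt(diag(Xpsd))
  R <- Xpsd * outer(d, d)
  R <- (R + t(R)) / 2                            # remove floating-point asymmetry
  diag(R) <- 1
  R
}

set.seed(42)
R <- one_shot_cor(10)
mean(abs(R[upper.tri(R)]) < 0.3)  # fraction of entries inside (-0.3, 0.3)
```

The rescaling is always well defined here because the projection can only increase the diagonal entries, which start at 1. The built-in `cov2cor(Xpsd)` performs the same rescaling step.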
0

If all you care about is the proportion of entries between $\pm 0.3$ then sure - generate a random correlation matrix, compute the proportion of entries which are greater than $0.3$ in absolute value, and if there are too many pick some at random and reassign them to random values between $\pm 0.3$. Similarly if there are too few.

Edit: Never you mind, this won't work; see the comments...

JMS
  • Thanks for your answer. I have two questions. First, how do I generate a random correlation matrix? Second, after reassigning some entries, the new matrix might not be a correlation matrix any more. How do I make sure it still is one? – Richard May 07 '11 at 00:50
  • It is still positive (semi)definite if you do the reassignment in the way I describe, unless I'm terribly mistaken. There are all sorts of ways that you might generate a correlation matrix; see some of the links in the sidebar. A simple way is to draw a random matrix $A$ which is $n\times p$ with $n

    – JMS May 07 '11 at 01:02
  • 1
    I am very interested in seeing this. Say I generate a random correlation matrix that happens to have 20% of its entries between -0.3 and 0.3. Then I need to resample 70% of the entries uniformly between -0.3 and 0.3. How do we see that the new matrix is still PSD? – Richard May 07 '11 at 01:20
  • 2
    @JMS, Why would the matrix still be positive semidefinite necessarily? I don't believe your statement is true. – cardinal May 07 '11 at 01:21
  • @cardinal, thanks for your comment. We had some discussion about the normally distributed random correlation matrix. I don't think I can get the entries large enough that way, so I want to give up on the normal and see if there is a general way to generate such a correlation matrix. – Richard May 07 '11 at 01:29
  • @JMS This matrix has a negative eigenvalue:$$\left( \begin{array}{cccccc} 1 & -\frac{1}{5} & 0 & \frac{1}{5} & \frac{3}{10} & \frac{3}{10} \\ -\frac{1}{5} & 1 & \frac{1}{10} & \frac{3}{10} & \frac{3}{10} & \frac{1}{5} \\ 0 & \frac{1}{10} & 1 & \frac{3}{10} & -\frac{1}{5} & -\frac{1}{5} \\ \frac{1}{5} & \frac{3}{10} & \frac{3}{10} & 1 & -\frac{3}{10} & -\frac{3}{10} \\ \frac{3}{10} & \frac{3}{10} & -\frac{1}{5} & -\frac{3}{10} & 1 & -\frac{1}{10} \\ \frac{3}{10} & \frac{1}{5} & -\frac{1}{5} & -\frac{3}{10} & -\frac{1}{10} & 1 \end{array} \right)$$ – whuber May 07 '11 at 01:38
  • ...and I am terribly mistaken :) Not sure where my head was at. – JMS May 07 '11 at 01:39
  • Thanks for all the comments. I tried to generate the correlation matrix I described. It is not hard to get a small matrix by rejecting the ones that are not PSD. But for a bigger matrix, it becomes very difficult to get a PSD matrix with the property I want. – Richard May 07 '11 at 01:45
  • If we use t(A)*A to construct a correlation matrix and draw the rows of A from a multivariate normal with different correlations, can we control the size of the entries in the resulting correlation matrix? – Richard May 07 '11 at 01:53
  • Just *checking* the positive-definiteness of a big matrix after you've fiddled with individual entries is going to be computationally intensive. My comments in your [other question](http://stats.stackexchange.com/questions/10121/generate-normally-distributed-correlation-matrix) give a couple indications as to why your construction may be difficult to obtain. – cardinal May 07 '11 at 01:54
  • @Richard This is a Wishart distribution; see the comment I just added, and http://en.wikipedia.org/wiki/Wishart_distribution – JMS May 07 '11 at 01:55
  • @JMS, thanks. That's very interesting. I want to look into that. – Richard May 07 '11 at 02:12
  • @cardinal/whuber I explained the other question further. Do you think you understand that question better now? – Richard May 07 '11 at 02:14
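
whuber's counterexample in the comments above is easy to check numerically. The R sketch below enters his matrix as decimals; the test vector `v` is an illustrative choice added here, not something from the thread. Every off-diagonal entry lies in $[-0.3, 0.3]$, yet the matrix is not positive semidefinite, so keeping the entries in range does not by itself give a valid correlation matrix.

```r
# whuber's 6 x 6 matrix, with the fractions written as decimals.
M <- matrix(c(
   1.0, -0.2,  0.0,  0.2,  0.3,  0.3,
  -0.2,  1.0,  0.1,  0.3,  0.3,  0.2,
   0.0,  0.1,  1.0,  0.3, -0.2, -0.2,
   0.2,  0.3,  0.3,  1.0, -0.3, -0.3,
   0.3,  0.3, -0.2, -0.3,  1.0, -0.1,
   0.3,  0.2, -0.2, -0.3, -0.1,  1.0
), nrow = 6, byrow = TRUE)

# Smallest eigenvalue: a negative value means M is not PSD.
ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
min_ev <- min(ev)

# Illustrative witness vector: the quadratic form v' M v comes out negative.
v <- c(-1, -1, 0.2, 1, 1, 1)
qf <- drop(t(v) %*% M %*% v)

c(min_eigenvalue = min_ev, quadratic_form = qf)
```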
0

Here's an older answer to a similar question on SO. It has some code that you could try/modify:

Similar Question

Some other links:

Forecasting Covariance Matrices

Various Matrix Techniques

Matrix Shrinkage Technique

bill_080