5

I'm trying to generate correlated data (preferably multinormal) with predefined correlations (e.g. 0.35 or 0.9). Any idea how I can do it? I'm using R and I did find a way to generate this (using mvrnorm), but you need to supply a covariance matrix. I have a covariance matrix with correlations around 0.9; however, I don't know how I can modify its entries to change the correlation. If I can do that, I'll be able to generate correlated data with the correlations I need.

Regards,

Jawad
  • You just need to play with the values in the covariance matrix in [mvrnorm](http://stat.ethz.ch/R-manual/R-devel/library/MASS/html/mvrnorm.html) and relate them with the definition of correlation matrix. –  Apr 23 '12 at 12:00
  • If you post the code for you covariance matrix we can tell you how to modify it to get other correlations. – MånsT Apr 23 '12 at 12:11
  • Procrastinator, I can't just change the values in the matrix to whatever I want. Changing any number has an effect on other entries in the matrix, and I must know how the other entries change (inc. or dec.) before changing anything. For example, changing the variance of any variable will change its covariance with the other variables. – Jawad Apr 23 '12 at 12:30
  • MansT, there is no code for the covariance matrix, I have a table of correlated data which I read into R and pass it to mvrnorm to calculate the means and the covariance matrix in order to generate more correlated data based on the original one. I can post the matrix, but I'm not sure how that affects the method that I should use to create a covariance matrix with pre-determined correlations. There should be a way to do this regardless of what I already have. – Jawad Apr 23 '12 at 12:34
  • So is your question how you can obtain a correlation matrix given a covariance matrix? – MånsT Apr 23 '12 at 13:02
  • No, the question is still the same: how to obtain correlated data with pre-defined correlations? The fact that I have a covariance matrix at hand has nothing to do with it. If I can find a way to modify this matrix correctly, I can use the mvrnorm function in R to obtain the correlated data. – Jawad Apr 23 '12 at 13:18
  • 2
    The correlation between $X_i$ and $X_j$ is given by $$ Cor(X_i,X_j) = \frac{Cov(X_i, X_j)}{sd(X_i)sd(X_j)}. $$ If your covariance matrix is $V$, this is $$ Cor(X_i,X_j) = \frac{V_{ij}}{\sqrt{V_{ii}}\sqrt{V_{jj}}}. $$ Maybe this can help you set up your covariance matrix, especially if you are able to simplify your problem by standardizing each variable. – Erik Apr 23 '12 at 13:25
  • @MånsT is right. You want correlated data; that means you want data with a specified *correlation* matrix. The function for generating those data requires you to input a *covariance* matrix. Thus, what you need to know is how to get the covariance matrix that corresponds to the correlation matrix you're interested in. To do this, you use Erik's formulas. Start w/ what SD's you want for each variable; square them to get the variances; given that you know the correlation you want & now you have the variances, elementary algebra lets you solve for the covariances & you're done. – gung - Reinstate Monica Apr 23 '12 at 13:42
  • Sorry, the way I phrased part of that last bit might be misleading. W/ @Erik's formulas you solve for the covariances using the correlations you want & the SD's you want--you only use the variances to plug in the diagonal elements of the covariance matrix. – gung - Reinstate Monica Apr 23 '12 at 13:51
  • Thanks Erik and Gung, I'm already aware of the correlation formula, I thought there is another way to do this without working backwards from the formula. – Jawad Apr 23 '12 at 14:01
  • 2
    This question has been discussed on here before. For example, look here: http://stats.stackexchange.com/questions/13382/how-to-define-a-distribution-that-correlates-with-a-draw-from-another-distributi/13384#13384 – Macro Apr 23 '12 at 14:52
  • This has been answered in https://stackoverflow.com/a/44930649/1297830. The trick is to use `MASS::mvrnorm(..., empirical=TRUE)` – Jonas Lindeløv Aug 30 '18 at 08:08
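
The recipe from the comments above (pick the standard deviations, then solve Erik's formula for the covariances) can be sketched in R as follows. This is only an illustration: the correlation values, the standard deviations, and the names `R`, `sds` and `Sigma` are assumptions chosen for the example, not taken from the question's data.

```r
library(MASS)  # provides mvrnorm()

# Target correlation matrix (illustrative values, assumed for this sketch)
R <- matrix(c(1.00, 0.35, 0.90,
              0.35, 1.00, 0.35,
              0.90, 0.35, 1.00), nrow = 3, byrow = TRUE)

# Desired standard deviations of the three variables (also assumed)
sds <- c(2, 5, 0.5)

# Cov(X_i, X_j) = Cor(X_i, X_j) * sd(X_i) * sd(X_j);
# the diagonal becomes the variances sd(X_i)^2
Sigma <- R * outer(sds, sds)

# Converting back recovers the target correlations
cov2cor(Sigma)

# empirical = TRUE makes the *sample* covariance match Sigma
# (up to floating point), so the sample correlations match R as well
set.seed(1)
x <- mvrnorm(n = 200, mu = c(0, 0, 0), Sigma = Sigma, empirical = TRUE)
round(cor(x), 2)
```

Without `empirical = TRUE`, the sample correlations only approximate the targets, with the usual sampling variation.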

2 Answers

3

The MASS package has a function called mvrnorm() that can generate a group of random numbers with a specified level of correlation. An example of the setup can be found at the beginning of the example here: http://menugget.blogspot.de/2011/11/propagation-of-error.html
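
A minimal sketch of such a setup (not the code from the linked post), assuming two variables with unit variances, so that the covariance matrix coincides with the correlation matrix. The value of `rho`, the sample size and the seed are arbitrary choices for illustration.

```r
library(MASS)  # provides mvrnorm()

rho <- 0.9  # desired correlation (assumed value)

# With variances of 1, the off-diagonal covariance equals the correlation
Sigma <- matrix(c(1,   rho,
                  rho, 1), nrow = 2, byrow = TRUE)

set.seed(42)
x <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = Sigma)

# Sample correlation: close to rho, but not exactly equal to it
cor(x[, 1], x[, 2])
```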

Marc in the box
  • Sorry, didn't see that Jawad had already pointed you to the same function. In any case, the example posted might help you understand how to set it up. – Marc in the box Apr 23 '12 at 12:47
  • Thanks Marc, from the page I understand that all I have to do is set the diagonal elements of my covariance matrix to rho and the off-diagonal elements to 1 and I should get the data I need correlated by rho? – Jawad Apr 23 '12 at 13:16
  • Not exactly - the covariance matrix will depend on your defined standard deviations. If sd=1 for all series, then you are correct. Otherwise, you will need to define your std. devs for each series. – Marc in the box Apr 23 '12 at 13:26
  • No. The _variances_ of the variables should be along the _diagonal_ and the off-diagonal elements should be rho (if $\sigma^2=1$). – MånsT Apr 23 '12 at 13:38
3

Actually this is a trap question: it sounds easy but it is not (+1). The short answer is that you cannot pick the correlations arbitrarily.

I will give an example. Imagine you have 3 Gaussian variables $X_1, X_2$ and $X_3$. You want the correlation between $X_1$ and $X_2$ to be 0, and all correlations with $X_3$ to be 1. This is obviously impossible, because $X_1 = X_3$ and $X_2 = X_3$ (up to shifting and scaling) imply that $X_1 = X_2$, which contradicts the assumption that $X_1$ and $X_2$ are uncorrelated!

You would have the same situation if you replace 0 by "close to 0" and 1 by "close to 1" in the previous example. The issue here is that not every matrix is a correlation matrix. To be a correlation matrix, a matrix must be symmetric and positive semi-definite, with ones on the diagonal.
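
To make the "close to 0" and "close to 1" case concrete, here is a small sketch. The values 0.05 and 0.95 and the name `bad` are assumptions chosen for illustration. The matrix is symmetric with a unit diagonal, yet one of its eigenvalues is negative, and `MASS::mvrnorm()` refuses to use it.

```r
library(MASS)  # provides mvrnorm()

# Cor(X1, X2) close to 0, both correlations with X3 close to 1
bad <- matrix(c(1.00, 0.05, 0.95,
                0.05, 1.00, 0.95,
                0.95, 0.95, 1.00), nrow = 3, byrow = TRUE)

# Eigenvalues in decreasing order; the smallest one is negative
ev <- eigen(bad, symmetric = TRUE, only.values = TRUE)$values
ev

# mvrnorm() stops with an error for such a matrix
res <- try(mvrnorm(n = 10, mu = c(0, 0, 0), Sigma = bad), silent = TRUE)
inherits(res, "try-error")
```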

You cannot choose arbitrary correlation values, but you can check whether they define a valid correlation matrix. Say that you have a symmetric square matrix mat, with ones on the diagonal, holding the required correlation coefficients. You can test that it is positive semi-definite as shown below.

all(eigen(mat)$values >= 0)

For symmetric real matrices, positive semi-definite is equivalent to having all eigenvalues nonnegative (and positive definite to having all eigenvalues strictly positive).
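
Because eigenvalues are computed in floating point, an eigenvalue that is exactly zero in theory (for example under perfect correlation) may come out as a tiny negative number. A somewhat more defensive variant of the check could look like the sketch below. The helper name `is_valid_cor` and the default tolerance are assumptions for illustration, not part of any package.

```r
# Hypothetical helper: checks symmetry, unit diagonal and
# positive semi-definiteness up to a small numerical tolerance
is_valid_cor <- function(mat, tol = 1e-8) {
  if (!is.matrix(mat) || nrow(mat) != ncol(mat)) return(FALSE)
  if (!isSymmetric(unname(mat), tol = tol)) return(FALSE)
  if (any(abs(diag(mat) - 1) > tol)) return(FALSE)
  ev <- eigen(mat, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol)  # allow eigenvalues that are zero up to rounding error
}

is_valid_cor(diag(3))                     # independent variables
is_valid_cor(matrix(1, nrow = 3, ncol = 3))  # perfectly correlated variables
```

The tolerance is a judgment call: too small and rounding error rejects valid matrices, too large and slightly invalid ones slip through.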

gui11aume
  • It might be good to make the inequality in the code nonstrict to allow for perfect correlations between linear combinations of variables. – cardinal Jun 03 '12 at 14:41
  • @cardinal Done. But that is purely for demonstration purposes. Testing strict equality of real numbers is something R cannot do, as `(.3-.2) == (.2-.1)` shows. – gui11aume Jun 03 '12 at 14:47
  • 1
    Good point; it was actually the larger conceptual point I was trying to address. That "limitation" has more to do with floating point representation, than R itself, though. Testing against zero *is* a bit special. Some related routines in R will truncate small values to zero if they fall below a tolerance. – cardinal Jun 03 '12 at 14:54
  • 1
    @cardinal 'That "limitation" has more to do with floating point representation, than R itself, though' Yes of course. Apologies to the R team :-) – gui11aume Jun 03 '12 at 15:01