
So, I want to simulate data based on given means, standard deviations and correlations. So far I have simulated the three variables randomly from normal distributions with the given means and standard deviations. For the correlation I'm trying to use eigenvectors/eigenvalues, since I understand (maybe wrongly?) that this is an alternative to the Cholesky method. However, when I simulate the data (50,000+ data points per series), two of the three correlations are within an acceptable range (i.e. 0.175 vs. 0.16 and 0.22 vs. 0.24), but the last one is way off (0.44 vs. 0.58). I have run the simulation many times, and the values fluctuate around these numbers. Here is what I have done:

Means and Std.Dev

|         |      A |     B |     C |
|---------|-------:|------:|------:|
| Mean    |  0.045 |  0.07 | 0.095 |
| Std.Dev | 0.0155 | 0.128 | 0.131 |

Correlation Matrix:

|   |    A |    B |    C |
|---|-----:|-----:|-----:|
| A |    1 | 0.16 | 0.24 |
| B | 0.16 |    1 | 0.58 |
| C | 0.24 | 0.58 |    1 |
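Correlation is unaffected by shifting and rescaling each variable. The two tables can therefore be read together as a single target covariance matrix $\Sigma = D\,\mathrm{Corr}\,D$, where $D$ is the diagonal matrix of standard deviations. Below is a minimal NumPy sketch using the values from the tables above. The names `means`, `sds`, `corr` and `cov` are only illustrative.

```python
import numpy as np

# Target moments, copied from the tables above (columns A, B, C)
means = np.array([0.045, 0.07, 0.095])
sds = np.array([0.0155, 0.128, 0.131])
corr = np.array([
    [1.00, 0.16, 0.24],
    [0.16, 1.00, 0.58],
    [0.24, 0.58, 1.00],
])

# Covariance: Sigma_ij = sd_i * sd_j * rho_ij, i.e. D @ corr @ D with D = diag(sds)
D = np.diag(sds)
cov = D @ corr @ D

# Dividing back by the outer product of the standard deviations recovers corr
recovered = cov / np.outer(sds, sds)
print(np.round(cov, 6))
```

This is why the variables can be correlated on the standard-normal scale first and only then shifted to the means and scaled by the standard deviations.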

Eigenvectors / Eigenvalues

|     E1 |     E2 |    E3 |
|-------:|-------:|------:|
|  0.105 |  0.920 | 0.378 |
|  0.685 | -0.343 | 0.643 |
| -0.721 | -0.192 | 0.666 |

|    λ1 |     λ2 |    λ3 |
|------:|-------:|------:|
| 0.418 | 0.8904 | 0.378 |

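$$V = E\,\operatorname{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \sqrt{\lambda_3}\right), \qquad E = \begin{bmatrix} E_1 & E_2 & E_3 \end{bmatrix}$$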

I then used

$$R_C = R\,V^{T},$$

where $R_C$ is the matrix of correlated random numbers and $R$ is the matrix of uncorrelated random numbers (one row per draw). I can't seem to get it right. Can anyone help?
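Below is a minimal NumPy sketch of the transform described above. It assumes the eigenpairs are computed directly from the correlation matrix with `numpy.linalg.eigh` rather than copied from a table, and that each row of `R` is one draw of the three variables. It then rescales the columns to the target means and standard deviations and checks the sample correlation. The seed and sample size are arbitrary choices.

```python
import numpy as np

corr = np.array([
    [1.00, 0.16, 0.24],
    [0.16, 1.00, 0.58],
    [0.24, 0.58, 1.00],
])
means = np.array([0.045, 0.07, 0.095])
sds = np.array([0.0155, 0.128, 0.131])

# eigh is meant for symmetric matrices: eigenvalues ascending, eigenvectors as columns
lam, E = np.linalg.eigh(corr)

# V = E diag(sqrt(lambda)), so V @ V.T = E diag(lambda) E^T = corr
V = E @ np.diag(np.sqrt(lam))

rng = np.random.default_rng(42)
n = 100_000
R = rng.standard_normal((n, 3))   # uncorrelated draws, one row per observation
R_C = R @ V.T                     # correlated standard-normal draws

# Rescale column-wise to the target standard deviations, then shift to the means
X = means + R_C * sds

print(np.round(np.corrcoef(X, rowvar=False), 3))
```

If the printed matrix is far from `corr`, the eigenpairs used to build `V` are the first thing to check.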

Edit: I have also tried the Cholesky method, but it gives the same results, only slightly worse. I was also under the impression that the eigensystem approach is better, because the Cholesky method needs a positive-definite (PD) correlation matrix. Since I want to expand the model later on, the matrix may not be PD anymore.
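For comparison, here is a minimal NumPy sketch of the Cholesky route, along with a small illustration of the point about positive definiteness. The 2×2 matrix `singular` is a hypothetical example of two perfectly correlated variables. It is positive semidefinite but not positive definite, so `numpy.linalg.cholesky` rejects it, while the eigendecomposition still gives a factor $V$ with $VV^T$ equal to the matrix. Neither approach helps if a matrix has a negative eigenvalue, because such a matrix is not a valid correlation matrix.

```python
import numpy as np

corr = np.array([
    [1.00, 0.16, 0.24],
    [0.16, 1.00, 0.58],
    [0.24, 0.58, 1.00],
])

# Cholesky: lower-triangular L with L @ L.T == corr (requires positive definiteness)
L = np.linalg.cholesky(corr)

# Eigen factor: V @ V.T == corr (only needs non-negative eigenvalues)
lam, E = np.linalg.eigh(corr)
V = E @ np.diag(np.sqrt(lam))

# Same iid draws pushed through both factors
rng = np.random.default_rng(0)
Z = rng.standard_normal((100_000, 3))
sample_chol = np.corrcoef(Z @ L.T, rowvar=False)
sample_eig = np.corrcoef(Z @ V.T, rowvar=False)

# Hypothetical PSD-but-singular matrix: two perfectly correlated variables
singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
try:
    np.linalg.cholesky(singular)
    cholesky_ok = True
except np.linalg.LinAlgError:
    cholesky_ok = False

lam_s, E_s = np.linalg.eigh(singular)
# Clip round-off so a zero eigenvalue cannot come out as a tiny negative number
V_s = E_s @ np.diag(np.sqrt(np.clip(lam_s, 0.0, None)))

print(cholesky_ok, np.round(V_s @ V_s.T, 12))
```

With a PD matrix, both factors give the same sample correlation up to Monte Carlo noise. They differ only in which square root of the matrix is used.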

Ken Peters
  • Welcome to our site! – kjetil b halvorsen Oct 15 '15 at 10:19
  • Possible duplicate of [How to generate correlated random numbers (given means, variances and degree of correlation)?](http://stats.stackexchange.com/questions/38856/how-to-generate-correlated-random-numbers-given-means-variances-and-degree-of) – Tim Oct 15 '15 at 10:20
  • @Tim I get the same results with the Cholesky method (a little bit worse), and I was also under the impression that the eigensystem approach was better than the Cholesky method for a number of reasons. For example, if I were to expand and the correlation matrix were no longer PD, Cholesky wouldn't work. – Ken Peters Oct 15 '15 at 10:36

1 Answer


I do not see (a) how you compute the eigenvalues and eigenvectors of your covariance matrix and (b) why you use $E\cdot \text{diag}(\sqrt{\lambda})$ instead of $E\cdot \text{diag}(\sqrt{\lambda})\cdot E^\text{T}$ in the transform of the iid normal variates.
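Below is a quick way to check a hand-computed eigensystem, sketched in NumPy. It assumes the columns E1, E2, E3 of the question's table are the eigenvectors. For a true eigenpair, $\Sigma E_i = \lambda_i E_i$, so the Rayleigh quotient $E_i^T \Sigma E_i / E_i^T E_i$ recovers $\lambda_i$. The eigenvalues of a correlation matrix must also sum to its trace, which is 3 here. The posted eigenvalues add up to only 1.6864. The quotients come out at roughly 0.414, 0.891 and 1.695, so the posted eigenvectors look fine, but the third posted eigenvalue (0.378) does not match its eigenvector.

```python
import numpy as np

corr = np.array([
    [1.00, 0.16, 0.24],
    [0.16, 1.00, 0.58],
    [0.24, 0.58, 1.00],
])

# Eigenvectors (as columns E1, E2, E3) and eigenvalues as posted in the question
E_posted = np.array([
    [ 0.105,  0.920, 0.378],
    [ 0.685, -0.343, 0.643],
    [-0.721, -0.192, 0.666],
])
lam_posted = np.array([0.418, 0.8904, 0.378])

# Eigenvalues of a correlation matrix must sum to its trace (3 here)
trace_gap = np.trace(corr) - lam_posted.sum()

# Rayleigh quotient of each posted eigenvector: E_i^T corr E_i / E_i^T E_i
rayleigh = np.einsum("ij,ik,kj->j", E_posted, corr, E_posted) / np.sum(E_posted**2, axis=0)

print(round(trace_gap, 4))  # 1.3136
print(np.round(rayleigh, 3))
```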

Here is an illustration in R:

```r
> cov=matrix(c(1,.16,.24,.16,1,.58,.24,.58,1),ncol=3)
> cov
     [,1] [,2] [,3]
[1,] 1.00 0.16 0.24
[2,] 0.16 1.00 0.58
[3,] 0.24 0.58 1.00
> eigen(cov)
$values
[1] 1.6954499 0.8907326 0.4138174

$vectors
           [,1]       [,2]       [,3]
[1,] -0.3778550  0.9194725 -0.1086091
[2,] -0.6427279 -0.3449282 -0.6840507
[3,] -0.6664281 -0.1886659  0.7213035
```

This returns very different eigenvalues from yours. These eigenvalues lead to a correct reconstruction of the correlation matrix:

```r
> tors=eigen(cov)$vec
> vlu=eigen(cov)$val
> (tors)%*%diag(vlu)%*%t(tors)
     [,1] [,2] [,3]
[1,] 1.00 0.16 0.24
[2,] 0.16 1.00 0.58
[3,] 0.24 0.58 1.00
```

and correctly simulated variates:

```r
> monc=(tors)%*%diag(sqrt(vlu))%*%t(tors)%*%matrix(rnorm(3*1e5),nrow=3)
> monc%*%t(monc)/1e5
          [,1]      [,2]      [,3]
[1,] 1.0094624 0.1661131 0.2449860
[2,] 0.1661131 0.9999645 0.5807681
[3,] 0.2449860 0.5807681 1.0010454
```

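For readers working in Python, here is a rough NumPy translation of the R session above. It is a sketch, not part of the original answer. It checks that $E\,\mathrm{diag}(\lambda)\,E^{T}$ reproduces the matrix, builds $E\,\mathrm{diag}(\sqrt{\lambda})\,E^{T}$, and applies it to iid normal variates stored as columns, as `matrix(rnorm(3*1e5), nrow=3)` does in R. `numpy.linalg.eigh` returns eigenvalues in ascending order, whereas R's `eigen` returns them in decreasing order; the products below do not depend on that ordering. The seed is arbitrary.

```python
import numpy as np

cov = np.array([
    [1.00, 0.16, 0.24],
    [0.16, 1.00, 0.58],
    [0.24, 0.58, 1.00],
])

# Counterparts of eigen(cov)$val and eigen(cov)$vec (eigh sorts ascending)
vlu, tors = np.linalg.eigh(cov)

# tors %*% diag(vlu) %*% t(tors) should give back cov
reconstructed = tors @ np.diag(vlu) @ tors.T

# Symmetric square root: tors %*% diag(sqrt(vlu)) %*% t(tors)
root = tors @ np.diag(np.sqrt(vlu)) @ tors.T

rng = np.random.default_rng(1)
n = 100_000
# Variates as columns (3 x n), like matrix(rnorm(3*1e5), nrow=3)
monc = root @ rng.standard_normal((3, n))

# Empirical second-moment matrix, as in monc %*% t(monc) / 1e5
emp = monc @ monc.T / n
print(np.round(emp, 3))
```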
Xi'an