If a numerical vector $\vec{x}$ is a sample drawn from a normal distribution, then given a correlation coefficient $\rho$, is there a way to simulate a second vector $\vec{y}$ such that $\operatorname{corr}(\vec{x},\vec{y})=\rho$?

My hunch is that $\vec{x}$ would be multiplied by a sample of equal length from a uniform distribution centered around 1 and bounded by some interval determined by the value of $\rho$, but I'm just guessing.

SubstantiaN
  • Do you have desired marginal distributions for $X$ and $Y$? – SecretAgentMan Sep 16 '18 at 01:41
  • X would be a sample draw from a normal distribution, sorry I should have specified. I've updated the question to reflect that. – SubstantiaN Sep 16 '18 at 01:52
  • Your notation isn't clear to me -- do you require to specify the *sample* correlation or the *population* correlation? (In either case, the question is already answered on site several times, but I need to know what to close it as a duplicate of, unless I locate one that answers both) – Glen_b Sep 16 '18 at 05:34
  • Ah, never mind, I found one that answers both. – Glen_b Sep 16 '18 at 05:40
  • @Glen_b Yes, you are correct, I was asking about sample correlation. Your answer on the other post addresses my question. What is the preferred practice here, should I delete this question? – SubstantiaN Sep 16 '18 at 15:05
  • No, left undeleted it serves a useful function -- people searching for an answer that use search terms that turn up your question but not that one will find an answer. – Glen_b Sep 16 '18 at 15:08
  • @Glen_b Is there a name for the method you applied in answering the other post and/or a reference you would recommend for citation? I would post this comment to your original answer, but I don't have sufficient rep points to post there. – SubstantiaN Sep 16 '18 at 15:23
  • do you mean the method used in part (1) there (the method for population correlation) or the slight tweak in (2) to get zero sample correlation, or the other slight tweak in (2) to get the sample standard deviation to be 1? ... I can't say I have a reference (though certainly many will exist); each of the steps are obvious enough once you know a little basic statistical theory. I think I may have seen the first method in an exercise once but I couldn't say for sure. The second I don't recall seeing before I first wrote it down but I've seen many people do it since so it was already well known, – Glen_b Sep 16 '18 at 21:41
  • ... consequently my guess is I probably at least heard of it rather than rediscovered it independently, but if so I couldn't say where. The algebraic notions behind the general version of the method in 1 just relies on a few well known properties (linear combinations of multivariate normals are normal, linearity of expectations, variance of $AX$ is $A\, \text{Var}(X)\, A'$, and then you just need a way of finding an $A$ such that $\Sigma=AA'$, which is where Choleski decomposition comes in, being one simple way to achieve that; I never remember how it works, I just rederive it when I need it). – Glen_b Sep 16 '18 at 21:52
  • I expect you'd find at least the method in (1) of that post in any decent book on simulation methods. The method in (2) is simply a tweak to make the sample have the desired properties rather than the population – Glen_b Sep 16 '18 at 21:57
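
The population-correlation construction described in the comments above (find an $A$ with $\Sigma = AA'$ and apply it to independent standard normals) can be sketched in R roughly as follows. This is only an illustration of the idea, not the code from the linked answer; the values of `rho` and `n`, the seed, and the variable names are assumptions chosen for the example.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
rho <- 0.8                         # target population correlation (illustrative)
n   <- 1000                        # sample size (illustrative)

# Target covariance matrix: unit variances, covariance rho
Sigma <- matrix(c(1,   rho,
                  rho, 1), nrow = 2, byrow = TRUE)

# chol() returns an upper-triangular R with t(R) %*% R == Sigma,
# so A = t(R) is lower triangular and satisfies A %*% t(A) == Sigma
A <- t(chol(Sigma))

# Independent standard normals, one column per observation
Z <- matrix(rnorm(2 * n), nrow = 2)

# Each column of A %*% Z has covariance A I A' = Sigma
XY <- t(A %*% Z)
x <- XY[, 1]
y <- XY[, 2]

cor(x, y)                          # near rho, but not exactly rho
```

The sample correlation from this construction only approximates $\rho$; it generates both variables together rather than building $\vec{y}$ from an existing $\vec{x}$.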

1 Answer

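For a bivariate normal pair $(X, Y)$ with standard normal marginals and correlation $\rho$, the conditional distribution is $Y|X=x \sim \mathsf{Norm}(\rho x, \sqrt{1-\rho^2}),$ where the second parameter is the standard deviation.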

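Implementing this in R, with $\rho = 0.8,$ we have the following code, where the number of values of y generated must equal the length of x. (Here x has mean 50 rather than 0; shifting the mean does not change the correlation, but the standard deviation of x must stay at 1 for the formula as written.)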

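```r
set.seed(2018)
x <- rnorm(200, 50, 1)                      # given sample; sd of x is 1
y <- rnorm(200, 0.8 * x, sqrt(1 - 0.8^2))   # conditional mean rho*x, conditional sd sqrt(1 - rho^2)
plot(x, y, pch = 20)
cor(x, y)
# [1] 0.8016411
```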

[Figure: scatterplot of y against x]

Of course, x and y are random samples, so you cannot expect the sample correlation $r$ to be exactly $\rho = 0.8.$ Large samples tend to have $r$ closer to $\rho$ than small ones.
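
If the sample correlation itself has to equal $\rho$ for a given x, one common tweak is to make the noise exactly uncorrelated with x in the sample before combining the two. The following is a minimal sketch of that idea, not necessarily the exact steps of the answer linked in the comments; the seed and the names `e`, `xs`, and `es` are assumptions made for illustration.

```r
set.seed(1)                               # arbitrary seed, only for reproducibility
rho <- 0.8                                # target sample correlation
x   <- rnorm(200, 50, 1)                  # the given vector

# Residuals of random noise regressed on x have zero sample mean and
# zero sample covariance with x (up to floating-point error)
e <- residuals(lm(rnorm(length(x)) ~ x))

# Standardize both pieces so each has sample mean 0 and sample sd 1
xs <- (x - mean(x)) / sd(x)
es <- e / sd(e)

# Combine: sample variance of y is rho^2 + (1 - rho^2) = 1,
# and sample covariance with xs is rho
y <- rho * xs + sqrt(1 - rho^2) * es

cor(x, y)                                 # equals rho up to floating-point error
```

The resulting y has sample mean 0 and sample standard deviation 1; it can be rescaled and shifted afterwards without changing the correlation, provided the scale factor is positive.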

BruceET
  • would you have the derivation of this bivariate relationship ? – Xavier Bourret Sicotte Sep 18 '18 at 17:44
  • Wanted to link to @Glen_b's [Answer](https://stats.stackexchange.com/questions/111865/tool-for-generating-correlated-data-sets), now noted above, which I had seen before; but couldn't find it at that time. **_Recommend you use that._** // Otherwise, note standard result: If $X,Y$ bivar norm with corr $\rho,$ $\mu_X=\mu_Y=0,$ and $\sigma_X = \sigma_Y = 1,$ then marginals are std norm and cond'ls are $X|Y=y \sim \mathsf{Norm}(\rho y, \sqrt{1-\rho^2})$ and $Y|X=x \sim \mathsf{Norm}(\rho x, \sqrt{1-\rho^2}).$ Look at math stat book for whole story. – BruceET Sep 18 '18 at 19:49
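
For readers who want the derivation asked about in the comments, here is a short sketch under the same assumptions as the standard result quoted above (zero means, unit variances). The auxiliary variable $Z$ is introduced only for this argument. Let $X$ and $Z$ be independent standard normal variables and define $Y = \rho X + \sqrt{1-\rho^2}\,Z.$ Then

$$
\begin{aligned}
E[Y] &= \rho\,E[X] + \sqrt{1-\rho^2}\,E[Z] = 0,\\
\operatorname{Var}(Y) &= \rho^2\operatorname{Var}(X) + (1-\rho^2)\operatorname{Var}(Z) = \rho^2 + (1-\rho^2) = 1,\\
\operatorname{Cov}(X,Y) &= \rho\,\operatorname{Var}(X) + \sqrt{1-\rho^2}\,\operatorname{Cov}(X,Z) = \rho.
\end{aligned}
$$

Because $(X,Y)$ is a linear transformation of independent normals, it is bivariate normal, with standard normal marginals and correlation $\rho.$ Conditioning on $X=x$ leaves $Y = \rho x + \sqrt{1-\rho^2}\,Z$ with $Z$ still standard normal, since $Z$ is independent of $X$, so

$$
Y \mid X = x \;\sim\; \mathsf{Norm}\!\left(\rho x,\ \sqrt{1-\rho^2}\right),
$$

where the second parameter is the standard deviation. A bivariate normal distribution is determined by its means and covariance matrix, so the same conditional distribution holds for any bivariate normal pair with these moments.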