How do I define the distribution of a random variable $Y$ such that a draw from $Y$ has correlation $\rho$ with $x_1$, where $x_1$ is a single draw from a distribution with cumulative distribution function $F_{X}(x)$?

Macro
OctaviaQ
  • The following Qs are strongly related & will be of interest: [How to generate correlated random numbers (given means variances and degree of correlation)](http://stats.stackexchange.com/questions/38856/) & [Generate a random variable with a defined correlation to an existing variable](http://stats.stackexchange.com/questions/15011/). – gung - Reinstate Monica Oct 16 '12 at 16:08

1 Answer

You can define it in terms of a data generating mechanism. For example, if $X \sim F_{X}$ and

$$ Y = \rho X + \sqrt{1 - \rho^{2}} Z $$

where $Z \sim F_{X}$ and is independent of $X$, then,

$$ {\rm cov}(X,Y) = {\rm cov}(X, \rho X) = \rho \cdot {\rm var}(X)$$

Also note that ${\rm var}(Y) = \rho^{2} \, {\rm var}(X) + (1 - \rho^{2}) \, {\rm var}(Z) = {\rm var}(X)$, since $Z$ is independent of $X$ and has the same distribution as $X$. Therefore,

$$ {\rm cor}(X,Y) = \frac{ {\rm cov}(X,Y) }{ \sqrt{ {\rm var}(X) \, {\rm var}(Y) } } = \frac{ \rho \cdot {\rm var}(X) }{ \sqrt{ {\rm var}(X)^{2} } } = \rho $$

So if you can generate data from $F_{X}$, you can generate a variate, $Y$, that has a specified correlation $(\rho)$ with $X$. Note, however, that the marginal distribution of $Y$ will be $F_{X}$ only in special cases, most notably when $F_{X}$ is a normal distribution with mean zero. This is because sums of independent normally distributed variables are normal; that is not a general property of distributions. (If $X$ has a nonzero mean $\mu$, then $Y$ has mean $(\rho + \sqrt{1 - \rho^{2}})\mu$, so its location shifts even in the normal case.) In the general case, you will have to calculate the distribution of $Y$ by convolving the density of $\rho X$ with the density of $\sqrt{1 - \rho^{2}} Z$, that is, two scaled versions of the density corresponding to $F_{X}$.
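As a quick numerical check of this construction, here is a minimal Python sketch. It assumes NumPy and, purely for illustration, takes $F_{X}$ to be the exponential distribution with rate 1; the helper names `correlated_draws` and `skewness` are hypothetical and not part of the original answer. It draws $X$ and an independent copy $Z$, forms $Y$, and compares the sample correlation, the variance ratio, and the skewness of the two margins.

```python
import numpy as np


def correlated_draws(sampler, rho, n, rng):
    """Return (x, y) with y = rho*x + sqrt(1 - rho**2)*z, where z is an
    independent sample from the same distribution as x."""
    x = sampler(rng, n)
    z = sampler(rng, n)  # independent copy with the same distribution
    y = rho * x + np.sqrt(1.0 - rho**2) * z
    return x, y


def skewness(a):
    """Sample skewness: third central moment over variance**1.5."""
    c = a - a.mean()
    return np.mean(c**3) / np.mean(c**2) ** 1.5


rng = np.random.default_rng(0)
rho = 0.6
n = 200_000

# Illustrative assumption: F_X is Exponential(1), a clearly non-normal choice.
x, y = correlated_draws(lambda r, size: r.exponential(1.0, size), rho, n, rng)

sample_corr = np.corrcoef(x, y)[0, 1]
var_ratio = y.var() / x.var()
skew_x, skew_y = skewness(x), skewness(y)

print(f"sample correlation: {sample_corr:.3f} (target {rho})")
print(f"var(Y) / var(X):    {var_ratio:.3f}")
print(f"skewness of X, Y:   {skew_x:.2f}, {skew_y:.2f}")
```

With a large sample the correlation should be close to $\rho$ and the variance ratio close to 1, while the skewness of $Y$ comes out smaller than that of $X$, which shows that $Y$ does not follow $F_{X}$ in this example.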

Macro
  • +1 Very nice answer. Nitpick: in the last line you need to convolve *scaled versions* of $F_X$. – whuber Jul 22 '11 at 16:53
  • Thanks so much, Macro. Just to clarify something -- you mean in your last paragraph that you would need to convolve the rho*X with the sqrt(1 - rho^2)*X? (sorry, I couldn't get any formatting, even HTML to work in this particular comment) – OctaviaQ Jul 22 '11 at 23:27
  • Convolve the densities corresponding to the distributions of $\rho X$ with the distribution of $\sqrt{1 - \rho^{2}} X$. This is a result of the general fact that the density of the sum of two continuous random variables is the convolution of the two densities. – Macro Jul 23 '11 at 01:51
  • A long time but...ideas of how to do this, also enforcing the marginal distribution of Y? – Julián Urbano Mar 22 '15 at 21:29
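
To make the convolution of scaled densities mentioned in these comments concrete, here is a rough numerical sketch in Python (NumPy assumed). As an illustrative assumption it takes $F_{X}$ to be the exponential distribution with rate 1, evaluates the densities of $\rho X$ and $\sqrt{1 - \rho^{2}} Z$ on a grid, and convolves them with a trapezoid rule. For this particular choice the result can be checked against the known closed form for the sum of two independent exponentials with different rates. All variable names are hypothetical.

```python
import numpy as np

rho = 0.6
s = np.sqrt(1.0 - rho**2)  # scale applied to the independent copy Z

# Grid on [0, 15]; the density of Y is negligible beyond this range here.
t = np.linspace(0.0, 15.0, 3001)
dt = t[1] - t[0]

# If X ~ Exponential(1) and c > 0, then c*X has density exp(-t/c)/c on t >= 0.
f_rho_x = np.exp(-t / rho) / rho
f_s_z = np.exp(-t / s) / s

# Discrete convolution truncated to the grid, with the trapezoid-rule
# end-point correction (half weight on the first and last terms of each sum).
full = np.convolve(f_rho_x, f_s_z)[: t.size]
f_y = dt * (full - 0.5 * (f_rho_x[0] * f_s_z + f_rho_x * f_s_z[0]))

# Closed form for this special case (requires rho != s): the sum of two
# independent exponentials with rates 1/rho and 1/s.
f_y_exact = (np.exp(-t / rho) - np.exp(-t / s)) / (rho - s)

max_abs_error = np.max(np.abs(f_y - f_y_exact))
total_mass = np.sum(f_y) * dt

print(f"max abs error vs closed form: {max_abs_error:.2e}")
print(f"integral of density of Y:     {total_mass:.4f}")
```

For a general $F_{X}$ no closed form may be available, in which case only the grid approximation `f_y` would be used, and the grid range and spacing would need to suit the density in question.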