
How do I create a $Y$ that is a function of $X$ such that correlation is $0$?

I generated $X$ with `rnorm(100, 0, 1)`. What is a good function $Y$ of $X$ such that $cor(X,Y) = 0$, or very close to $0$?

Ale
    Does this answer your question? [Generate two variables with precise pre-specified correlation](https://stats.stackexchange.com/questions/83172/generate-two-variables-with-precise-pre-specified-correlation) – Stephan Kolassa Oct 01 '20 at 14:59

4 Answers


You can use $y=x^2$. Since your $X$ is standard normal, the symmetry of its distribution about zero guarantees that the population correlation is zero. Keep in mind that the sample correlation will generally not be exactly zero in any particular sample you generate, but it will be close to zero as long as your sample size is large enough.

Nick Cox
PedroSebe
    (+1) for simplicity. Perhaps [see](https://math.stackexchange.com/questions/3848408/how-to-prove-that-the-random-variables-x-and-x2-are-not-independent) – BruceET Oct 02 '20 at 07:33
  • This working does depend also on the distribution of $x$. – Nick Cox Oct 04 '20 at 10:02
  • @NickCox Yes, definitely! The OP stated he sampled $X$ using `rnorm(100,0,1)`, that's why I proposed using $y=x^2$ – PedroSebe Oct 04 '20 at 13:50

Generate $n$ samples of $X \sim \mathcal N(0,1)$ and $Y \sim \mathcal N(0,1)$.

Define $Y'$ as the residuals of the least squares linear model $Y=\beta_0+\beta_XX+\epsilon$. Residuals are guaranteed to be uncorrelated with the independent variable.

Or, in other words, $Y'=\epsilon= Y - \beta_0 -\beta_XX$.

Since $\epsilon \sim \mathcal N(0, \sigma^2)$, you might want to rescale to unit variance by taking $Y' = \frac{\epsilon}{\sigma} \sim \mathcal N(0, 1)$.

Firebug

Let $y$ be the fractional part of $x \cdot 1000000$. For any $x$ whose distribution is reasonably smooth on the scale of $10^{-6}$, this is nearly uncorrelated with $x$.

lennon310
chrishmorris
    Can you expand to give a reference or expand on to why this works? – mdewey Oct 01 '20 at 17:08
    I wonder what happens when $x$ has a Normal distribution with a standard deviation of $10^{-8},$ say (which nobody can validly argue is not "reasonably smooth," almost no matter what this phrase might be intended to mean). Too extremely small you think? My application is measuring atomic diameters and the units of measurement are meters. – whuber Oct 01 '20 at 19:37
    A formulation of your example that is perhaps a bit less ad hoc would be to take x with uniform distribution on [0,1], and y given by the sequence starting at the n-digit in the dyadic expansion of x. Then x and y would approach being independent as n gets large. More precisely, the mixing coefficient approaches zero. This can be explained by either the ergodic or fractal/self-similar perspectives, to answer the question from @mdewey. – Michael Oct 02 '20 at 01:03

A very simple example of a random variable $x$ and a function $f$ such that $Cov(x, f(x)) = 0$ would be $x$ and $x^2$ for any $x$ with a symmetric distribution (and finite third moment), as suggested by @PedroSebe above. In such cases, $Cov(x, f(x)) = 0$ because $E[x|x^2] = 0$, so $E[x \cdot x^2] = E\big[E[x|x^2]\, x^2\big] = 0$.

Alternatively, what follows is a less ad hoc formulation of the example suggested by @chrishmorris that gives $Cov(x, f(x))$ close to zero.

Any $x \in [0,1]$ admits a dyadic expansion $$ x = \sum_{k = 1}^{\infty} a_k \frac{1}{2^k}, \mbox{ where $a_k = 0$ or $1$}. $$ Suppose $x$ is uniformly distributed on $[0,1]$. Given $x$ with dyadic representation $(a_{1}, a_{2}, \cdots)$, define $y$, as a function of $x$, to be the number in $[0,1]$ with dyadic representation $(a_{n}, a_{n+1}, \cdots)$, for some large $n$.

(The decimal expansion suggested by @chrishmorris is entirely analogous: write $x\in[0,1]$ as $$ x = \sum_{k = 1}^{\infty} a_k \frac{1}{10^k}, $$ where $a_k \in \{ 0,1, \cdots, 9\}$. The coin tosses/Bernoulli variables become tosses of a 10-sided die.)

It's equivalent to consider $\bar{x} = (a_k)_{k \geq 1}$ where $a_k$'s are i.i.d. Bernoulli variables and $\bar{y} = (a_k)_{k \geq n}$. (The map $x \mapsto \bar{x}$ is a one-to-one onto measure-preserving map from $[0,1]$ to the space of sequences of 0's and 1's.)

Informally, it is clear that the tail end of a sequence of coin tosses approaches independence from the sequence itself as you move further out along the tail. In particular, their correlation should approach zero.

More precisely, this can be seen by computing a mixing coefficient, e.g. the $\alpha$-mixing coefficient: $$ \alpha(x, y) = \sup_{A,B} |P(x \in A, y \in B) - P(x \in A) P(y \in B)|, $$ where the sup is taken over events $A \in \sigma(x)$ and $B \in \sigma(y)$. In this case, it is easy to see that $\alpha(x, y) \leq \frac{1}{2^{n-1}} \rightarrow 0$. Therefore $$ Cov(x, y) \rightarrow 0 $$ by a standard covariance inequality for $\alpha$-mixing variables.

If you simulate (@chrishmorris's decimal case) $$ x \stackrel{d}{\sim}{U[0,1]}, \; y = \mbox{decimal part of $10^n x$}, $$ you will find that the sample covariance becomes small as $n$ gets larger.

Michael