
How do I create a $Y$ that is a function of $X$ such that correlation is $0$?

I generated $X$ with `rnorm(100, 0, 1)`. What is a good function $Y$ of $X$ such that $cor(X,Y) = 0$, or very close to $0$?

Ale
    Does this answer your question? [Generate two variables with precise pre-specified correlation](https://stats.stackexchange.com/questions/83172/generate-two-variables-with-precise-pre-specified-correlation) – Stephan Kolassa Oct 01 '20 at 14:59

4 Answers


You can use $y=x^2$. Since your $X$ is standard normal, the symmetry of its distribution about zero guarantees that the population correlation is zero. Keep in mind that the sample correlation will generally not be exactly zero in any particular sample you generate, but it will be close to zero as long as your sample size is large enough.

Nick Cox
PedroSebe
    (+1) for simplicity. Perhaps [see](https://math.stackexchange.com/questions/3848408/how-to-prove-that-the-random-variables-x-and-x2-are-not-independent) – BruceET Oct 02 '20 at 07:33
  • This working does depend also on the distribution of $x$. – Nick Cox Oct 04 '20 at 10:02
  • @NickCox Yes, definitely! The OP stated he sampled $X$ using `rnorm(100,0,1)`, that's why I proposed using $y=x^2$ – PedroSebe Oct 04 '20 at 13:50

Generate $n$ samples of $X \sim \mathcal N(0,1)$ and $Y \sim \mathcal N(0,1)$.

Define $Y'$ as the residuals of the least squares linear model $Y=\beta_0+\beta_XX+\epsilon$. Residuals are guaranteed to be uncorrelated with the independent variable.

Or, in other words, $Y'=\epsilon= Y - \beta_0 -\beta_XX$.

Since $\epsilon \sim \mathcal N(0, \sigma^2)$, you might want to rescale to unit variance by taking $Y' = \frac{\epsilon}{\sigma} \sim \mathcal N(0, 1)$.

Firebug

Let $y$ be the fractional part of $x \cdot 1000000$. For any $x$ whose distribution is reasonably smooth on the scale of $10^{-6}$, this is nearly uncorrelated with $x$.

lennon310
chrishmorris
    Can you expand to give a reference or expand on to why this works? – mdewey Oct 01 '20 at 17:08
    I wonder what happens when $x$ has a Normal distribution with a standard deviation of $10^{-8},$ say (which nobody can validly argue is not "reasonably smooth," almost no matter what this phrase might be intended to mean). Too extremely small you think? My application is measuring atomic diameters and the units of measurement are meters. – whuber Oct 01 '20 at 19:37
    A formulation of your example that is perhaps a bit less ad hoc would be to take x with uniform distribution on [0,1], and y given by the sequence starting at the n-digit in the dyadic expansion of x. Then x and y would approach being independent as n gets large. More precisely, the mixing coefficient approaches zero. This can be explained by either the ergodic or fractal/self-similar perspectives, to answer the question from @mdewey. – Michael Oct 02 '20 at 01:03

A very simple example of a random variable $x$ and a function $f$ such that $Cov(x, f(x)) = 0$ would be $x$ and $x^2$ for any $x$ with a symmetric distribution (and finite third moment), as suggested by @PedroSebe above. In such cases, $Cov(x, f(x)) = 0$ because $E[x|x^2] = 0$, so $E[x \cdot x^2] = E\big[E[x|x^2]\, x^2\big] = 0$.

Alternatively, what follows is a less ad hoc formulation of the example suggested by @chrishmorris that gives $Cov(x, f(x))$ close to zero.

Any $x \in [0,1]$ admits a dyadic expansion $$ x = \sum_{k = 1}^{\infty} a_k \frac{1}{2^k}, \mbox{ where $a_k = 0$ or $1$}. $$ Suppose $x$ is uniformly distributed on $[0,1]$. Given $x$ with dyadic representation $(a_{1}, a_{2}, \cdots)$, define $y$, as a function of $x$, to be the number in $[0,1]$ with dyadic representation $(a_{n}, a_{n+1}, \cdots)$, for some large $n$.

(The decimal expansion suggested by @chrishmorris is entirely analogous: write $x\in[0,1]$ as $$ x = \sum_{k = 1}^{\infty} a_k \frac{1}{10^k}, $$ where $a_k \in \{ 0,1, \cdots, 9\}$. The coin tosses/Bernoulli variables become tosses of a 10-sided die.)

It's equivalent to consider $\bar{x} = (a_k)_{k \geq 1}$ where $a_k$'s are i.i.d. Bernoulli variables and $\bar{y} = (a_k)_{k \geq n}$. (The map $x \mapsto \bar{x}$ is a one-to-one onto measure-preserving map from $[0,1]$ to the space of sequences of 0's and 1's.)

Informally, it is clear that the tail end of a sequence of coin tosses approaches independence from the sequence itself as you move further out along the tail. In particular, their correlation should approach zero.

More precisely, this can be seen by computing a mixing coefficient, e.g. the $\alpha$-mixing coefficient: $$ \alpha(x, y) = \sup_{A,B} |P(x \in A, y \in B) - P(x \in A) P(y \in B)|, $$ where the sup is taken over events $A \in \sigma(x)$ and $B \in \sigma(y)$. In this case, it is easy to see that $\alpha(x, y) \leq \frac{1}{2^{n-1}} \rightarrow 0$. Therefore $$ Cov(x, y) \rightarrow 0 $$ by a standard covariance inequality for $\alpha$-mixing variables.

If you simulate (@chrishmorris's decimal case) $$ x \stackrel{d}{\sim}{U[0,1]}, \; y = \mbox{decimal part of $10^n x$}, $$ you will find that the sample covariance becomes small as $n$ gets larger.

Michael