How do I create a $Y$ that is a function of $X$ such that their correlation is $0$?
I generated $X$ with `X = rnorm(100, 0, 1)`. What is a good function $Y$ of $X$ such that $\mathrm{cor}(X, Y) = 0$, or very close to $0$?
You can use $y=x^2$. Because $x^2$ is an even function and the distribution of $X$ is symmetric about zero, the population correlation is exactly zero: $Cov(X, X^2) = E[X^3] = 0$. Keep in mind that the correlation will not be exactly zero in any particular sample you generate, but it will be close to zero as long as your sample size is large enough.
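A quick check of this (in Python/NumPy here, though the question used R, where `rnorm(100, 0, 1)` plays the role of `rng.standard_normal(...)`):

```python
import numpy as np

# X ~ N(0, 1); by symmetry Cov(X, X^2) = E[X^3] = 0 in the population.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)  # large sample so the estimate is near 0
y = x**2

# Sample correlation: close to 0, but not exactly 0 in a finite sample.
print(np.corrcoef(x, y)[0, 1])
```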
Generate $n$ samples of $X \sim \mathcal N(0,1)$ and $Y \sim \mathcal N(0,1)$.
Define $Y'$ as the residuals of the least-squares fit of the linear model $Y=\beta_0+\beta_XX+\epsilon$. Least-squares residuals are guaranteed to be uncorrelated (in sample) with the independent variable.
In other words, $Y'=\hat\epsilon= Y - \hat\beta_0 -\hat\beta_XX$.
Since $\epsilon \sim \mathcal N(0, \sigma^2)$, you might want to rescale by $\sigma$, so that $Y' = \frac{\epsilon}{\sigma} \sim \mathcal N(0, 1)$.
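A sketch of this construction (Python/NumPy rather than the question's R, where `resid(lm(Y ~ X))` would do the same job). The sample correlation is zero up to floating-point error, because the least-squares normal equations force the residuals to be orthogonal to the regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)
y = rng.standard_normal(100)

# Fit Y = b0 + b1*X by least squares and keep the residuals.
b1, b0 = np.polyfit(x, y, deg=1)   # polyfit returns highest degree first
y_resid = y - b0 - b1 * x

# The normal equations give sum(y_resid) = 0 and sum(x * y_resid) = 0,
# so the sample correlation with x is exactly 0 (up to rounding).
print(np.corrcoef(x, y_resid)[0, 1])

# Optionally rescale so Y' is on roughly the N(0, 1) scale.
y_prime = y_resid / y_resid.std(ddof=2)
```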
Let $y$ be the fractional part of $x \cdot 1000000$. For any $x$ with a reasonably smooth distribution, this is nearly uncorrelated with $x$.
A very simple example of a random variable $x$ and a function $f$ such that $Cov(x, f(x)) = 0$ would be $x$ and $x^2$ for any $x$ with a symmetric distribution (and a finite third moment, so the covariance exists), as suggested by @PedroSebe below. In such cases, $Cov(x, f(x)) = 0$ because $E[x \mid x^2] = 0$.
Alternatively, what follows is a less ad hoc formulation of the example suggested by @chrishmorris that gives $Cov(x, f(x))$ close to zero.
Any $x \in [0,1]$ admits a dyadic expansion $$ x = \sum_{k = 1}^{\infty} a_k \frac{1}{2^k}, \mbox{ where $a_k = 0$ or 1}. $$ Suppose $[0,1]$ has the uniform distribution. Given $x$ with dyadic representation $(a_{1}, a_{2}, \cdots)$, define $y$, as a function of $x$, to be the number in $[0,1]$ with dyadic representation $(a_{n}, a_{n+1}, \cdots)$, for some large $n$.
(The decimal expansion suggested by @chrishmorris is entirely analogous: write $x\in[0,1]$ as $$ x = \sum_{k = 1}^{\infty} a_k \frac{1}{10^k}, $$ where $a_k \in \{ 0,1, \cdots, 9\}$. The coin tosses/Bernoulli variables become tosses of a 10-sided die.)
It's equivalent to consider $\bar{x} = (a_k)_{k \geq 1}$ where $a_k$'s are i.i.d. Bernoulli variables and $\bar{y} = (a_k)_{k \geq n}$. (The map $x \mapsto \bar{x}$ is a one-to-one onto measure-preserving map from $[0,1]$ to the space of sequences of 0's and 1's.)
Informally, it is clear that the tail end of a sequence of coin tosses becomes closer to independent of the sequence itself, the further out on the tail you go. In particular, their correlation should approach zero.
More precisely, this can be seen by computing a mixing coefficient, e.g. the $\alpha$-mixing coefficient: $$ \alpha(x, y) = \sup_{A,B} |P(x \in A, y \in B) - P(x \in A) P(y \in B)|, $$ where the sup is taken over $x$-measurable sets $A$ and $y$-measurable sets $B$. In this case, it is easy to see that $\alpha(x, y) \leq \frac{1}{2^{n-1}} \rightarrow 0$. Therefore $$ Cov(x, y) \rightarrow 0 $$ as $n \rightarrow \infty$, by the covariance inequality for $\alpha$-mixing variables.
If you simulate (@chrishmorris's decimal case) $$ x \stackrel{d}{\sim}{U[0,1]}, \; y = \mbox{decimal part of $10^n x$}, $$ you will find that the sample covariance becomes small as $n$ gets larger.
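A minimal simulation of this (Python/NumPy here; the construction itself is language-agnostic):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 100_000)

# y = decimal part of 10^n * x; its correlation with x shrinks as n grows.
for n in (1, 3, 6):
    y = (10**n * x) % 1.0
    print(n, np.corrcoef(x, y)[0, 1])
```

For $n=1$ the correlation is still noticeably positive, while by $n=6$ it is indistinguishable from sampling noise.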