
Given two highly correlated random variables $X$ and $Y$, I'd like to bound the probability that the difference $ |X - Y| $ exceeds some amount: $$ P( |X - Y| > K) < \delta $$

Assume for simplicity that:

  • The correlation coefficient is known to be "high", say $\rho_{X,Y} = \operatorname{cov}(X,Y) / (\sigma_X \sigma_Y) \geq 1 - \epsilon$

  • $X, Y$ are zero mean: $\mu_X = \mu_Y = 0$

  • $-1 \leq x_i, y_i \leq 1$ (or $ 0 \leq x_i, y_i \leq 1$ if that's any easier)

  • (If it makes things easier, let's say $X,Y $ have identical variance: $\sigma_X^2 = \sigma_Y^2 $)

Not sure how feasible it is to derive a bound on the difference given only the above information (I certainly couldn't get anywhere). A specific solution (if any), mandatory additional restrictions to impose on the distributions, or just advice on an approach would be great.

Avanti89

1 Answer


Even without those simplifying assumptions, a bound can be obtained by combining two standard tools: the formula for the variance of a difference of random variables, and Chebyshev's inequality.

In some detail:

$$\sigma^2_{X-Y}=\sigma^2_X+\sigma^2_Y-2·\operatorname{cov}(X,Y)$$

$$\operatorname{cov}(X,Y)=\sigma_X·\sigma_Y·\rho_{X,Y}$$

$$\sigma^2_{X-Y}=\sigma^2_X+\sigma^2_Y-2·\sigma_X·\sigma_Y·\rho_{X,Y}$$
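As a quick sanity check of this variance formula, here is a minimal Python sketch that computes $\sigma^2_{X-Y}$ directly from a small discrete joint distribution and compares it with $\sigma^2_X+\sigma^2_Y-2\sigma_X\sigma_Y\rho_{X,Y}$. The helper names (`diff_variance`, `moments`) and the particular joint distribution are hypothetical, chosen only for illustration.

```python
import math

def diff_variance(sigma_x: float, sigma_y: float, rho: float) -> float:
    """Var(X - Y) from the two standard deviations and the correlation coefficient."""
    return sigma_x**2 + sigma_y**2 - 2 * sigma_x * sigma_y * rho

def moments(joint):
    """Return (sigma_X, sigma_Y, rho_XY, Var(X - Y)) of a finite joint pmf.

    `joint` maps (x, y) pairs to probabilities that sum to 1.
    """
    mx = sum(p * x for (x, _), p in joint.items())
    my = sum(p * y for (_, y), p in joint.items())
    vx = sum(p * (x - mx) ** 2 for (x, _), p in joint.items())
    vy = sum(p * (y - my) ** 2 for (_, y), p in joint.items())
    cov = sum(p * (x - mx) * (y - my) for (x, y), p in joint.items())
    # Variance of the difference computed directly, without the formula.
    vd = sum(p * ((x - y) - (mx - my)) ** 2 for (x, y), p in joint.items())
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    return sx, sy, cov / (sx * sy), vd

# Hypothetical joint distribution on [-1, 1] x [-1, 1], zero means, illustration only.
joint = {(-1.0, -1.0): 0.4, (-1.0, 0.0): 0.1, (1.0, 1.0): 0.4, (1.0, 0.0): 0.1}
sx, sy, rho, var_direct = moments(joint)
print(var_direct, diff_variance(sx, sy, rho))  # the two values agree up to rounding
```

Here $\sigma_X = 1$, $\sigma_Y^2 = 0.8$ and $\rho_{X,Y} = \sqrt{0.8} \approx 0.894$, and both computations give $\sigma^2_{X-Y} = 0.2$.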

According to Chebyshev's inequality, for any random variable $Z$ with mean $\mu$ and finite variance $\sigma^2$, and any $k > 0$:

$$ \Pr(|Z-\mu|\geq k\sigma) \leq \frac{1}{k^2}$$

Applying it to $Z = X - Y$ (and using that $\mu_{X-Y}=\mu_X-\mu_Y$):

$$ \Pr(|X-Y-\mu_X+\mu_Y|\geq k·\sqrt{\sigma^2_X+\sigma^2_Y-2·\sigma_X·\sigma_Y·\rho_{X,Y}}) \leq \frac{1}{k^2}$$
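Setting $K = k\,\sigma_{X-Y}$ turns this into a bound for a fixed threshold: $\Pr(|X-Y-\mu_X+\mu_Y| \geq K) \leq \sigma^2_{X-Y}/K^2$. The Python sketch below computes that bound and compares it with a Monte Carlo estimate. The function names are hypothetical, and the bivariate normal sampler is only an assumption made to have something to simulate; Chebyshev's inequality itself needs no distributional assumption beyond a finite variance.

```python
import math
import random

def difference_tail_bound(K, sigma_x, sigma_y, rho):
    """Chebyshev upper bound on P(|(X - Y) - (mu_X - mu_Y)| >= K), for K > 0."""
    var_d = sigma_x**2 + sigma_y**2 - 2 * sigma_x * sigma_y * rho
    return min(1.0, var_d / K**2)

def sample_pair(rng, sigma_x, sigma_y, rho):
    """One draw from a zero-mean bivariate normal with the given sigmas and correlation."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = sigma_x * z1
    y = sigma_y * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
    return x, y

rng = random.Random(0)  # fixed seed so the run is reproducible
sigma_x, sigma_y, rho, K = 0.5, 0.4, 0.95, 0.3
n = 100_000
# Both means are zero here, so mu_X - mu_Y drops out of the event.
hits = sum(abs(x - y) >= K
           for x, y in (sample_pair(rng, sigma_x, sigma_y, rho) for _ in range(n)))
empirical = hits / n
bound = difference_tail_bound(K, sigma_x, sigma_y, rho)
print(f"empirical {empirical:.4f} <= Chebyshev bound {bound:.4f}")
```

With these parameters $\sigma^2_{X-Y} = 0.03$ and the bound is $1/3$; for normal variables the true probability is much smaller, which illustrates that Chebyshev's inequality is valid for every distribution but can be loose for a particular one.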

We can use the proposed simplifying assumptions to get a simpler expression. When:

$$\rho_{X,Y} = \operatorname{cov}(X,Y) / (\sigma_X \sigma_Y) = 1 - \epsilon$$

$$\mu_X = \mu_Y = 0$$

$$\sigma_X^2 = \sigma_Y^2 = \sigma^2$$

Then:

$$\sigma^2_X+\sigma^2_Y-2·\sigma_X·\sigma_Y·\rho_{X,Y} = 2·\sigma^2·(1-(1-\epsilon)) = 2\sigma^2\epsilon$$

And therefore:

$$\Pr(|X-Y|\geq k·\sigma\sqrt{2\epsilon}) \leq \frac{1}{k^2}$$
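To put this in the form $P(|X - Y| > K) < \delta$ from the question, one can choose $k = K/(\sigma\sqrt{2\epsilon})$ (assuming $\sigma > 0$ and $\epsilon > 0$). This is a sketch of that substitution, not a separate result:

$$\Pr(|X-Y| > K) \leq \Pr(|X-Y| \geq K) \leq \frac{2\sigma^2\epsilon}{K^2}$$

If, as in the question, $-1 \leq X \leq 1$ and $\mu_X = 0$, then $\sigma^2 = E[X^2] \leq 1$, so

$$\Pr(|X-Y| > K) \leq \frac{2\epsilon}{K^2}$$

and any $\delta > 2\epsilon/K^2$ satisfies the requested strict inequality.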

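Interestingly, this result holds even if $\epsilon$ is not small. Moreover, if the condition on the correlation is weakened from $\rho_{X,Y} = 1-\epsilon$ to $\rho_{X,Y} \geq 1-\epsilon$, the result doesn't change: a larger correlation only decreases $\sigma^2_{X-Y} = 2\sigma^2(1-\rho_{X,Y})$, so $\sigma^2_{X-Y} \leq 2\sigma^2\epsilon$ and the same inequality still holds.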
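A small exact example suggests the simplified bound cannot be improved in general under these assumptions. Take $X$ uniform on $\{-1, +1\}$ and let $Y = X$ with probability $1-p$ and $Y = -X$ with probability $p$. Both variables are bounded in $[-1, 1]$, have zero mean and variance $\sigma^2 = 1$, and $\rho_{X,Y} = 1 - 2p$, so $\epsilon = 2p$. At $K = 2$ the bound $2\sigma^2\epsilon/K^2$ equals $p$, which is exactly $\Pr(|X-Y| \geq 2)$. The function name `flip_example` is hypothetical; the sketch uses exact rational arithmetic from Python's standard `fractions` module.

```python
from fractions import Fraction

def flip_example(p):
    """X uniform on {-1, +1}; Y = X with prob 1 - p, Y = -X with prob p.

    Returns (rho, epsilon, exact P(|X - Y| >= 2), Chebyshev bound at K = 2).
    """
    half = Fraction(1, 2)
    joint = {(1, 1): half * (1 - p), (-1, -1): half * (1 - p),
             (1, -1): half * p, (-1, 1): half * p}
    # Both marginals are uniform on {-1, +1}: mean 0 and variance 1, so rho = E[XY].
    rho = sum(prob * x * y for (x, y), prob in joint.items())
    epsilon = 1 - rho
    sigma2 = 1
    K = 2
    exact = sum(prob for (x, y), prob in joint.items() if abs(x - y) >= K)
    bound = 2 * sigma2 * epsilon / K**2
    return rho, epsilon, exact, bound

print(flip_example(Fraction(1, 20)))
# (Fraction(9, 10), Fraction(1, 10), Fraction(1, 20), Fraction(1, 20))
```

Here the exact tail probability and the bound coincide, so no better constant is possible for this $K$ without further assumptions on the distributions.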

Pere