18

I think the following two formulas are true:

$$ \mathrm{Var}(aX)=a^2 \mathrm{Var}(X) $$ where $a$ is a constant, and $$ \mathrm{Var}(X + Y)=\mathrm{Var}(X)+\mathrm{Var}(Y) $$ if $X$ and $Y$ are independent.

However, I am not sure what is wrong with the below:

$$\mathrm{Var}(2X) = \mathrm{Var}(X+X) = \mathrm{Var}(X) + \mathrm{Var}(X) $$ which does not equal $2^2 \mathrm{Var}(X)$, i.e. $4\mathrm{Var}(X)$.

If $X$ is a sample taken from a population, I think we can always assume each $X$ to be independent of the other $X$s.

So where does my reasoning go wrong?

Silverfish
lanselibai

2 Answers

35

$\DeclareMathOperator{\Cov}{Cov}$ $\DeclareMathOperator{\Corr}{Corr}$ $\DeclareMathOperator{\Var}{Var}$

The problem with your line of reasoning is

"I think we can always assume $X$ to be independent from the other $X$s."

$X$ is not independent of $X$. The symbol $X$ is being used to refer to the same random variable here. Once you know the value of the first $X$ to appear in your formula, this also fixes the value of the second $X$ to appear. If you want them to refer to distinct (and potentially independent) random variables, you need to denote them with different letters (e.g. $X$ and $Y$) or using subscripts (e.g. $X_1$ and $X_2$); the latter is often (but not always) used to denote variables drawn from the same distribution.

If two variables $X$ and $Y$ are independent then $\Pr(X=a|Y=b)$ is the same as $\Pr(X=a)$: knowing the value of $Y$ does not give us any additional information about the value of $X$. But $\Pr(X=a|X=b)$ is $1$ if $a=b$ and $0$ otherwise: knowing the value of $X$ gives you complete information about the value of $X$. [You can replace the probabilities in this paragraph by cumulative distribution functions, or where appropriate, probability density functions, to essentially the same effect.]

Another way of seeing things is that if two variables are independent then they have zero correlation (though zero correlation does not imply independence!), but $X$ is perfectly correlated with itself, $\Corr(X,X)=1$, so $X$ can't be independent of itself. Note that since the covariance is given by $\Cov(X,Y)=\Corr(X,Y)\sqrt{\Var(X)\Var(Y)}$, we have
$$\Cov(X,X)=1 \cdot \sqrt{\Var(X)\Var(X)}=\Var(X)$$
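This identity is easy to check numerically. Below is a minimal sketch using NumPy (the exponential distribution and the seed are arbitrary choices); the sample covariance of a variable with itself agrees with its sample variance up to rounding, as long as the same `ddof` is used for both:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # any distribution will do

cov_xx = np.cov(x, x)[0, 1]   # sample covariance of X with itself
var_x = np.var(x, ddof=1)     # sample variance (same ddof as np.cov uses)
print(cov_xx, var_x)          # the two values agree up to rounding
```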

The more general formula for the variance of a sum of two random variables is

$$\Var(X+Y) = \Var(X) + \Var(Y) + 2 \Cov(X,Y)$$

In particular, $\Cov(X,X) = \Var(X)$, so

$$\Var(X+X) = \Var(X) + \Var(X) + 2\Var(X) = 4\Var(X)$$

which is the same as you would have deduced from applying the rule

$$\Var(aX) = a^2 \Var(X) \implies \Var(2X) = 4\Var(X)$$
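A quick Monte Carlo simulation makes the distinction concrete (a sketch; the standard normal distribution, seed, and sample size are arbitrary choices). Doubling a single draw quadruples the variance, while summing two independent draws only doubles it:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x = rng.standard_normal(n)   # draws of a random variable X, Var(X) = 1
y = rng.standard_normal(n)   # an independent copy, better written X_2

print(np.var(2 * x))   # Var(2X) = 4 Var(X), so roughly 4
print(np.var(x + x))   # X + X is the same draw twice, so also roughly 4
print(np.var(x + y))   # independent draws: Var(X_1) + Var(X_2), roughly 2
```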


If you are interested in linearity, then you might be interested in the bilinearity of covariance. For random variables $W$, $X$, $Y$ and $Z$ (whether dependent or independent) and constants $a$, $b$, $c$ and $d$ we have

$$\Cov(aW + bX, Y) = a \Cov(W,Y) + b \Cov(X,Y)$$

$$\Cov(X, cY + dZ) = c \Cov(X,Y) + d \Cov(X,Z)$$

and overall,

$$\Cov(aW + bX, cY + dZ) = ac \Cov(W,Y) + ad \Cov(W,Z) + bc \Cov(X,Y) + bd \Cov(X,Z)$$
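The bilinearity identity can be verified numerically even for deliberately dependent variables (a sketch; the particular variables and coefficients below are made up). It holds exactly for sample covariances, not just in expectation, because the sample covariance is itself bilinear:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)   # deliberately correlated with w
y = rng.uniform(size=n)
z = y ** 2 + rng.normal(size=n)    # deliberately correlated with y
a, b, c, d = 2.0, -1.0, 0.5, 3.0

def cov(u, v):
    """Sample covariance of two 1-D arrays."""
    return np.cov(u, v)[0, 1]

lhs = cov(a * w + b * x, c * y + d * z)
rhs = (a * c * cov(w, y) + a * d * cov(w, z)
       + b * c * cov(x, y) + b * d * cov(x, z))
print(np.isclose(lhs, rhs))  # True: bilinearity holds up to rounding
```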

You can then use this to prove the (non-linear) results for variance that you wrote in your post:

$$\Var(aX) = \Cov(aX, aX) = a^2 \Cov(X,X) = a^2 \Var(X)$$

$$ \begin{align} \Var(aX + bY) &= \Cov(aX + bY, aX + bY) \\ &= a^2 \Cov(X,X) + ab \Cov(X,Y) + ba \Cov(X,Y) + b^2 \Cov(Y,Y) \\ &= a^2 \Var(X) + b^2 \Var(Y) + 2ab \Cov(X,Y) \end{align} $$

The latter gives, as a special case when $a=b=1$,

$$\Var(X+Y) = \Var(X) + \Var(Y) + 2 \Cov(X,Y)$$

When $X$ and $Y$ are uncorrelated (which includes the case where they are independent), then this reduces to $\Var(X+Y) = \Var(X) + \Var(Y)$. So if you want to manipulate variances in a "linear" way (which is often a nice way to work algebraically), then work with the covariances instead, and exploit their bilinearity.
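The $\Var(aX + bY)$ expansion above also holds exactly for sample moments (a sketch; the correlated pair below is an arbitrary construction), provided the same `ddof` is used throughout:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)   # correlated with x on purpose
a, b = 3.0, -2.0

lhs = np.var(a * x + b * y, ddof=1)
rhs = (a**2 * np.var(x, ddof=1) + b**2 * np.var(y, ddof=1)
       + 2 * a * b * np.cov(x, y)[0, 1])
print(np.isclose(lhs, rhs))  # True: the expansion is an algebraic identity
```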

Silverfish
  • 2
    Yes! I think you pinpointed at the beginning that the confusion was essentially a notational one. I found it very helpful when one book (very explicitly, some might say laboriously) explained the interpretation of and rules for evaluating a probabilistic statement (so that, e.g., even if you know what you mean by $\Pr (X+X=n)$ where $X \sim \text{Uniform}(1..6)$, it is technically *incorrect* if you're considering throwing an $n$ in craps (and $X+X=2X$ would never yield an odd roll); the event would be properly expressed using $X_1,X_2$ i.i.d.). – Vandermonde Dec 04 '15 at 18:53
  • 1
    This is in contrast to (and I think my misapprehension might have stemmed from) how `2+PRNG(6)+PRNG(6)` often **is** how you would toss dice as above and/or notation/conventions such as $2 \text{d}6 = \text{d}6 + \text{d}6$ in which different instances are genuinely intended to be independent. – Vandermonde Dec 04 '15 at 18:53
  • 1
    @Vandermonde That's an interesting point. I initially considered mentioning the use of subscripts to distinguish between "different $X$s" but didn't bother - think I might edit it in now. The argument that "you'd never get an odd total score if the sum was $2X$" is very clear and convincing to someone who can't see the need to distinguish: thanks for sharing it. – Silverfish Dec 04 '15 at 19:09
2

Another way of thinking about it is that, for random variables, $2X$ is not the same as $X_1 + X_2$, the sum of two independent copies of $X$.

$2X$ means two times the value of a single outcome of $X$, while $X_1 + X_2$ means two separate trials. In other words, it's the difference between rolling a die once and doubling the result, versus rolling a die twice and adding.
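The die example is easy to simulate (a minimal sketch; the seed and sample size are arbitrary): doubling one roll can never produce an odd total, while two independent rolls can, and the variances differ by a factor of two.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
roll = rng.integers(1, 7, size=n)    # one die, rolled n times
roll2 = rng.integers(1, 7, size=n)   # a second, independent die

doubled = 2 * roll           # roll once and double the result: 2X
two_rolls = roll + roll2     # roll twice and add: X_1 + X_2

print(sorted(set(doubled.tolist())))    # only even totals: [2, 4, 6, 8, 10, 12]
print(sorted(set(two_rolls.tolist())))  # every total from 2 through 12
print(np.var(doubled), np.var(two_rolls))  # roughly 4*Var(X) vs 2*Var(X)
```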

BBrooklyn