
Assume $X$ and $Y$ have finite second moments. In the Hilbert space of random variables with finite second moment (with inner product of $T_1,T_2$ defined by $E(T_1T_2)$, so that $||T||^2=E(T^2)$), we may interpret $E(Y|X)$ as the projection of $Y$ onto the space of functions of $X$.

We also know that the Law of Total Variance reads $$Var(Y)=E(Var(Y|X)) + Var(E(Y|X)).$$

Is there a way to interpret this law in terms of the geometric picture above? I have been told that the law is the same as the Pythagorean Theorem for the right-angled triangle with sides $Y, E(Y|X), Y-E(Y|X)$. I understand why the triangle is right-angled, but not how the Pythagorean Theorem captures the Law of Total Variance.

kjetil b halvorsen

2 Answers


I assume that you are comfortable with regarding the right-angled triangle as meaning that $E[Y\mid X]$ and $Y - E[Y\mid X]$ are uncorrelated random variables. For uncorrelated random variables $A$ and $B$, $$\operatorname{var}(A+B) = \operatorname{var}(A) + \operatorname{var}(B),\tag{1}$$ and so if we set $A = Y - E[Y\mid X]$ and $B = E[Y\mid X]$ so that $A+B = Y$, we get that $$\operatorname{var}(Y) = \operatorname{var}(Y-E[Y\mid X]) + \operatorname{var}(E[Y\mid X]).\tag{2}$$ It remains to show that $\operatorname{var}(Y-E[Y\mid X])$ is the same as $E[\operatorname{var}(Y\mid X)]$ so that we can re-state $(2)$ as $$\operatorname{var}(Y) = E[\operatorname{var}(Y\mid X)] + \operatorname{var}(E[Y\mid X])\tag{3}$$ which is the total variance formula.

It is well known that the expected value of the random variable $E[Y\mid X]$ is $E[Y]$, that is, $E\bigl[E[Y\mid X]\bigr] = E[Y]$. So we see that $$E[A] = E\bigl[Y - E[Y\mid X]\bigr] = E[Y] - E\bigl[E[Y\mid X]\bigr] = 0,$$ from which it follows that $\operatorname{var}(A) = E[A^2]$, that is, $$\operatorname{var}(Y-E[Y\mid X]) = E\left[(Y-E[Y\mid X])^2\right].\tag{4}$$ Let $C$ denote the random variable $(Y-E[Y\mid X])^2$ so that we can write $$\operatorname{var}(Y-E[Y\mid X]) = E[C].\tag{5}$$ But $E[C] = E\bigl[E[C\mid X]\bigr]$ where $E[C\mid X] = E\bigl[(Y-E[Y\mid X])^2\bigm\vert X\bigr].$ Now, given that $X = x$, the conditional distribution of $Y$ has mean $E[Y\mid X=x]$ and so $$E\bigl[(Y-E[Y\mid X=x])^2\bigm\vert X=x\bigr] = \operatorname{var}(Y\mid X = x).$$ In other words, $E[C\mid X = x] = \operatorname{var}(Y\mid X = x)$ so that the random variable $E[C\mid X]$ is just $\operatorname{var}(Y\mid X)$. Hence, $$E[C] = E\bigl[E[C\mid X]\bigr] = E[\operatorname{var}(Y\mid X)], \tag{6}$$ which upon substitution into $(5)$ shows that $$\operatorname{var}(Y-E[Y\mid X]) = E[\operatorname{var}(Y\mid X)].$$ This makes the right side of $(2)$ exactly what we need and so we have proved the total variance formula $(3)$.
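As a quick numerical sanity check (my addition, not part of the original argument), here is a short Monte Carlo sketch in Python with NumPy. The model is hypothetical: $X$ uniform on $\{0,1,2\}$ and $Y\mid X=x \sim N(2x,\,(x+1)^2)$, so that $E[Y\mid X]=2X$ and $\operatorname{var}(Y\mid X)=(X+1)^2$ are known in closed form and each term of $(3)$ can be estimated directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: X uniform on {0, 1, 2}; Y | X = x ~ Normal(2x, (x+1)^2)
n = 1_000_000
x = rng.integers(0, 3, size=n)
y = rng.normal(loc=2.0 * x, scale=x + 1.0)

# Closed-form conditional moments: E[Y|X] = 2X and var(Y|X) = (X+1)^2
var_y = y.var()                             # var(Y), estimated from the sample
var_of_cond_mean = (2.0 * x).var()          # var(E[Y|X])
mean_of_cond_var = ((x + 1.0) ** 2).mean()  # E[var(Y|X)]

# The two sides of (3) should agree; the exact value is 14/3 + 8/3 = 22/3
print(var_y, mean_of_cond_var + var_of_cond_mean)
```

With a million draws both printed numbers land close to the exact value $22/3 \approx 7.33$, with $E[\operatorname{var}(Y\mid X)] = 14/3$ and $\operatorname{var}(E[Y\mid X]) = 8/3$.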

Dilip Sarwate
  • $Y-E(Y|X)$ is a variable with zero mean. Hence $var(Y-E(Y|X))=E[Y-E(Y|X)]^2$. Now $Evar(Y|X)=E[E((Y-E(Y|X))^2|X)]=E[Y-E(Y|X)]^2$. A bit less complicated second part of the answer. – mpiktas Oct 03 '13 at 07:50
  • @mpiktas Thanks. I am aware of the shorter way of getting to the desired result but always have difficulty explaining it in a way that beginning students can follow easily. Incidentally, in that last equation you wrote, the quantity on the right has a misplaced exponent: it is the quantity inside the square brackets that should be squared; that is, it should be $E\bigr[(Y-E[Y|X])^2\bigr ]$. Too late to correct it, though, unless a moderator obliges. – Dilip Sarwate Oct 03 '13 at 11:05
  • Dilip, many probabilists would correctly interpret @mpiktas's equation as written; the extra set of parentheses is often dropped. Perhaps my eyes are deceiving me, but I think his notation is consistent throughout. I'm happy to help fix things up, if desired, though. :-) – cardinal Oct 03 '13 at 12:11
  • @cardinal I didn't misinterpret mpiktas's writing, and fully understood what he was saying. While I am also used to interpreting $EX$ or $\mathbb EX$ as the expected value of $X$, I always have my doubts about $EX^2$, especially since PEMDAS says nothing about it. Does the expectation have priority over the exponentiation or not? I guess I am just used to the expectation operator applying to everything inside the square brackets. Please don't edit mpiktas's comment, but if you want to delete _everything_ in this thread from "Incidentally" onwards in my previous comment, please go ahead. – Dilip Sarwate Oct 03 '13 at 18:15
  • I'm sorry, @Dilip. My intention was not to suggest you didn't understand; I knew you had! I also agree that the notation can lend itself to ambiguities and it's good to point them out when they arise! What I meant was that I thought the second equation in the comment (i.e., $var\ldots$) made clear the convention that was used henceforth. :-) – cardinal Oct 04 '13 at 01:23

Statement:

The Pythagorean theorem says, for any elements $T_1$ and $T_2$ of an inner-product space with finite norms such that $\langle T_1,T_2\rangle = 0$, $$ ||T_1+T_2||^2 = ||T_1||^2 + ||T_2||^2 \tag{1}. $$ Or in other words, for orthogonal vectors, the squared length of the sum is the sum of the squared lengths.

Our Case:

In our case, $T_1 = E[Y|X]$ and $T_2 = Y - E[Y|X]$ are random variables, the squared norm is $||T_i||^2 = E[T_i^2]$, and the inner product is $\langle T_1,T_2\rangle = E[T_1T_2]$. Translating $(1)$ into statistical language gives us $$ E[Y^2] = E[\{E(Y|X)\}^2] + E[(Y - E[Y|X])^2] \tag{2}, $$ because $E[T_1T_2] = \operatorname{Cov}(T_1,T_2) = 0$. We can make this look more like your stated Law of Total Variance if we change $(2)$ by...

  1. Subtracting $(E[Y])^2$ from both sides, making the left-hand side $\operatorname{Var}(Y)$,

  2. Noting that, on the right-hand side, $E[\{E(Y|X)\}^2] - (E[Y])^2 = \operatorname{Var}(E[Y|X])$,

  3. Noting that $ E[(Y - E[Y|X])^2] = E[E\{(Y - E[Y|X])^2\mid X\}] = E[\operatorname{Var}(Y|X)]$.

For details about these three bullet points see @DilipSarwate's post. He explains this all in much more detail than I do.
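The geometric statement itself is also easy to verify numerically. The following sketch (my addition, on a hypothetical model where $X$ is Bernoulli(1/2) and $Y\mid X=x \sim N(3x, 1)$, so $E[Y|X]=3X$ is known exactly) checks both the orthogonality $E[T_1T_2]\approx 0$ and the Pythagorean identity $(2)$ in Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: X ~ Bernoulli(1/2); Y | X = x ~ Normal(3x, 1)
n = 500_000
x = rng.integers(0, 2, size=n)
y = rng.normal(loc=3.0 * x, scale=1.0)

t1 = 3.0 * x   # T1 = E[Y|X], known in closed form for this model
t2 = y - t1    # T2 = Y - E[Y|X], the residual

# Orthogonality: <T1, T2> = E[T1 * T2] should be near 0
inner = (t1 * t2).mean()

# Pythagorean identity (2): E[Y^2] = E[T1^2] + E[T2^2]
lhs = (y ** 2).mean()
rhs = (t1 ** 2).mean() + (t2 ** 2).mean()

print(inner, lhs, rhs)
```

For this model the exact squared lengths are $E[T_1^2] = 9/2$ and $E[T_2^2] = 1$, so both sides of $(2)$ come out near $5.5$ while the inner product hovers near zero.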

Taylor