Algebra of expectations in the MSE Decomposition

Question

In the MSE decomposition formula why does the following hold?

$$\begin{aligned}&\operatorname{E}_{\theta}\left[2\left(\hat{\theta}-\operatorname{E}_{\theta}[\hat{\theta}]\right)\left(\operatorname{E}_{\theta}[\hat{\theta}]-\theta\right)\right]+\operatorname{E}_{\theta}\left[\left(\operatorname{E}_{\theta}[\hat{\theta}]-\theta\right)^{2}\right]\\&\quad=2\left(\operatorname{E}_{\theta}[\hat{\theta}]-\theta\right)\operatorname{E}_{\theta}\left[\hat{\theta}-\operatorname{E}_{\theta}[\hat{\theta}]\right]+\left(\operatorname{E}_{\theta}[\hat{\theta}]-\theta\right)^{2}\end{aligned}$$

I know we already have this question here which clearly explains that $\mathbb{E}[\mathbb{E}[\hat{\theta}] - \hat{\theta}]$ is 0 since:

$\mathbb{E}[\mathbb{E}[\hat{\theta}] - \hat{\theta}] = \mathbb{E}[\mathbb{E}[\hat{\theta}]] - \mathbb{E}[\hat{\theta}] = \mathbb{E}[\hat{\theta}] - \mathbb{E}[\hat{\theta}] = 0$

But that alone doesn't seem to explain all the steps that actually took place. How does one derive the second line of the equation from the first?

Thanks @SextusEmpiricus it's verbatim from the Wikipedia page (I just added a link). Maybe that's because $\theta$ is assumed to be a random variable? — Josh, May 30 '20 at 16:55

Sextus Empiricus · Answer 1 · 2020-05-30T21:09:40.227

On that Wikipedia page, the equations have been muddled by many edits, and the subscripts that were eventually added make them more cluttered rather than clearer.

That subscript started as $E_\theta$. Somebody changed it into $E_{\hat\theta}$. And now it is changed back into $E_\theta$ with the explanation

expectation given theta not hat{theta}

That is not the typical use of subscripts with an expectation operator.

My understanding of the subscript is as

$$E_Y(g(Y)) = \int_{\forall Y} g(y) f_Y(y) dy$$

(where $f_Y(y)$ is the probability density of $Y$) and then it is the expectation of the value of $g(Y)$ where $Y$ is the variable, and we integrate over the probabilities for all different $Y$.

(Or at least it is not what I am used to and not the expectation that I expected. But there seem to be multiple uses for the subscript. Anyway it is at least ambiguous and should be clarified in the text)

With the other interpretation, $E_{\theta}(X)$ as the expectation of $X$ given $\theta$ (i.e. conditional on $\theta$), it is like the previous question here on SE (the one the OP linked to). In my answer there I also commented that the expectation is to be considered conditional on $\theta$ (a comment that was made because some people were confused about what was constant and what was not).
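
Under that conditional reading, the step asked about can be sketched as follows. The shorthand $b = E_\theta[\hat\theta] - \theta$ is introduced here only for illustration (it is not notation from the Wikipedia page), and the sketch assumes that, given $\theta$, both $\theta$ and $E_\theta[\hat\theta]$ are fixed numbers, so that $b$ is a constant and only $\hat\theta$ is random:

$$\begin{aligned}
E_\theta\left[2\left(\hat\theta - E_\theta[\hat\theta]\right) b\right] + E_\theta\left[b^2\right]
&= 2b\, E_\theta\left[\hat\theta - E_\theta[\hat\theta]\right] + b^2 && \text{(constants factor out of } E_\theta\text{)} \\
&= 2b\left(E_\theta[\hat\theta] - E_\theta[\hat\theta]\right) + b^2 && \text{(linearity, } E_\theta[E_\theta[\hat\theta]] = E_\theta[\hat\theta]\text{)} \\
&= b^2
\end{aligned}$$

The first line is exactly the equality in the question: nothing more than pulling the constant $2b$ out of the first expectation and noting that the expectation of the constant $b^2$ is $b^2$ itself. The remaining lines show why the cross term then vanishes.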
