The proof above comes from UofT's CS412 lecture slides. I have a few questions about this notation that I don't understand:

  • Is $x \sim p\left(\{x^{(r)}\}_{r=1}^R\right)$ supposed to represent $R$ samples taken from $p(x)$? Why is it written $x \sim$?
  • Does $\phi$ represent the function we are sampling from? If so, does $\phi(x^{(r)})$ mean sample $r$ from $\phi$?
  • Why does the 2nd equation become the 3rd, and why does the 3rd become the 4th?
user8714896
    Only looking at the formula, it's difficult to know what it means. You'd better link the whole slides or type the context. – hbadger19042 Jun 13 '20 at 04:26

1 Answer

This looks like it is showing that a function $\phi()$ of an observation sampled from the empirical distribution has expected value equal to the expected value of $\phi()$ over the underlying distribution $p(x)$, which would be relevant to bootstrap estimation.

If so, $x\sim p\left(\{x^{(r)}\}_{r=1}^R\right)$ means that $x$ is sampled with equal probability from the $R$ observations $\{x_1, \dots, x_R\}$.
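As a concrete sketch of that interpretation (my own illustration, not from the slides): drawing $x$ from the empirical distribution just means picking one of the $R$ observed values uniformly at random.

```python
import random

# Hypothetical observed sample x^(1), ..., x^(R), with R = 5 here.
observations = [2.0, 3.5, 1.0, 4.2, 2.7]

# x ~ p({x^(r)}) means: pick one of the R observations,
# each with equal probability 1/R.
x = random.choice(observations)
assert x in observations
```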

In the first equality, the average on the right-hand side is the expectation, over sampling from $\{x_1, \dots, x_R\}$, of the function $\phi(x)$. The outer $\mathbb{E}$ is the expectation over the distribution that $\{x_1, \dots, x_R\}$ came from.

The second equality pulls the sum out through the $\mathbb{E}$, by linearity of expectation.

The third equality is the key. It says that since any $x^{(r)}$ was just sampled from $p(x)$ originally, $\mathbb{E}[\phi(x^{(r)})]$ is the same as $\mathbb{E}[\phi(x)]$ for any observation from the population distribution $p(x)$.

The final step just notices that you have the average of $R$ identical things.
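The chain of equalities can be checked numerically (a sketch under my own assumptions: $p(x)$ standard normal and $\phi(x) = x^2$, so $\mathbb{E}[\phi(X)] = 1$). Averaging the empirical mean of $\phi$ over many repeated samples should recover that population value.

```python
import random

random.seed(0)

def phi(x):
    # Example function; for X ~ N(0, 1), E[phi(X)] = E[X^2] = 1.
    return x * x

R = 5             # size of each sample
trials = 200_000  # number of repetitions of the whole experiment

# Average, over many repetitions, of the empirical mean (1/R) sum_r phi(x^(r)).
total = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(R)]
    total += sum(phi(x) for x in sample) / R
estimate = total / trials

# The expectation of the empirical average matches E[phi(X)] = 1.
assert abs(estimate - 1.0) < 0.02
```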

I don't like the notation here. I tried something a bit different:

Write $P$ for expectations over the population distribution and $\mathbb{P}_R$ for expectations over the empirical distribution.

By definition of the empirical distribution $$\mathbb{P}_R[\phi(X)]\equiv \frac{1}{R}\sum_{r=1}^R\phi(X^{(r)})$$

Now $$P\left[\mathbb{P}_R[\phi(X)]\right] = P\left[\frac{1}{R}\sum_{r=1}^R\phi(X^{(r)})\right]$$ By linearity of expectation $$P\left[\frac{1}{R}\sum_{r=1}^R\phi(X^{(r)})\right]= \frac{1}{R}\sum_{r=1}^RP[\phi(X^{(r)})]$$ And because $X^{(r)}$ was an iid sample $$\frac{1}{R}\sum_{r=1}^R P[\phi(X^{(r)})]= \frac{1}{R}\sum_{r=1}^RP[\phi(X)]=P\phi(X)\equiv\Phi$$

I'm not sure it's an improvement (except for easier typesetting). It's still less obvious than it should be that you can just drop the $r$ superscript.

I worried initially that $x^{(r)}$ was supposed to indicate the sample has been sorted from smallest -- $x^{(1)}$ -- to largest -- $x^{(R)}$. In that case, though, the proof as written is incorrect because the sorting stops the $x^{(r)}$ being exchangeable. The result is still true, for the same reasons, because the empirical distribution forgets the sorting.
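A quick numeric check of that last point (again my own sketch, with $\phi(x) = x^2$): the empirical distribution, and hence the empirical average of $\phi$, is unchanged by sorting the sample.

```python
import random

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(10)]

def phi(x):
    return x * x  # example function

# Sorting permutes the sample but leaves the empirical distribution,
# and therefore the empirical average of phi, exactly the same.
unsorted_avg = sum(phi(x) for x in sample) / len(sample)
sorted_avg = sum(phi(x) for x in sorted(sample)) / len(sample)
assert abs(unsorted_avg - sorted_avg) < 1e-12
```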

Thomas Lumley