The proof above comes from UofT's CS412 lecture slides. I have a few questions about this notation that I don't understand:

  • Is $x \sim p\left(\{x^{(r)}\}_{r=1}^R\right)$ supposed to represent $R$ samples taken from $p(x)$? Why is it written $x \sim$?
  • Does $\phi$ represent the function we are sampling from? If so, does $\phi(x^{(r)})$ mean sample $r$ from $\phi$?
  • Why does the 2nd equation become the 3rd, and why does the 3rd become the 4th?
user8714896
    Only looking at the formula, it's difficult to know what it means. You'd better link the whole slides or type the context. – hbadger19042 Jun 13 '20 at 04:26

1 Answer

This looks like it is showing that a function $\phi()$ of an observation sampled from the empirical distribution has expected value equal to the expected value of $\phi()$ over the underlying distribution $p(x)$, which would be relevant to bootstrap estimation.

If so, $x\sim p\left(\{x^{(r)}\}_{r=1}^R\right)$ means that $x$ is sampled with equal probability from the $R$ observations $\{x_1, \dots, x_R\}$.
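As a concrete sketch of that interpretation (my own illustration, not from the slides): drawing $x$ from the empirical distribution just means picking one of the $R$ observed values uniformly at random.

```python
import random

# Hypothetical observed sample x^(1), ..., x^(R), with R = 5 here.
observations = [2.0, 3.5, 1.0, 4.2, 2.7]

# x ~ p({x^(r)}) means: pick one of the R observations,
# each with equal probability 1/R.
x = random.choice(observations)
assert x in observations
```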

In the first equality, the average on the right-hand side is the expectation, over sampling from $\{x_1, \dots, x_R\}$, of the function $\phi(x)$. The outer $\mathbb{E}$ is the expectation over the distribution that $\{x_1, \dots, x_R\}$ came from.

The second equality pulls the sum out through the $\mathbb{E}$, by linearity of expectation.

The third equality is the key. It says that since any $x^{(r)}$ was just sampled from $p(x)$ originally, $\mathbb{E}[\phi(x^{(r)})]$ is the same as $\mathbb{E}[\phi(x)]$ for any observation from the population distribution $p(x)$.

The final step just notices that you have the average of $R$ identical things.
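The chain of equalities can be checked numerically (a sketch under my own assumptions: $p(x)$ standard normal and $\phi(x) = x^2$, so $\mathbb{E}[\phi(X)] = 1$). Averaging the empirical mean of $\phi$ over many repeated samples should recover that population value.

```python
import random

random.seed(0)

def phi(x):
    # Example function; for X ~ N(0, 1), E[phi(X)] = E[X^2] = 1.
    return x * x

R = 5             # size of each sample
trials = 200_000  # number of repetitions of the whole experiment

# Average, over many repetitions, of the empirical mean (1/R) sum_r phi(x^(r)).
total = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(R)]
    total += sum(phi(x) for x in sample) / R
estimate = total / trials

# The expectation of the empirical average matches E[phi(X)] = 1.
assert abs(estimate - 1.0) < 0.02
```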

I don't like the notation here. I tried something a bit different:

Write $P$ for expectations over the population distribution and $\mathbb{P}_R$ for expectations over the empirical distribution.

By definition of the empirical distribution $$\mathbb{P}_R[\phi(X)]\equiv \frac{1}{R}\sum_{r=1}^R\phi(X^{(r)})$$

Now $$P\left[\mathbb{P}_R[\phi(X)]\right] = P\left[\frac{1}{R}\sum_{r=1}^R\phi(X^{(r)})\right]$$ By linearity of expectation $$P\left[\frac{1}{R}\sum_{r=1}^R\phi(X^{(r)})\right]= \frac{1}{R}\sum_{r=1}^RP[\phi(X^{(r)})]$$ And because $X^{(r)}$ was an iid sample $$\frac{1}{R}\sum_{r=1}^R P[\phi(X^{(r)})]= \frac{1}{R}\sum_{r=1}^RP[\phi(X)]=P\phi(X)\equiv\Phi$$

I'm not sure it's an improvement (except for easier typesetting). It's still less obvious than it should be that you can just drop the $r$ superscript.

I worried initially that $x^{(r)}$ was supposed to indicate the sample has been sorted from smallest -- $x^{(1)}$ -- to largest -- $x^{(R)}$. In that case, though, the proof as written is incorrect because the sorting stops the $x^{(r)}$ being exchangeable. The result is still true, for the same reasons, because the empirical distribution forgets the sorting.
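A quick numeric check of that last point (again my own sketch, with $\phi(x) = x^2$): the empirical distribution, and hence the empirical average of $\phi$, is unchanged by sorting the sample.

```python
import random

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(10)]

def phi(x):
    return x * x  # example function

# Sorting permutes the sample but leaves the empirical distribution,
# and therefore the empirical average of phi, exactly the same.
unsorted_avg = sum(phi(x) for x in sample) / len(sample)
sorted_avg = sum(phi(x) for x in sorted(sample)) / len(sample)
assert abs(unsorted_avg - sorted_avg) < 1e-12
```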

Thomas Lumley