
The bootstrap estimates bias by applying the "plug-in" principle to $$E(\hat{\theta}_n) - \theta.$$ I take this from p. 124 of Efron and Tibshirani (1994), which gives
equation (10.1) $\text{bias}_F = E_F[s(\mathbf{x})] - t(F)$ and equation (10.2) $\text{bias}_{\hat{F}} = E_{\hat{F}}[s(\mathbf{x}^*)] - t(\hat{F})$.

the "plug-in" of the first term has definitive meaning, since $\hat\theta_n$ is a prescribed statistics on the sample. We simply need to take the expectation of it with respect to the empirical distribution.

the "plug-in" of the second term however is rather confusing. Since there are an infinite number of ways to write a given distribution parameter as a functional of the distribution. For example, take $\theta$ to be the decay rate parameter of the exponential distribution, thus $f(X) = \theta e^{-\theta X}$, and one could get $\theta = 1/E(X)$ as well as $\theta = 1/\sqrt{D^2(X)}$, these are different functionals of $f(X)$ and would lead to different plug-in estimate of $\theta$ on finite sample.

This might be a naive question, but I hope I've made my confusion clear.


Since I didn't get an answer after half a year, I thought maybe I mis-stated the question, so I'll restate it: what is the definition of a "parameter" as repeatedly used in Efron's book? Is it a functional by definition? Or is there a standardized way to write every "parameter" as a functional? Can you give some more examples of a "parameter" (other than the mean and the variance)?

Quoted from page 124 of "An Introduction to the Bootstrap":

... We want to estimate a real-valued parameter $\theta = t(F)$. For now we will take the estimator to be any statistic $\hat{\theta} = s(\mathbf{x})$.

1 Answer


This is best explained using the figure from the Second Thoughts on the Bootstrap paper by Bradley Efron.

[Figure: the "real world" / "bootstrap world" diagram from Efron's paper]

In the real world, there is a distribution $P$ (denoted $F$ in the book) from which the data $\mathbf{x}$ come. We also have the functional $t$; applying it gives the population parameter $\theta = t(P)$. We can also use a statistic $s$ to estimate $\theta$ from the sample, obtaining $\hat{\theta} = s(\mathbf{x})$.

In the bootstrap world, we obtain a sample $\mathbf{x}^*$ from the bootstrap distribution $\hat{P}$. We can likewise obtain the parameter of the bootstrap distribution $t(\hat{P})$, or the sample statistic of the bootstrap sample $s(\mathbf{x}^*)$.

The plug-in principle means that you substitute the distribution $P$ with the bootstrap distribution $\hat{P}$, and the sample $\mathbf{x}$ with the bootstrap sample $\mathbf{x}^*$. We are allowed to do this because the bootstrap imitates sampling $\mathbf{x}$ from $P$ by applying the equivalent sampling procedure to draw $\mathbf{x}^*$ from $\hat{P}$.

It follows that if we have the definition of bias

$$ \text{bias}_P(\hat{\theta}, \theta) = \text{bias}_P = E_P[s(\mathbf{x})] - t(P) $$

then we substitute

$$ \text{bias}_{\hat{P}} = E_{\hat{P}}[s(\mathbf{x}^*)] - t(\hat{P}) $$

To give an example, say that you want to assess the bias of the sample mean $s(\mathbf{x})$ as an estimator of the population mean $t(P)$. You use the bootstrap to sample from the distribution $\hat{P}$ (i.e., sample with replacement from $\mathbf{x}$), calculate the sample means of the bootstrap samples $s(\mathbf{x}^*)$, and compare their expected value (mean) with the population mean of the bootstrap distribution $t(\hat{P})$. The "population mean" of the bootstrap distribution is just the sample mean of $\mathbf{x}$, since the distribution $\hat{P}$ is created by sampling from $\mathbf{x}$, so $\hat{\theta} = s(\mathbf{x}) = t(\hat{P})$.
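As a sketch of the procedure described above (my own illustration, not code from the answer; the sample, the number of resamples `B`, and the helper `s` are assumptions for the example), the bias estimate $\text{bias}_{\hat{P}} = E_{\hat{P}}[s(\mathbf{x}^*)] - t(\hat{P})$ can be approximated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=30)   # the observed sample (made up for illustration)
B = 2000                                  # number of bootstrap resamples

def s(sample):
    """The statistic under study; here the sample mean."""
    return np.mean(sample)

# t(P_hat): the "population mean" of the bootstrap distribution is the sample mean of x
t_P_hat = np.mean(x)

# E_{P_hat}[s(x*)] approximated by averaging s over B bootstrap samples drawn from P_hat
boot_stats = [s(rng.choice(x, size=len(x), replace=True)) for _ in range(B)]
bias_hat = np.mean(boot_stats) - t_P_hat

print(bias_hat)   # close to 0: the sample mean is an unbiased estimator of the mean
```

Since the sample mean is unbiased, the estimate hovers around zero; replacing `s` with a biased statistic (for example the plug-in variance `np.var(sample)`) gives a non-zero bootstrap bias estimate.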

Tim
  • p. 125 Efron 1994, equation (10.2), what then does the $t(\hat F)$ mean? – Xavier Xing Dec 12 '18 at 11:04
  • $\text{bias}_F=E_F[s(\mathbf{x})] -t(F)$ – Xavier Xing Dec 12 '18 at 12:43
  • I've copied the two equations in Efron's book, it seems there's a plug-in for $\theta$ ( or $t(F)$ in Efron's notation ) after all – Xavier Xing Dec 12 '18 at 12:49
  • So we don't estimate bias of things we don't know, so does this mean that Efron's $t(\hat F)$ is not generally accepted by statisticians? – Xavier Xing Dec 13 '18 at 05:52
  • you said "the $θ=t(F)$ value that we know", you didn't mention any $t(\hat F)$ as appeared in Efron's book. – Xavier Xing Dec 13 '18 at 07:02
  • @XavierXing my answer missed the point, I made some edits, this should answer your question. – Tim Dec 18 '18 at 13:32
  • This still doesn't answer my confusion. Consider my example of estimating the parameter of an exponential distribution, there are several ways to define what you call a 'population estimate' of the parameter (I've mentioned two in my question), none of them is favorable over the other. Which should we use for $t(\hat P)$ – Xavier Xing Dec 19 '18 at 10:13