
I can't find the term on Wikipedia, and from searching online it seems the median-of-means estimator was originally introduced in this book. However, I can only find a scanned edition of the book, so I can't search it for keywords directly, and the visible content doesn't offer much indication of where the median-of-means estimator might be introduced.

I mainly want to know about one theorem related to the median-of-means estimator, which I read in this paper (Theorem S1, near Eq. (S13)); it differs from the little I have found online. I restate the theorem below for convenient reference:

Let $X$ be a random variable with variance $\sigma^2$. Then, $K$ independent sample means of size $N=34\sigma^2/\epsilon^2$ suffice to construct a median of means estimator $\hat{\mu}(N,K)$ that obeys $Pr[|\hat{\mu}-\mathbb{E}[X]|\ge\epsilon]\le2e^{-K/2}$ for all $\epsilon > 0$.
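
For concreteness, here is a minimal sketch of the construction the theorem describes (in Python/NumPy; the normal distribution and all parameter values are illustrative choices, since the theorem itself is distribution-free):

```python
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(x, K):
    """Split x into K groups, average each group, and return
    the median of the K group means."""
    groups = np.array_split(x, K)
    return np.median([g.mean() for g in groups])

# Theorem's parameters: each of the K groups has size N = 34*sigma^2/eps^2.
sigma, eps, K = 1.0, 0.1, 10
N = int(np.ceil(34 * sigma**2 / eps**2))          # 3400 samples per group
x = rng.normal(loc=5.0, scale=sigma, size=N * K)  # true mean is 5.0
print(median_of_means(x, K))  # within eps of 5.0, except with prob. <= 2*exp(-K/2)
```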

So is there a formal statement and proof of this kind of theorem about the median-of-means estimator?

narip
  • It is always annoying when there is a reference to a book without a specification of the page or section. With some googling you can find out that it is page 243 of the book. However, I can't help you much further. While the book is a translation into English, the mathematics reads like Russian to me. (See several articles that link to page 243: https://scholar.google.com/scholar?q=nemirovsky+"median+of+means"+"p.+243") – Sextus Empiricus Oct 02 '21 at 11:27
  • The inequality is obviously not holding for every $\epsilon\ge 0$... – Xi'an Oct 02 '21 at 12:35
  • @Xi'an why not? There is dependence of $N$ on $\epsilon$. – guy Oct 02 '21 at 16:30

3 Answers


The first paper cited below includes a review and a proof of the median-of-means estimator. The second paper (whose main goal is to show an optimal sample size for mean estimation) also mentions this estimator (near the end of Section 2.1).

REFERENCES:

  • Devroye, L., Lerasle, M., et al. Sub-Gaussian mean estimators. https://arxiv.org/pdf/1509.05845.pdf
  • Kunsch, R.J., Novak, E. and Rudolf, D., 2019. Solvable integration problems and optimal sample size selection. Journal of Complexity, 53, pp. 40-67.
Peter O.

If you carry on just a bit further (the remarks following Theorem 1), the authors state that they intentionally picked a less tight bound as a worst case...

You can get a tighter bound from Proposition 1 here,

${\rm Pr}(|\hat{\mu}-\mu| > \epsilon) \leq \exp{\left\{-2K\left(\frac{1}{2}-\frac{\sigma^2}{N \epsilon^2}\right)^2\right\}} \leq 2\exp{\left\{-K/2\right\}} \leq \delta/M$

for finite $\sigma^2$ and $K=2\log(2M/\delta)$ with $0 < \delta \leq 1$.
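
Plugging this choice of $K$ into the middle bound confirms the final inequality:

$$2\exp\{-K/2\} = 2\exp\{-\log(2M/\delta)\} = 2\cdot\frac{\delta}{2M} = \frac{\delta}{M}.$$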

Irtaza

Short approximate derivation

For a single sample mean $X_N$ (the average of $N$ draws), you have Chebyshev's inequality

$$P(\vert X_N -\mu_X \vert \geq k\frac{\sigma_X}{\sqrt{N}}) \leq \frac{1}{k^2}$$

or

$$P(\vert X_N -\mu_X \vert \geq \epsilon ) \leq \frac{\sigma_X^2}{N\epsilon^2} = \frac{1}{34}$$

The last equality stems from setting $N=34\sigma_X^2/\epsilon^2$.

The probability that the median of $K$ variables falls outside some range is at most the probability that at least half of the $K$ individual variables fall outside the range, because the latter is a necessary condition for the former.

This is similar to $P(Y \geq (K-1)/2)$ if $K$ is odd, or $P(Y \geq K/2)$ if $K$ is even, with $Y \sim \text{Binom}(p = 1/34, n = K)$.
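
As a sanity check, this tail can be evaluated exactly (a small sketch using scipy; the threshold follows the odd-$K$ case above):

```python
import math
from scipy.stats import binom

p = 1 / 34                     # per-group failure probability from Chebyshev
for K in (5, 11, 21, 41):      # odd K, so (K - 1) / 2 is an integer
    exact = binom.sf((K - 1) // 2 - 1, K, p)  # P(Y >= (K-1)/2), since sf(k) = P(Y > k)
    print(K, exact, 2 * math.exp(-K / 2))     # exact tail vs. the 2*exp(-K/2) bound
```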

This can be bounded using standard limits for the tail of a binomial distribution.

Hoeffding's inequality

Bounding $P(Y \geq (K-1)/2) \leq P(Y \geq (K-3)/2)$ and applying Hoeffding's inequality to the latter gives

$$P(Y \geq (K-1)/2) \leq \text{exp}\left(- 2K\left(\frac{(K-3)/2}{K} - \frac{1}{34}\right)^2\right) =\text{exp} \left(- 2K \left(\frac{16}{34} -1.5/K\right)^2 \right) \approx \text{exp} \left(- 0.44 K + 2.82 - 4.5/K \right) \approx 16.84\, \text{exp} \left(- 0.44 K \right) $$

This is not exactly $2 \text{exp}\left(-0.5 K \right)$, and it is less tight.

With the Chernoff bound

$$P(Y \geq (K-1)/2) \approx P(Y \geq K/2+1) \leq \text{exp}\left(- K (0.5 \log(0.5 \cdot 34)+0.5 \log(0.5 \cdot 34 /33)) \right) \approx \text{exp}\left(- 1.085 K \right) $$

This is a tighter bound than $2 \text{exp}\left(-0.5 K \right)$.
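
To compare the two bounds with the $2\,\text{exp}(-0.5K)$ rate numerically (a quick sketch, evaluating the expressions above):

```python
import math

for K in (11, 21, 41):
    hoeffding = math.exp(-2 * K * (16 / 34 - 1.5 / K) ** 2)
    chernoff = math.exp(-K * 0.5 * (math.log(17) + math.log(17 / 33)))
    theorem = 2 * math.exp(-K / 2)
    print(K, hoeffding, theorem, chernoff)  # chernoff < theorem < hoeffding
```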


Simulations show that this tighter bound is not unreasonable.
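
For instance, a simulation along these lines (a sketch; the exponential distribution and the parameter values are arbitrary choices) compares the empirical failure rate of the median-of-means estimator with the bounds:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
sigma, eps, K, reps = 1.0, 0.5, 9, 5000
N = int(np.ceil(34 * sigma**2 / eps**2))        # 136 samples per group

# Exponential(1) has mean 1 and variance 1.
samples = rng.exponential(scale=1.0, size=(reps, K, N))
mom = np.median(samples.mean(axis=2), axis=1)   # median of the K group means
fail = np.mean(np.abs(mom - 1.0) >= eps)
print(fail)                                     # empirical P(|mu_hat - mu| >= eps)
print(2 * math.exp(-K / 2), math.exp(-1.085 * K))  # 2exp(-K/2) vs Chernoff rate
```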

Sextus Empiricus