I have seen two types of results on convergence rates for an estimator $\hat f$: $E\|\hat f - f\|^2 = O(\psi_n)$ and $\|\hat f - f\| = O_p(\sqrt{\psi_n})$. The first result seems to be stronger, because it implies the second via Markov's inequality. However, it is the second result that has been established in many classical papers, e.g. Stone's paper on optimal rates of convergence, see http://projecteuclid.org/euclid.aos/1176345206. On the other hand, it is the first result that is usually considered in modern nonparametric textbooks taking the minimax approach.
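To spell out the implication I have in mind (my own one-line derivation, not taken from the references): if $E\|\hat f - f\|^2 \le C\psi_n$ for some constant $C$, then by Markov's inequality

$$P\left(\|\hat f - f\| > M\sqrt{\psi_n}\right) = P\left(\|\hat f - f\|^2 > M^2\psi_n\right) \le \frac{E\|\hat f - f\|^2}{M^2\psi_n} \le \frac{C}{M^2},$$

which can be made arbitrarily small by taking $M$ large, so $\|\hat f - f\| = O_p(\sqrt{\psi_n})$.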
Why do people care about the $O_p$ rates? I have an intuitive understanding of the problem in terms of the loss function $d(\hat f,f)=\|\hat f - f\|^2$ and its expected value. In particular, we know that the conditional mean function solves the minimization problem in the $L_2$ norm. However, I have no intuition at all for why we should consider $O_p$ rates, except for the fact that they might be easier to establish.
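To make the gap between the two notions concrete, here is a toy construction of my own (not from the papers above) showing that an $O_p$ rate does not imply the corresponding expectation bound. Suppose

$$\|\hat f_n - f\| = \begin{cases} n^{-1/2} & \text{with probability } 1 - n^{-1},\\[4pt] n & \text{with probability } n^{-1}.\end{cases}$$

Then $\|\hat f_n - f\| = O_p(n^{-1/2})$, since $P(\|\hat f_n - f\| > n^{-1/2}) = n^{-1} \to 0$, yet

$$E\|\hat f_n - f\|^2 = (1 - n^{-1})\,n^{-1} + n^{-1}\cdot n^2 = n + o(1) \to \infty.$$

So the rare-but-huge error is invisible to the $O_p$ statement but dominates the risk, which is part of why the expectation bound seems strictly more informative to me.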