I have seen two types of results on convergence rates for an estimator $\hat f$: $E\|\hat f - f\|^2 = O(\psi_n)$ and $\|\hat f - f\| = O_p(\sqrt{\psi_n})$. The first result seems to be stronger, because it implies the second via Markov's inequality. However, it is the second result that has been established in many classical papers, e.g. Stone's paper on optimal rates of convergence, see http://projecteuclid.org/euclid.aos/1176345206. On the other hand, it is the first result that is usually considered in modern nonparametric textbooks taking the minimax approach.
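To spell out the implication I have in mind (my own one-line derivation, not taken from the references): if $E\|\hat f - f\|^2 \le C\psi_n$ for some constant $C$, then by Markov's inequality

$$P\left(\|\hat f - f\| > M\sqrt{\psi_n}\right) = P\left(\|\hat f - f\|^2 > M^2\psi_n\right) \le \frac{E\|\hat f - f\|^2}{M^2\psi_n} \le \frac{C}{M^2},$$

which can be made arbitrarily small by taking $M$ large, so $\|\hat f - f\| = O_p(\sqrt{\psi_n})$.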
Why do people care about the $O_p$ rates? I have an intuitive understanding of the problem in terms of the loss function $d(\hat f,f)=\|\hat f - f\|^2$ and its expected value. In particular, we know that the conditional mean function solves the minimization problem in the $L_2$ norm. However, I have no intuition at all for why we should consider $O_p$ rates, except for the fact that they might be easier to establish.
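To make the gap between the two notions concrete, here is a toy construction of my own (not from the papers above) showing that an $O_p$ rate does not imply the corresponding expectation bound. Suppose

$$\|\hat f_n - f\| = \begin{cases} n^{-1/2} & \text{with probability } 1 - n^{-1},\\[4pt] n & \text{with probability } n^{-1}.\end{cases}$$

Then $\|\hat f_n - f\| = O_p(n^{-1/2})$, since $P(\|\hat f_n - f\| > n^{-1/2}) = n^{-1} \to 0$, yet

$$E\|\hat f_n - f\|^2 = (1 - n^{-1})\,n^{-1} + n^{-1}\cdot n^2 = n + o(1) \to \infty.$$

So the rare-but-huge error is invisible to the $O_p$ statement but dominates the risk, which is part of why the expectation bound seems strictly more informative to me.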