
I am reading a text about single index models (SIMs), where a SIM is defined as

$E[Y|X=x] = G(x' \beta)$,

with $G$ and $\beta$ unknown. After proposing an estimator for the function $G$, the following statement is given

$\sqrt{n h_n}[G_n(z) - G_n^{*}(z)] = o_p(1) \qquad (1)$.

Here, $G_n^{*}$ denotes the estimator one would use if $\beta$ were known, and $G_n$ the estimator where an estimate of $\beta$ is plugged in. This statement is then proven, and the proof ends with the line

$\sqrt{n h_n}[G_n(z) - G_n^{*}(z)] = O_p(\sqrt{h_n}) \qquad (2)$.

I do not understand why $(2)$ implies $(1)$, i.e. I do not fully understand the big/little $O_p$/$o_p$ notation. So, why do I know that if $(2)$ holds, $(1)$ must hold as well?

I know that $o_p(1)$ means convergence in probability to zero and that $O_p$ says something about stochastic boundedness. However, the Wikipedia article says one cannot infer convergence in probability from stochastic boundedness.

If someone is interested in seeing the full proof, one can click here and go to pages 12 and 13 of the PDF.

random_guy

2 Answers


You should start from a basic understanding of the non-stochastic versions, namely $o(h_n)$ and $O(h_n)$, rather than the stochastic versions. Once you understand the non-stochastic versions, the properties of the stochastic ones will be much more transparent. We write $x_n = o(h_n)$ if $x_n / h_n \to 0$, and we write $x_n = O(h_n)$ if $|x_n| \le K h_n$ for some $K$ (provided we take $n$ sufficiently large). Now, note that if $x_n = O(h_n)$ and $h_n \to 0$, it is clear that $x_n = o(1)$, i.e., $x_n \to 0$. This is true because we can sandwich $|x_n|$ between $0$ and $K h_n$, and we know that $K h_n \to 0$. This is exactly what is going on in your example; it just happens to be the stochastic variant rather than the non-stochastic one.
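For a concrete (purely illustrative) deterministic example: take $h_n = 1/n$ and $x_n = \sin(n)/n$. Then $|x_n| \le 1 \cdot h_n$ for all $n$, so $x_n = O(h_n)$; and since $0 \le |x_n| \le h_n \to 0$, the sandwich gives $x_n \to 0$, i.e. $x_n = o(1)$.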

Most of the properties of the stochastic variants are the same as those of the non-stochastic variants. Now, $X_n = o_P(h_n)$ means $X_n / h_n \to 0$ in probability, while $X_n = O_P(h_n)$ means that $P(|X_n| \le K h_n) \ge 1 - \epsilon$ for all sufficiently large $n$, where I give you $\epsilon$ and you need to choose an appropriate $K$. It can be shown, for example, that if $h_n \to 0$ and $X_n = O_P(h_n)$, then $X_n = o_P(1)$, i.e. $X_n \to 0$ in probability, which is what you want in your example. To see this, note that for any $\delta > 0$ and any $K$, we have $K h_n \le \delta$ for sufficiently large $n$ (because $h_n \to 0$). So, if you give me arbitrary $\delta$ and $\epsilon$, I can find a $K$ such that, for sufficiently large $n$, $$ P(|X_n| \le \delta) \ge P(|X_n| \le K h_n) \ge 1 - \epsilon. $$ Hence $X_n \to 0$ in probability.
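If it helps to see this numerically, here is a minimal simulation sketch of my own (the construction $X_n = h_n Z_n$ with $Z_n$ standard normal is just an illustrative choice, not the estimator from the question): $X_n = O_P(h_n)$ holds by construction, $h_n = n^{-1/5}$ is a typical bandwidth rate with $h_n \to 0$, and the estimated $P(|X_n| > \delta)$ shrinks toward zero, i.e. $X_n = o_P(1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.1      # fixed tolerance in P(|X_n| > delta)
reps = 100_000   # Monte Carlo replications per n

for n in [10**2, 10**4, 10**6, 10**8, 10**10]:
    h_n = n ** (-1 / 5)                    # deterministic bandwidth, h_n -> 0
    X_n = h_n * rng.standard_normal(reps)  # X_n = h_n * Z_n, so X_n = O_P(h_n) by construction
    prob = np.mean(np.abs(X_n) > delta)    # Monte Carlo estimate of P(|X_n| > delta)
    print(f"n = {n:>11}, h_n = {h_n:.4f}, P(|X_n| > {delta}) ≈ {prob:.5f}")
```

The printed probabilities decrease toward $0$ as $n$ grows, which is the content of $X_n = o_P(1)$.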

(EDIT: I'm taking $h_n \to 0$ as a deterministic sequence, as in the example; my argument is a little sloppy if $h_n \to 0$ in probability, as a sequence of random variables.)

guy

We have $\sqrt{h_n} = o_p(1)$, since $h_n$ is a deterministic sequence converging to $0$. Note that $O_p(\sqrt{h_n}) = \sqrt{h_n}\,O_p(1)$. Using the result that $o_p(1)\,O_p(1) = o_p(1)$, we get the conclusion that $O_p(\sqrt{h_n}) = \sqrt{h_n}\,O_p(1) = o_p(1)$.
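Spelling the chain out for the quantity in the question (using that the bandwidth $h_n \to 0$): $$\sqrt{n h_n}\,[G_n(z) - G_n^{*}(z)] = O_p(\sqrt{h_n}) = \sqrt{h_n}\,O_p(1) = o_p(1)\,O_p(1) = o_p(1),$$ which is exactly statement $(1)$.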

semibruin