
Note: This is a follow-up on a previous question that was concerned about consistency, but this time seeking $\sqrt{n}$-consistency.

Suppose we estimate a quantity $\theta_0$ by the estimator $\tilde{\theta} = \hat{\theta}(\eta_0)$ that solves the estimating equation

$$S_n(\tilde{\theta}, \eta_0) = 0$$

where $\eta_0$ is a nuisance parameter that is known. Suppose that the assumptions of the M-estimator are satisfied, with

$$\sqrt{n}(\tilde{\theta}-\theta_0) = Op(1)$$

so that we have $\sqrt{n}$-consistency.

Question: Suppose now that we do not know $\eta_0$, but that we have a consistent estimator $\hat{\eta}$ of $\eta_0$. If now $\hat{\theta} = \hat{\theta}(\hat{\eta})$, under which conditions do we have $\sqrt{n}$-consistency?

I have already established conditions for simple consistency to hold, but $\sqrt{n}$-consistency seems harder.
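To make the setup concrete, here is a minimal numerical sketch (the model and estimating equation are illustrative choices of mine, not part of the question): $\theta_0$ is the mean of an outcome $X$ observed only when $R = 1$, the nuisance $\eta_0 = P(R=1)$ is known, and $S_n(\theta,\eta) = \frac{1}{n}\sum_i (R_i X_i/\eta - \theta)$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, eta0, n = 2.0, 0.7, 10_000

X = rng.normal(theta0, 1.0, n)   # outcome with mean theta0
R = rng.binomial(1, eta0, n)     # observation indicator, P(R = 1) = eta0

def S_n(theta, eta):
    """Inverse-probability-weighted estimating equation (illustrative)."""
    return np.mean(R * X / eta - theta)

# S_n is linear in theta here, so the root is available in closed form:
theta_tilde = np.mean(R * X) / eta0
print(abs(S_n(theta_tilde, eta0)))   # essentially zero
print(theta_tilde)                   # close to theta0 = 2
```

With $\eta_0$ known, $\tilde{\theta}$ is just a sample average scaled by a constant, so its $\sqrt{n}$-consistency is immediate; the question is what survives when $\eta_0$ is replaced by $\hat{\eta}$.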

Guillaume F.

2 Answers


Background: Typically, to ensure $\sqrt{n}$-consistency, we assume

1) $\tilde{\theta} \xrightarrow{p}\ \theta_0$

2) $S(\theta,\eta)$ is differentiable in $\theta$ at $(\theta_0,\eta_0)$ with derivative matrix $\Gamma$ of full rank

3) $|S(\tilde{\theta},\eta_0) - S(\theta_0,\eta_0)| = Op(n^{-1/2}) + op(|\tilde{\theta}-\theta_0|)$

From 1) and 2), we have a $C(\eta_0) > 0$ such that, with probability tending to one,

$$|S(\tilde{\theta},\eta_0) - S(\theta_0,\eta_0)| \ge C(\eta_0) | \tilde{\theta} - \theta_0 |$$

which, together with 3), gives

$$\begin{align} |\tilde{\theta} - \theta_0| &= Op(|S(\tilde{\theta},\eta_0) - S(\theta_0,\eta_0)|) \\ &= Op(n^{-1/2})+ op(|\tilde{\theta}-\theta_0|) = Op(n^{-1/2}) \end{align}$$

which proves the result.

Here 3) can be obtained from a variety of assumptions. Typically, we assume that

4) $S_n(\tilde{\theta},\eta_0) = Op(n^{-1/2})$

5) $S_n(\theta_0, \eta_0 ) = Op(n^{-1/2})$

6) $S(\theta_0, \eta_0) = 0$

as well as an additional more technical assumption. One example is

7a) For any sequence $\delta_n$ with $\delta_n \to 0$, $$\sup_{|\theta - \theta_0| < \delta_n} \frac{| S_n(\theta,\eta_0) - S(\theta, \eta_0) - S_n(\theta_0,\eta_0)|}{n^{-1/2} + |\theta - \theta_0| + |S_n(\theta, \eta_0)| + |S(\theta, \eta_0)|} = op(1)$$

Then, under 1), 4)-6) and 7a), condition 3) holds.

Proof: Let $\delta_n$ be a sequence that goes to zero such that

$$P( |\tilde{\theta} - \theta_0| > \delta_n ) \to 0$$

Then, we have, with probability tending to one,

$$ \begin{align} |S(\tilde{\theta},\eta_0)| - |S_n(\tilde{\theta},\eta_0)| - |S_n(\theta_0,\eta_0)| &\le |S_n(\tilde{\theta},\eta_0) - S(\tilde{\theta},\eta_0) - S_n(\theta_0,\eta_0)| \\ &= op(n^{-1/2} + |\tilde{\theta} - \theta_0|) \\ &+ op(|S_n(\tilde{\theta},\eta_0)|) + op(|S(\tilde{\theta},\eta_0)|) \end{align}$$

which gives, from 4) and 5),

$$\begin{align} |S(\tilde{\theta},\eta_0)| &\le op(n^{-1/2} + |\tilde{\theta} - \theta_0|) + |S_n(\tilde{\theta},\eta_0)|(1 + op(1)) + op(|S(\tilde{\theta},\eta_0)|) + |S_n(\theta_0,\eta_0)|\\ &= Op(n^{-1/2}) + op(|\tilde{\theta} - \theta_0|) + op(|S(\tilde{\theta},\eta_0)|) = Op(n^{-1/2}) + op(|\tilde{\theta} - \theta_0|) \end{align}$$

Instead of 7a), we can instead assume

7b)

$$ [S_n(\tilde{\theta},\eta_0) - S(\tilde{\theta},\eta_0)] - [ S_n(\theta_0,\eta_0) - S(\theta_0,\eta_0)] = Op(n^{-1/2}) + op(|\tilde{\theta} - \theta_0|) $$

Straightforward algebra shows that 7b) together with 4)-6) implies 3).

When $\eta_0$ is unknown, the resulting $\hat{\theta} = \hat{\theta}(\hat{\eta})$ may still satisfy 1) (see the linked question), and 2) still holds. However, 3) may fail.

Solution 1:

One way is to assume that in addition to the previous assumption 2), the estimator $\hat{\theta} = \hat{\theta}(\hat{\eta})$ satisfies

A) $\hat{\theta} \xrightarrow{p}\ \theta_0$

B) $|\hat{\eta} - \eta_0| = Op(n^{-1/2}) $

C) $|S(\hat{\theta},\hat{\eta}) - S(\theta_0,\eta_0)| = Op(n^{-1/2}) + op(|\hat{\theta} - \theta_0|)$

D) $S(\theta,\eta)$ is Lipschitz continuous in $\eta$ in a neighborhood of $\theta_0$ and $\eta_0$

Under A-D, condition 3) is satisfied and thus $\hat{\theta}$ has $\sqrt{n}$-consistency.

Proof: From the Lipschitz continuity we have a $K > 0$ such that with probability tending to one

$$| S(\hat{\theta},\eta_0) - S(\hat{\theta},\hat{\eta})| \le K |\hat{\eta} - \eta_0| = Op(n^{-1/2})$$

which implies

$$\begin{align}|S(\hat{\theta},\eta_0) - S(\theta_0,\eta_0)| &\le |S(\hat{\theta},\hat{\eta}) - S(\theta_0,\eta_0)| + |S(\hat{\theta},\hat{\eta}) - S(\hat{\theta},\eta_0)| \\ &= Op(n^{-1/2}) + op(|\hat{\theta} - \theta_0|)\end{align}$$
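As a sanity check on Solution 1, here is a small Monte Carlo sketch (my own illustrative model, not from the answer): for an inverse-probability-weighted mean, $S(\theta,\eta) = \theta_0\eta_0/\eta - \theta$ is Lipschitz in $\eta$ near $\eta_0$ (condition D), and $\hat{\eta} = \bar{R}$ satisfies B). The scaled error $\sqrt{n}\,|\hat{\theta} - \theta_0|$ should then stay bounded as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, eta0 = 2.0, 0.7

def estimate(n):
    """Plug-in estimator theta_hat(eta_hat) for an IPW mean (illustrative)."""
    X = rng.normal(theta0, 1.0, n)
    R = rng.binomial(1, eta0, n)
    eta_hat = R.mean()                # sqrt(n)-consistent nuisance (condition B)
    return (R * X).mean() / eta_hat   # root of S_n(theta, eta_hat) = 0

# Median of sqrt(n)*|theta_hat - theta0| across replications, for growing n:
medians = []
for n in (500, 2000, 8000):
    errs = [np.sqrt(n) * abs(estimate(n) - theta0) for _ in range(400)]
    medians.append(float(np.median(errs)))
    print(n, round(medians[-1], 2))   # roughly constant in n
```

If $\hat{\eta}$ were only $n^{1/4}$-consistent, condition B) would fail and the scaled errors would drift upward with $n$ instead of stabilizing.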

Solution 2:

Alternatively, if we assume A) and C) together with

2') $S(\theta,\eta)$ is differentiable in $(\theta,\eta)$ at $(\theta_0,\eta_0)$ with derivative matrix $\Gamma$ of full rank

B') $\hat{\eta} \xrightarrow{p}\ \eta_0$

Then the result follows directly from the derivation in the background.

Guillaume F.
  • don't you need some more information about the speed of $o_p(\widehat{\theta}-\theta_0)$ (which I suppose here stands just for $o_p(\widehat{\theta}-\theta_0)=o_p(1)$?) – chRrr Jul 10 '18 at 08:43
  • Assumption A) is all you need. Is there a specific equation you have in mind? – Guillaume F. Jul 10 '18 at 16:18
  • I think my problem is due to your compact listing of the assumptions/setting, i.e. the meaning of $\widehat{\theta}-\theta_0 = O_p(n^{-1/2}) +o_p(\widehat{\theta}-\theta_0)$. Suppose $o_p(\widehat{\theta}-\theta_0) = o_p(n^{-1/4})=O_p(n^{-1/4})$, Then $\sqrt{n}(\widehat{\theta}-\theta)=O_p(1) + O_p(n^{1/4})$ and you would have some problems to derive asymptotic normality. – chRrr Jul 11 '18 at 09:37
  • It's one of the properties of $op/Op$ calculus that's never taught explicitly in class but can be deduced from first principles. $$ Z_n = R_n + op(Z_n) \implies Z_n(1+op(1)) = R_n \implies Z_n = (1+op(1))^{-1} R_n \implies Z_n = Op(R_n)$$ – Guillaume F. Jul 11 '18 at 14:38

The other answer does not assume that $S_n(\hat{\theta}, \eta_0)$ is differentiable. If we do assume differentiability, the work is simplified somewhat.

Background:

1) $\tilde{\theta} = \theta_0 + op(1)$

2) $S_n(\theta,\eta)$ is equidifferentiable (in probability) in $\theta$ at $(\theta_0,\eta_0)$ with a derivative matrix $\Gamma_n$

3) $\Gamma_n$ is invertible with probability tending to one, with $\Gamma_n^{-1} = Op(1)$

4) $S_n(\tilde{\theta},\eta_0) - S_n(\theta_0,\eta_0) = Op(n^{-1/2})$

With probability tending to one, we can do a Taylor expansion about $\theta_0$,

$$\begin{align} S_n(\tilde{\theta},\eta_0) &= S_n(\theta_0,\eta_0) + \Gamma_n(\tilde{\theta} - \theta_0) + op(\tilde{\theta} - \theta_0) \end{align} $$

Hence

$$\begin{align} \tilde{\theta} - \theta_0 &= Op\left(S_n(\tilde{\theta},\eta_0) - S_n(\theta_0,\eta_0) \right) + op(\tilde{\theta} - \theta_0) = Op(n^{-1/2}) \end{align}$$

Solution:

If we assume A-D,

A) $\hat{\theta} = \theta_0 + op(1), \hat{\eta} = \eta_0 + op(1) $

B) $S_n(\theta,\eta)$ is uniformly equidifferentiable (in probability) in $\theta$ at $\theta_0$ on a neighborhood $\mathcal{B}$ of $\eta_0$ with a derivative matrix $\Gamma_n(\eta)$

C) On $\mathcal{B}$, $\Gamma_n(\eta)$ is invertible with probability tending to one and $\sup_{\eta \in \mathcal{B}} \Gamma_n^{-1}(\eta) = Op(1)$

D) $S_n(\hat{\theta},\hat{\eta}) - S_n(\theta_0,\hat{\eta}) = Op(n^{-1/2})$

Performing a Taylor expansion about $\theta_0$,

$$\begin{align} S_n(\hat{\theta},\hat{\eta}) &= S_n(\theta_0,\hat{\eta}) + \Gamma_n(\hat{\eta})(\hat{\theta} - \theta_0) + op(\hat{\theta} - \theta_0) \end{align}$$

Which implies

$$\begin{align} \hat{\theta} - \theta_0 &= Op\left(S_n(\hat{\theta},\hat{\eta}) - S_n(\theta_0,\hat{\eta}) \right) +op(\hat{\theta} - \theta_0) = Op(n^{-1/2}) \end{align}$$

which is the desired result.
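The Taylor-expansion step can also be checked numerically. In this sketch (an illustrative model of my own, not from the answer) the target is $\theta_0 = \log E[X]$, so $S_n(\theta,\eta) = \frac{1}{n}\sum_i R_i X_i/\eta - e^{\theta}$ is genuinely nonlinear in $\theta$ with derivative $\Gamma_n = -e^{\theta}$; the remainder of the first-order expansion should be small relative to $|\hat{\theta} - \theta_0|$.

```python
import numpy as np

rng = np.random.default_rng(2)
eta0, n = 0.7, 20_000
theta0 = np.log(2.0)              # target: log of E[X] for X ~ Exponential(mean 2)

X = rng.exponential(2.0, n)
R = rng.binomial(1, eta0, n)
eta_hat = R.mean()                # plug-in nuisance estimate

def S_n(theta, eta):
    """Nonlinear estimating equation; d/dtheta S_n = -exp(theta)."""
    return np.mean(R * X / eta) - np.exp(theta)

theta_hat = np.log(np.mean(R * X) / eta_hat)   # exact root of S_n(., eta_hat)

# First-order expansion about theta0 with Gamma = -exp(theta0):
#   0 = S_n(theta_hat, eta_hat) ~ S_n(theta0, eta_hat) + Gamma*(theta_hat - theta0)
gamma = -np.exp(theta0)
linearized = -S_n(theta0, eta_hat) / gamma
ratio = abs((theta_hat - theta0) - linearized) / abs(theta_hat - theta0)
print(ratio)   # small: the remainder is op(theta_hat - theta0)
```

The ratio shrinks like $|\hat{\theta} - \theta_0|/2$ here, which is exactly the $op(\hat{\theta} - \theta_0)$ remainder the expansion requires.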

Guillaume F.