
Consider an arbitrary estimator $\hat{M}$ (e.g., a regression coefficient estimator, some particular correlation estimator, etc.) that satisfies the following asymptotic property:

$$\boxed{\sqrt{N}(\hat{M}-M) \overset{d}{\to}\mathcal{N}(0,\sigma^2)}\,\,\,\,\,\,\,\,\,\,\,\,(1)$$

which implies that our $\hat{M}$ is consistent. We also have a consistent estimator $\hat{\sigma}$, which gives rise to the asymptotic property:

$$\displaystyle \ \ \boxed{\frac{\sqrt{N}(\hat{M}-M)}{\hat{\sigma}} \overset{d}{\to}\mathcal{N}(0,1)}\,\,\,\,\,\,\,\,\,\,\,\,(2)$$

I'm wondering: can I use the $z$- or $t$-test as usual for any such $\hat{M}$ that satisfies the above? Let the test statistic $Q$ be defined as:

$$\displaystyle \ \ \boxed{Q_{\hat{M}} = \frac{\hat{M}-M_{H_0}}{\sqrt{\frac{1}{N}\hat{\sigma}^2}}}\,\,\,\,\,\,\,\,\,\,\,\,(3)$$

My goal is to do the following hypothesis test:

$H_0: M = 0$

$H_a: M \not= 0$

yet the only information I have access to is $(1)$ and $(2)$, whence my question.
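To make this concrete, here is a minimal Python sketch of the test I have in mind (the function name and the use of `math.erf` for the standard normal CDF are my own choices; nothing here is specific to any particular $\hat{M}$):

```python
import math

def wald_test(m_hat, sigma_hat, n, m_h0=0.0):
    """Two-sided Wald test of H0: M = m_h0 using statistic (3).
    Valid only to the extent that the asymptotic property (2) holds."""
    q = (m_hat - m_h0) / (sigma_hat / math.sqrt(n))
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    p_value = 2.0 * phi(-abs(q))  # p = 2 * Phi(-|Q|)
    return q, p_value
```

For example, with $\hat{M} = 0.05$, $\hat{\sigma} = 1$ and $N = 3000$ this gives $Q \approx 2.74$, which would reject $H_0$ at the 5% level.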


$$\underline{\text{Update}}$$

The current answers suggest that I can't always robustly $z$- or $t$-test for any such $\hat{M}$. I am reading the relevant sections of All of Statistics (Wasserman), as well as Statistical Inference (Casella & Berger). Both state that, if:

$$\displaystyle \ \ \frac{\sqrt{N}(\hat{M}-M)}{\hat{\sigma}} \overset{d}{\to} \mathcal{N}(0,1)$$

  • then "an approximate test can be based on the Wald statistic $Q$ and would reject $H_0$ if and only if $Q < -z_{\alpha/2}$ or $Q > z_{\alpha/2}$" (Casella & Berger, page 492, Section 10.3.2, "Other Large-Sample Tests")

  • or, in (Wasserman, page 158, Theorem 10.13) "Let $Q = (\hat{M}-M_{H_0})/\hat{se}$ denote the observed value of the Wald statistic $Q$ $\big($where $\hat{se}$ is obviously equal to my $\sqrt{\frac{1}{N}\hat{\sigma}^2}$$\big)$. The p-value is given by:

$$p = 2\Phi(-|Q|)$$

This seems to contradict the existing answers, since neither text states any further necessary assumptions for doing this legitimately (to the best of my ability to comprehend). Either:

  • I have failed to understand existing answers.
  • I have failed to express my original question clearly.
  • I have failed to read these chapters properly.
  • The texts sacrifice thoroughness for pedagogical purposes.

I would appreciate some assistance in determining which option is correct. Thanks. $\big($Please go easy, I am new to stats :)$\big)$.


Another consideration is that my intended application has $n = 3000$, so perhaps the finite-sample problems are less relevant?

Jase
  • Do you mean, "any *such* $\hat M$"? – StasK Dec 13 '12 at 14:36
  • @StasK already answered, I will only note that it is not a good idea mathematically to have $N$-dependent quantities on both sides of $\to$. – mpiktas Dec 13 '12 at 14:37
  • Right! I really should have mentioned that. – StasK Dec 13 '12 at 14:39
  • You are empirically using the proof of the robustness of the $t$-test ([see](http://stats.stackexchange.com/q/44262/10525)). A good estimator of $\sigma$ typically requires a moderate or large sample size. This in turn implies that the distribution of the $T$ statistic is a Student-$t$ with large degrees of freedom, which is approximately normal. – Dec 13 '12 at 18:23
  • @Procrastinator So the answer is *"Yes, you can $t$-test like normal for any such $\hat{M}$"*? – Jase Dec 14 '12 at 01:11
  • @Jase My answer is "*They are similar when the sample size is moderate or large, but there is not much gain in doing such an approximation*". The quality and validity of the approximation is still restricted to some conditions on the sampling distribution, as discussed in the link posted in my previous comment. On the other hand, I think this question reflects a good intuition. – Dec 14 '12 at 01:53
  • @Procrastinator What problems can come up if I do a $t$-test like normal? – Jase Dec 14 '12 at 02:01
  • @Jase The normal approximation that you are proposing depends on two underlying approximations. The first one is the speed of convergence to the normal distribution in your first equation (*speed* as a function of the sample size). The second one is the speed of convergence of the estimator of $\sigma$. If at least one of these is slow (in the sense that it requires a large sample size to produce an accurate estimation), then the approximation may not be that appealing. – Dec 14 '12 at 02:07
  • @Procrastinator Thanks. If $s \to \sigma$ (i.e. estimated std. error $\to$ asymptotic std. error) and $\hat{M} \to M$ very quickly in a simulation study, then is it safe to use the $t$-test as I described? – Jase Dec 14 '12 at 02:15
  • Many thanks @Procrastinator for this interesting link. Really. – Elvis Dec 14 '12 at 09:45
  • Well, finally I am a bit disappointed by the answer. Anyway, thanks. – Elvis Dec 14 '12 at 10:02
  • @Elvis Do you mean the answer in the link I posted? You can also go to the wikipedia link I posted there, which contains the basic idea of the proof, I think. –  Dec 14 '12 at 11:33
  • @Jase The $t$-test is a bit more conservative than the normal test you are considering. The proposed normal test can be seen as an approximation of the $t$-test, which is already an approximation (this is explained in StasK's answer). If your estimators $\hat{M}$ and $s$ are as good as you mention, then they will likely provide similar answers. My point is that the normal test can be used (only) in nice scenarios; it will work well but this test is not robust. – Dec 14 '12 at 11:42
  • @Procrastinator Okay, so to summarize your contribution: $\text{(i)}$ Standard, everyday $t$-testing is fine for any such $\hat{M}$ in order to test $H_0: M = 0$ versus $H_a: M \not= 0$, and $\text{(ii)}$ $z$-testing is fine only in nice, idealized scenarios. – Jase Dec 14 '12 at 11:51
  • Yes, I read all that. In fact as far as I understand the consequence of Slutky lemma is just that the statistic is asymptotically normal. – Elvis Dec 14 '12 at 11:53
  • @Procrastinator Also, I'm not sure what you mean by "The $t$-test is a bit more conservative than the normal test you are considering". I am not considering any "normal test". I am considering only the $t$-test. – Jase Dec 14 '12 at 11:57
  • @Jase I said "normal test" because you are using a normal approximation to the $T$ statistic. The $t$-test is more conservative in the sense that the denominator is treated as a random variable, not just a number. This is why the resulting distribution (Student-$t$) has heavier tails than the normal ones. – Dec 14 '12 at 13:02
  • @Procrastinator Can you tell me what test statistic I should be using for this $\hat{M}$? What would the test statistic look like if I want to work with the Student-$t$ distribution? – Jase Dec 14 '12 at 15:46
  • @Jase How are you estimating $\sigma$? Is it the sample variance? If the answer is yes, then the statistic $t_{\hat{M}}$ is approximately distributed as a Student-$t$ with [certain degrees of freedom](http://en.wikipedia.org/wiki/Student%27s_t-test). If not, then you have to obtain the distribution of $\hat{\sigma}$ and after that, obtain the distribution of the ratio $t_{\hat{M}}$. If the variability of $\hat{\sigma}$ is small, then the normal test will work. If the variability is high, then you need the distribution of your statistic, which may be complicated, but you can employ the bootstrap. – Dec 14 '12 at 15:57
  • @Procrastinator Well $\hat{\sigma}^2$ is simply the estimator for the asymptotic variance, $\sigma^2$. A simulation study is done and found that the error decreases monotonically as $n$ is increased. – Jase Dec 14 '12 at 16:11
  • @Jase Yes, that is always the case because the estimator is consistent. But the variability of the estimator, that is, the variance of the distribution of the estimator, may still be big. What you can try is this: simulate $N$ samples of size $n$ (the sample size of interest) from the sampling model and calculate $N$ statistics $t_{\hat{M}}$. Plot the histogram of the sample of statistics and check whether they look normal. You can go one step further and test for normality, *e.g.* using `shapiro.test()`. – Dec 14 '12 at 16:15
  • @Procrastinator How do I tell the difference between whether it's Student-$t$ distributed (hence I use normal $t$-test) or normally distributed (hence I use $z$-test)? They look the same to the eyeball. – Jase Dec 14 '12 at 16:17
  • @Jase As I said, "You can go one step further and test for normality, e.g. using `shapiro.test()`". Another option consists of resampling in order to obtain a bootstrap sample of your test statistic and using this sample to conduct the hypothesis test. – Dec 14 '12 at 18:02
  • I do not see any contradiction between the answers here and Wasserman's Theorem 9.18. Wasserman states that the result is asymptotic and applies to the MLE, not any estimator. Therefore: (1) you may need a large sample to get an accurate approximation, and (2) the result applies to the MLE *under appropriate regularity conditions*, which does not contradict the answers below. – Dec 17 '12 at 14:33
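For what it's worth, the simulation check described in the comments above can be sketched as follows (a Python stand-in for the R workflow; the Normal sampling model, the sample mean as $\hat{M}$, and the sample standard deviation as $\hat{\sigma}$ are illustrative assumptions, not the only possibility):

```python
import random
import statistics

def simulate_wald_stats(n, reps=2000, mu=0.0, sigma=1.0, seed=0):
    """Simulate `reps` realizations of the statistic Q for samples of
    size n from N(mu, sigma^2), plugging the sample mean in for M-hat
    and the sample standard deviation in for sigma-hat."""
    rng = random.Random(seed)
    qs = []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        m_hat = statistics.fmean(x)
        s_hat = statistics.stdev(x)
        qs.append((m_hat - mu) / (s_hat / n ** 0.5))
    return qs
```

One would then plot the histogram of `simulate_wald_stats(3000)` and feed it to a normality test (`shapiro.test()` in R, `scipy.stats.shapiro` in Python), as suggested in the comments.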

2 Answers


That's exactly how asymptotic results are being used in practice, e.g., in logistic regression. I would probably factor it differently as

$$\sqrt{N}\,\frac{\hat{M}-M}{\sigma} \overset{d}{\to}\mathcal{N}(0,1)$$

which shows the desired result more immediately, IMO (as mpiktas mentioned in the comments, it is not a good idea to have $N$-dependent quantities on both sides of the asymptotic expression). The practical problem with this, of course, is that $\sigma$ is usually unknown and needs to be estimated. The result, and the application, would still hold if a $\sqrt{N}$-consistent estimator is plugged in place of $\sigma$. In some applications, getting such an estimator is a non-trivial task, as is the case with, say, dependent data (time series, cluster sampling, spatial data).

Update: since the asymptotic distribution is the normal rather than Student, a $z$-test is more appropriate. In practice, $t$-tests are often used instead, but coming up with the degrees of freedom is often a challenge. Besides, for most sample statistics, the finite sample asymmetry and bias are greater concerns than heavy tails, and these obviously cannot be corrected by referring the test statistic to the $t$-distribution instead of the standard normal.
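At the questioner's $n = 3000$, the practical difference between referring $Q$ to the standard normal or to a Student-$t$ is negligible, as a quick check illustrates (a sketch using `scipy.stats`; the statistic value $Q = 2$ and the degrees of freedom are illustrative choices):

```python
from scipy import stats

def two_sided_p(q, df=None):
    """Two-sided p-value for an observed statistic q, referred to the
    standard normal (df=None) or a Student-t with df degrees of freedom."""
    if df is None:
        return 2 * stats.norm.sf(abs(q))
    return 2 * stats.t.sf(abs(q), df)

# With df = 2999 (n = 3000 as in the question) the z- and t-based p-values
# agree to about three decimal places; with small df the t-test is
# noticeably more conservative (larger p-value for the same Q).
```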

StasK
  • But should he do a $t$-test or a $z$-test? – Elvis Dec 14 '12 at 09:42
  • It is still unclear to me whether I can do the $t$-test or whether I can't do it (or whether my ability to do it robustly is contingent upon some conditions). – Jase Dec 14 '12 at 11:56
  • I updated my answer a little bit to address this question; see above. – StasK Dec 14 '12 at 20:41

Taking the question at face value, the answer is no. I offer a counterexample in which $\hat{M}$ approaches its estimand in distribution while its variance diverges: in such a case the $t$ statistic must approach zero almost surely, so it can have neither an asymptotic Normal nor a $t$ distribution.


Consider the usual Normal setting where $\hat{M}$ is an unbiased estimator of the mean based on $N \ge 2$ iid observations of a Normal$(\mu, \sigma^2)$ variable, $(X_1, X_2, \ldots, X_N)$. Let $\beta$ be a function of $N$ to be determined later and, writing $\bar{X}$ for the sample mean, consider the estimator

$$\hat{M}(X_1,\ldots,X_N) = \beta(N)\bar{X}\ \text{ if }\ X_1\ge\max(X_1,\ldots,X_N)\ \text{ else }\ \frac{N-\beta(N)}{N-1}\bar{X}.$$

Because the first alternative in the definition of $\hat{M}$ happens with probability $1/N$ and the second with probability $(N-1)/N$, we can compute that

$$\mathbb{E}(\hat{M}) = \mathbb{E}\left(\frac{1}{N}\beta(N)\bar{X}\ + \frac{N-1}{N}\frac{N-\beta(N)}{N-1}\bar{X}\right) = \mathbb{E}(\bar{X}) = \mu,$$

showing that $\hat{M}$ is an unbiased estimator of $\mu$, and (by computing the expectation of $\hat{M}^2$ and subtracting the square of the expectation of $\hat{M}$),

$$\text{Var}(\hat{M}) = \frac{\sigma^2/N + \mu^2}{N(N-1)^2}\left((N-1)^2\beta(N)^2 + (N-1)\left(N-\beta(N)\right)^2\right) - \mu^2.$$

If we choose $\beta(N) = O(N^b)$ for $\frac{1}{2} \lt b \lt 1$, the right hand side (which is $O(N^{2b-1})$) will diverge, but $\hat{M}$ will approach $\mu$ in distribution (because most of the time $\hat{M}$ will equal $\frac{N-\beta(N)}{N-1}\bar{X}$, which becomes arbitrarily close to $\bar{X}$).
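To see the divergence numerically, one can evaluate the variance expression above directly (a sketch; $\mu = \sigma = 1$ and $\beta(N) = N^{3/4}$ are arbitrary admissible choices):

```python
def var_mhat(n, mu=1.0, sigma=1.0, b=0.75):
    """Evaluate Var(M-hat) from the closed-form expression above,
    taking beta(N) = N**b with 1/2 < b < 1."""
    beta = n ** b
    factor = (sigma ** 2 / n + mu ** 2) / (n * (n - 1) ** 2)
    bracket = (n - 1) ** 2 * beta ** 2 + (n - 1) * (n - beta) ** 2
    return factor * bracket - mu ** 2

for n in (10, 100, 1000, 10000):
    print(n, round(var_mhat(n), 2))  # variance keeps growing with n
```

The printed variances grow roughly like $\sqrt{N}$, matching the $O(N^{2b-1})$ rate with $b = 3/4$.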


In a comment, StasK has noted that this estimator $\hat{M}$ is not exchangeable in its arguments ($X_1$ plays a favored role) and asked whether that might be part of the cause of the "bad" asymptotic behavior. I do not believe so. For instance, let $s$ be the sample standard deviation and $\bar{X}_{\widehat{i}}$ be the mean of the variables with $X_i$ excluded. The distribution of $Y_i = (X_i - \bar{X}_{\widehat{i}})/s$ depends only on $N$ (not on $\mu$ or $\sigma$); it is a multivariate distribution with scaled Student $t$ distributions as marginals. So for each $N$ there exists a number $t_N$ for which there is a $1/N$ chance that $\max_i(Y_i)\ge t_N$. In the definition of $\hat{M}$, replace the condition $X_1 \ge \max_i(X_i)$ by $\max_i(Y_i)\ge t_N$. Everything works out exactly as before, but this $\hat{M}$ is invariant under permutations of the data.

whuber
  • Bill, you probably meant $X_1 \ge \max(\cdots)$ in the first formula. This (counter)example is of course artificial, but still useful to have at hand, as most counterexamples are. What would be the additional requirements to impose on $\hat M(X_1, \ldots, X_N)$ to rule out situations like you've described? Interchangeability of the arguments? – StasK Dec 14 '12 at 20:45
  • @StasK Thanks--done. Instead of the $X_1 \ge \max\ldots$ test we could use a randomized procedure, so interchangeability doesn't help. I don't know what a sufficient set of conditions would be to rule out this situation. Although the example is artificial, it models estimators that might once in a while be a little "wild." We might speculate that such estimators are also likely to be inadmissible (as in the case of unbiased estimators for lognormal distributions), so that could be a fruitful direction to look for conditions. – whuber Dec 14 '12 at 20:59
  • Please see my updated question. – Jase Dec 16 '12 at 09:12
  • I don't have a copy of either of those texts in front of me, Jase, but I suspect that they have assumed the estimates are Maximum Likelihood estimates (which is the setting for a Wald statistic), whereas your question asks about an "arbitrary" estimator. – whuber Dec 16 '12 at 16:35
  • I trust that your answer was absolutely fantastic, but since I do not have the capability to understand it I will award the correct answer to stask. – Jase Dec 18 '12 at 16:39
  • I'll be sure in the future not to provide incomprehensible answers to your questions. – whuber Dec 18 '12 at 17:25
  • I think it is good to get a broad range of complexity in the answers if the goal is to make CV into a database of authoritative answers catering to a wide audience. – Jase Dec 19 '12 at 07:28