
I was studying the Wald statistic on Wikipedia, and it states:

$Z=\displaystyle\frac{(\hat{\theta}-\theta_0)}{se(\hat\theta)}$ $\rightarrow N(0,1)$ as the sample size $n$, on which $\hat\theta$ was evaluated, goes to $\infty$, where $se$ is the standard error of $\hat\theta$.

That's the distribution of $Z$ under the null hypothesis $H_0: \theta=\theta_0$ vs the alternative hypothesis $H_1: \theta \neq \theta_0$. Wikipedia says this approximation generally holds in most cases, but I'd like to know whether:

$Z=\displaystyle\frac{(\hat\theta-\theta)}{\hat{se}(\hat\theta)} \rightarrow N(0,1)$

holds in "most cases", regardless of whether the Wald statistic itself is $N(0,1)$ for large $n$. If so, when does it hold? That is, what are the "weak assumptions" (since this is supposed to hold most of the time) under which $Z$ is asymptotically $N(0,1)$? Does this have anything to do with the Wald approach, or is it only called "Wald" in the context of hypothesis testing?

Davi Américo
    Is your question about the conditions that the second equation holds assuming that the first one holds? Or is the question also about when the first equation holds? – Sextus Empiricus Jan 11 '22 at 07:25
  • The first equation is the idea which I went over to make my question, but the question is mainly regarding the second equation but if the answerer feels like the one may be answering when the first one holds. I guess answering when the second one does tells me when the first one holds. – Davi Américo Jan 11 '22 at 20:48

1 Answer


As long as $\hat{se}(\hat{\theta})$ is a consistent estimator of $se(\hat{\theta})$, then by Slutsky's theorem it may be shown that \begin{eqnarray*} \frac{\hat{\theta}-\theta}{\hat{se}(\hat{\theta})} \rightarrow \mbox{N}(0,1) \end{eqnarray*} since \begin{eqnarray*} \frac{\hat{\theta}-\theta}{\hat{se}(\hat{\theta})} = \frac{se(\hat{\theta})}{\hat{se}(\hat{\theta})}\frac{\hat{\theta}-\theta}{se(\hat{\theta})} \end{eqnarray*} and by the consistency assumption \begin{eqnarray*} \frac{se(\hat{\theta})}{\hat{se}(\hat{\theta})} \overset{P}{\rightarrow} 1. \end{eqnarray*}
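A quick simulation sketch (not part of the original answer) illustrates the argument: take $\theta$ to be the mean of an Exponential(1) population, $\hat\theta$ the sample mean, and $\hat{se}(\hat\theta) = s/\sqrt{n}$ the usual plug-in standard error, which is consistent here. The standardized statistic should then look approximately standard normal.

```python
# Sketch: check that (theta_hat - theta) / se_hat is approximately
# N(0, 1) when se_hat is a consistent estimate of se(theta_hat).
# Population: Exponential(1), so theta = 1 and theta_hat = sample mean.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 1.0, 500, 20_000

samples = rng.exponential(theta, size=(reps, n))
theta_hat = samples.mean(axis=1)
se_hat = samples.std(axis=1, ddof=1) / np.sqrt(n)  # plug-in standard error
z = (theta_hat - theta) / se_hat

# If the limit holds, z should look standard normal:
print(z.mean(), z.std())           # both roughly 0 and 1
print(np.mean(np.abs(z) < 1.96))   # roughly 0.95
```

The same check with a heavier-tailed population (still with finite variance) gives similar results, which is Slutsky's theorem at work: the ratio $se(\hat\theta)/\hat{se}(\hat\theta)$ converges to 1 in probability, so replacing the true standard error by its estimate does not change the limit.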

user277126
  • Consistency is one condition, but I guess we also need the existence of the variance. For instance, the sample median of a Cauchy distribution is consistent but has no defined standard error. – Sextus Empiricus Jan 11 '22 at 06:16
  • And is consistency equal to approaching a normal distribution? – Sextus Empiricus Jan 11 '22 at 06:23
  • It is assumed that the first equation he writes holds, namely $\frac{\hat{\theta}-\theta}{se(\hat{\theta})} \rightarrow \mbox{N}(0,1)$. Then I don't see why one needs any other assumption except the consistency assumption that I have written. See my last equation for a definition of consistency – user277126 Jan 11 '22 at 06:27
  • Adjustment of my comment... The sample median of a Cauchy distribution actually will have a variance. But it won't be the case for a distribution with hyper heavy tails. – Sextus Empiricus Jan 11 '22 at 06:42
  • Ah, I see. You answer the question about the condition when the second equation holds conditional on the first being true. I thought that the question was about the 'most of cases' and how those are specified. – Sextus Empiricus Jan 11 '22 at 07:24
  • "But it won't be the case for a distribution with hyper heavy tails." -- I don't see that this is typically the case; asymptotically the variance of the median relates (as inverse-square) to the height of the density of the median. Very heavy tails, by contrast, would tend to have little impact on the median. What you would look out for in particular is vanishing density around the median (regardless of the tail behavior) – Glen_b Jan 11 '22 at 12:11
  • @Glen Certainly it's not typical. It seems to me, though, that the variance of the median of any finite sample from a distribution with survival function on the order of $x^{-1/x}$ would be infinite. – whuber Jan 11 '22 at 21:05
  • Hi @whuber. Just to be clear - when you say "on the order of $x^{−1/x}$", you mean something like $\lim_{x\to\infty} S(x)/(x^{-1/x})$ is $O(1)$? – Glen_b Jan 12 '22 at 01:01
  • @Glen_b Yes, that's right. The idea is that the distribution function of the median of a sample of $2n+1$ is proportional to $S(x)^n S^\prime(x) (1-S(x))^n.$ For large $x,$ the contribution to a variance calculation therefore is bounded below by a multiple of $x^2 S(x)^n S^\prime(x)\ \propto\ x^2(S(x)^{n+1})^\prime(x).$ Thus, if the derivative of $S^{n+1}$ can be made to decrease sufficiently slowly, this integral will be infinite for any $n.$ – whuber Jan 12 '22 at 01:10
  • $x^{-1/x}$ is increasing beyond $x=e$, so it's not a valid survival function. I presume there's a simple typo or something. Do you mean $x^{1/x}$ instead? – Glen_b Jan 12 '22 at 01:12
  • @Glen_b Sorry, I meant that to be the distribution function. Take $S = 1 - x^{-1/x}.$ Generally, consider functions that behave like $x^{-p(x)}$ where $p$ shrinks down to $0.$ I suppose something like $x^{1/x}-1$ would work, too. – whuber Jan 12 '22 at 01:19
  • I buy the general argument, in any case. I presume that's the gist of what you were getting at @SextusEmpiricus ? – Glen_b Jan 12 '22 at 01:55
  • @Glen_b here's a question about it https://stats.stackexchange.com/questions/469236/how-do-we-call-a-more-extreme-case-of-fat-tails-than-a-power-law It is not a very typical example. But maybe there are more common ways in which an estimator can be consistent but not approach a normal distribution (I believe that it must be at least a non-linear estimator such that the CLT does not apply). Sidenote: by now I see that this does not relate to user277126's answer, who assumed that the first equation holds. – Sextus Empiricus Jan 12 '22 at 07:31
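The Cauchy-median example discussed in the comments can also be checked by simulation (a sketch, not from the thread). The sample median of a standard Cauchy is consistent for the population median 0, and its asymptotic standard error is $1/(2 f(0)\sqrt{n}) = \pi/(2\sqrt{n})$, since the density at the median is $f(0) = 1/\pi$. Despite the heavy tails, the standardized median looks close to standard normal, in line with Glen_b's point that tail behavior matters little for the median.

```python
# Sketch: the sample median of a standard Cauchy, standardized by its
# asymptotic standard error pi / (2 sqrt(n)), is approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 20_000

medians = np.median(rng.standard_cauchy((reps, n)), axis=1)
se = np.pi / (2 * np.sqrt(n))  # asymptotic se; f(0) = 1/pi for Cauchy
z = medians / se               # true median is 0

print(z.mean(), z.std())       # both roughly 0 and 1
```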