I was reading about the Wald statistic on Wikipedia, which states:
$Z=\displaystyle\frac{(\hat{\theta}-\theta_0)}{se(\hat\theta)}$ $\rightarrow N(0,1)$ as the sample size $n$ on which $\hat\theta$ was evaluated goes to $\infty$, where $se(\hat\theta)$ is the standard error of $\hat\theta$ (that is, the standard deviation of the estimator).
That is the distribution of $Z$ under the null hypothesis $H_0: \theta=\theta_0$ against the alternative hypothesis $H_1: \theta \neq \theta_0$. Wikipedia says that this approximation generally holds in most cases, but I would like to know whether
$Z=\displaystyle\frac{(\hat\theta-\theta)}{\hat{se}(\hat\theta)} \rightarrow N(0,1)$
also holds in "most cases", regardless of whether the Wald statistic is approximately $N(0,1)$ for large $n$. If so, when does it hold? That is, what are the "weak assumptions" (since this is supposed to hold most of the time) that lead $Z$ to be asymptotically $N(0,1)$? Does this have anything to do with the Wald approach, or is the name "Wald" only used in the context of hypothesis testing?
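
To make the question concrete, here is a minimal simulation sketch of the second statistic. Everything in it is my own assumption for illustration and does not come from the Wikipedia article: I take $\theta$ to be the true mean of i.i.d. exponential data, $\hat\theta$ to be the sample mean, and $\hat{se}(\hat\theta)$ to be the sample standard deviation divided by $\sqrt{n}$. The function name `studentized_z` is hypothetical. It only shows what happens in this one well-behaved case and says nothing about which assumptions are needed in general.

```python
import numpy as np


def studentized_z(rng, n, reps, theta):
    """Simulate Z = (theta_hat - theta) / se_hat for the sample mean.

    Assumption (illustrative only): data are i.i.d. exponential with
    true mean `theta`, theta_hat is the sample mean, and se_hat is the
    sample standard deviation divided by sqrt(n).
    """
    # One row per replication, one column per observation.
    samples = rng.exponential(scale=theta, size=(reps, n))
    theta_hat = samples.mean(axis=1)
    # Estimated standard error of the sample mean (ddof=1: unbiased variance).
    se_hat = samples.std(axis=1, ddof=1) / np.sqrt(n)
    return (theta_hat - theta) / se_hat


rng = np.random.default_rng(0)
z = studentized_z(rng, n=500, reps=4000, theta=2.0)

# If Z is close to N(0,1), the mean should be near 0, the standard
# deviation near 1, and about 95% of values should fall in [-1.96, 1.96].
coverage = np.mean(np.abs(z) <= 1.96)
print("mean of Z:", z.mean())
print("sd of Z:", z.std(ddof=1))
print("share with |Z| <= 1.96:", coverage)
```

In this setup the estimated standard error replaces the true one and the true $\theta$ replaces $\theta_0$, which is exactly the substitution I am asking about.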