
I'm trying to figure out exactly what the difference is between $t$-tests and $z$-tests.

As far as I can tell, for both classes of tests one uses the same test statistic, something of the form

$$\frac{\hat{b} - C}{\widehat{\operatorname{se}}(\hat{b})}$$

where $\hat{b}$ is some sample statistic, $C$ is some reference (location) constant (which depends on the particulars of the test), and $\widehat{\operatorname{se}}(\hat{b})$ is the standard error of $\hat{b}$.

The only difference, then, between these two classes of tests is that in the case of $t$-tests, the test statistic above follows a $t$-distribution (for some sample-determined degrees-of-freedom $d$), whereas in the case of $z$-tests, the same test statistic follows a standard normal distribution $\mathcal{N}(0, 1)$. (This in turn suggests that the choice of a $z$-test or a $t$-test is governed by whether or not the sample is large enough.)

Is this correct?
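To make this concrete, here is a minimal sketch of what I mean, for a one-sample test of a mean (Python with NumPy/SciPy; the toy data and the reference value `mu_0` are purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=20)  # toy sample
mu_0 = 5.0                                   # the reference constant C

b_hat = x.mean()                             # sample statistic b-hat
se_hat = x.std(ddof=1) / np.sqrt(len(x))     # estimated standard error of b-hat
stat = (b_hat - mu_0) / se_hat

# The same statistic, referred to two different distributions:
p_t = 2 * stats.t.sf(abs(stat), df=len(x) - 1)  # t-test (t distribution, n - 1 df)
p_z = 2 * stats.norm.sf(abs(stat))              # z-test (standard normal)
print(stat, p_t, p_z)
```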

kjo
There is also [this post](http://stats.stackexchange.com/questions/56066/wald-test-in-regression-ols-and-glms-t-vs-z-distribution) which is quite similar to your question but deals with that in the framework of regression. Maybe you'll find some useful information there too. – COOLSerdash Jun 09 '13 at 19:55

1 Answer


The names "$t$-test" and "$z$-test" are typically used to refer to the special case where $X_1,\ldots,X_n$ are i.i.d. normal $\mbox{N}(\mu,\sigma^2)$, $\hat{b}=\bar{x}$ and $C=\mu_{0}$. You can, of course, construct tests of "$t$-test type" in other settings as well (the bootstrap comes to mind), using the same type of reasoning.

Either way, the difference is in the $\mbox{s.e.}(\hat{b})$ part:

  • In a $z$-test, the standard deviation of $\hat{b}$ is assumed to be known without error. In the special case mentioned above, this means that $\mbox{s.e.}(\bar{x})=\sigma/\sqrt{n}$.
  • In a $t$-test, it is estimated using the data. In the special case mentioned above, this means that $\mbox{s.e.}(\bar{x})=\hat{\sigma}/\sqrt{n}$, where $\hat{\sigma}=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2}$ is an estimator of $\sigma$ (see the sketch after this list).
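To illustrate the two bullets above, here is a minimal sketch (Python; the "known" $\sigma = 2$ is just an assumption of the toy example):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                              # true sd, assumed known for the z-test
x = rng.normal(loc=0.0, scale=sigma, size=15)
n = len(x)

se_z = sigma / np.sqrt(n)                # z-test: sigma known without error
sigma_hat = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))  # = x.std(ddof=1)
se_t = sigma_hat / np.sqrt(n)            # t-test: sigma estimated from the data
print(se_z, se_t)
```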

The choice between a $t$-test and a $z$-test, therefore, depends on whether or not $\sigma$ is known prior to collecting the data.

The reason that the distributions of the two statistics differ is that the $t$-statistic contains more unknowns. This causes it to be more variable, so that its distribution has heavier tails. As the sample size $n$ grows, the estimator $\hat{\sigma}$ comes very close to the true $\sigma$, so that $\sigma$ essentially is known. So when the sample size is large, the $\mbox{N}(0,1)$ quantiles can also be used for the $t$-test.
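For instance, comparing the upper 2.5% quantiles of the two distributions (Python with SciPy):

```python
from scipy import stats

# The heavier t tails shrink toward the normal as the degrees of freedom grow
for df in (2, 5, 10, 30, 100, 1000):
    print(df, stats.t.ppf(0.975, df))
print("normal", stats.norm.ppf(0.975))
```

Already at around 30 degrees of freedom the $t$ quantile (about 2.04) is close to the normal 1.96.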

MånsT