1

I’ve been digging through the internet trying to find an answer to this question. However, all I’ve found were empty statements like: “z-tests depend on the population being normally distributed” or “with t-tests you don’t have to worry about whether the population is normally distributed or not”. No proof was given, though.

After all, do z- and t-tests assume that the population is normally distributed? How do you prove it?

PS: Just to be clear, I'm not trying to prove anything myself. I just couldn't find any in-depth discussion or proof of it in any book or paper. Feel free to suggest papers or books in which this subject is discussed; I'll be glad to read them all.

2-D
  • 1
    Any source that suggests that you don't have to worry about distributional assumptions when using a *t* test is probably not giving very good advice. There can be a more nuanced discussion about the cases in which a *t* test will be robust to deviations from normality, but these considerations shouldn't be dismissed summarily. – Sal Mangiafico Dec 01 '19 at 14:59
  • I do agree with you. It doesn't seem reasonable to me to generalize it, especially when N is small. Anyway, once N is big and the samples are randomly taken, I see no reason why these tests would have to assume normality, since the CLT guarantees that the distribution of sums of independent random variables tends to a normal curve. – 2-D Dec 01 '19 at 15:18
  • The *derivation* of the $t$-statistic relies on the normality of the populations. Three things need to be true: i) The numerator has a normal distribution, ii) $(n-1)S^2/\sigma^2$ has a $\chi^2(n-1)$ distribution and iii) $\bar{x}$ and $S^2$ are independent. Now iii) is only true in a normal distribution (it's even a characterization of it). Empirically, the $t$-tests often works fine even when the populations aren't exactly normal, but the *derivation* certainly relies on it. See for example Hogg et al. "Introduction to mathematical statistics". – COOLSerdash Dec 01 '19 at 15:29
  • 2
    See the answer by @whuber [here](https://stats.stackexchange.com/questions/438060/does-this-code-demonstrate-the-central-limit-theorem). Note that depending on the distribution, it can take hundreds of observations for the distribution of the means to approximate a normal distribution. And that this effect is not seen for the Cauchy distribution. – Sal Mangiafico Dec 01 '19 at 15:30
  • Thank for indicating the book and for the answer @COOLSerdash, I'm going to take a look at it right now. – 2-D Dec 01 '19 at 15:36
  • @2-D You're welcome. To be more specific: In my 7th edition, it's section 3.6.3, Theorem 3.6.1 on page 193. – COOLSerdash Dec 01 '19 at 15:38
  • @SalMangiafico, the comment you've suggested really does empirically show that the CLT doesn't hold for all population distributions; I'm really excited to have found it. I hope to be lucky and come across a rigorous explanation of it one day. Thank you. – 2-D Dec 01 '19 at 15:48
  • This is a good point. Out of curiosity, would you recommend any book that approaches cases like this? – 2-D Dec 01 '19 at 16:16
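
The point about the Cauchy distribution in the comments above can be checked with a short simulation (a sketch in Python with NumPy; the sample sizes are arbitrary): the mean of n i.i.d. standard Cauchy variables is itself standard Cauchy, so averaging never concentrates, whereas means of a finite-variance distribution such as the exponential concentrate at rate $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1000, 2000

# Means of exponential samples: the CLT applies (finite variance),
# so the spread of the sample means shrinks like 1/sqrt(n).
exp_means = rng.exponential(size=(reps, n)).mean(axis=1)

# Means of Cauchy samples: the CLT does NOT apply (no finite mean
# or variance); the mean of n standard Cauchy draws is itself
# standard Cauchy, so averaging never tames the spread.
cauchy_means = rng.standard_cauchy(size=(reps, n)).mean(axis=1)

print(np.std(exp_means))                         # small, about 1/sqrt(1000)
print(np.percentile(np.abs(cauchy_means), 90))   # still of order 1 or larger
```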

3 Answers

2

Yes, both tests are designed under the assumption that the underlying distribution is normal. It may not be obvious just from looking at the test statistics, but the calculations behind the p-values depend heavily on normality.

Look into the Mann–Whitney test as a nonparametric alternative.
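
As a minimal sketch of that alternative using SciPy (the two skewed samples here are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Two skewed (non-normal) samples that differ in scale/location:
a = rng.exponential(scale=1.0, size=40)
b = rng.exponential(scale=2.0, size=40)

# The Mann-Whitney U test works on ranks and makes no normality assumption.
res = stats.mannwhitneyu(a, b, alternative="two-sided")
print(res.statistic, res.pvalue)
```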

Roger Vadim
  • How do you prove it? I can’t see any step in the derivation of it that assumes normality. – 2-D Dec 01 '19 at 12:42
  • What 'derivation' have you seen? – Eckhard Dec 01 '19 at 14:38
    The one in the section "Union-Intersection and Intersection-Union Tests" in Casella and Berger's book. I can't see why these tests would have to assume normality of the population even when N is big. Doesn't the CLT guarantee that "...their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed."? – 2-D Dec 01 '19 at 15:12
  • 3
    Strictly speaking, these tests make assumptions only about the sampling distribution of (a) the mean (for the Z test) and (b) the mean and variance (for the t test). Normality of the underlying distribution is a theoretical consequence of either assumption, *but that is a misleading conclusion,* because what matters for their application only is that the sampling distributions be *sufficiently close* to what is assumed to make the p-values reliable. – whuber Dec 01 '19 at 15:57
  • 1
    You've put in much better words what I was trying to point out. I see no reason for one to bluntly state that z and t-tests assume normality of the underlying distribution since in certain circumstances CLT guarantees that the sample means will follow a normal distribution, even if the population doesn't. – 2-D Dec 01 '19 at 16:43
  • 1
    I think you are confusing two things: the t-test and the situations where it can be applied. One can't derive the distribution of the t-statistic without evaluating a few Gaussian integrals. You are talking instead about the difference between the population and the sample, which does not necessarily imply using the CLT, which in turn does not necessarily imply resorting to the t-test. – Roger Vadim Dec 01 '19 at 16:55
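
One way to probe where the t-test can still be applied, as discussed in the comments above, is to simulate its type I error rate under a non-normal population (a sketch; the sample sizes and the choice of a centered exponential are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps = 100, 2000

# Centered exponential data: true mean 0, but clearly non-normal (skewed).
x = rng.exponential(size=(reps, n)) - 1.0
p = stats.ttest_1samp(x, popmean=0.0, axis=1).pvalue

# Under H0, p-values should be roughly uniform, so the rejection rate at
# the 5% level should be close to 0.05 if the test is robust in this setting.
rejection_rate = (p < 0.05).mean()
print(rejection_rate)
```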
1

I think both tests assume normality. The only difference is that for the Z-test we assume the true standard deviation $\sigma$ is known, whereas for the t-test it is unknown and we use the sample standard deviation $\hat{\sigma}=\sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$.

In both the Z-test and the t-test, we assume $x_i \sim N(\mu,\sigma^2)$. Then $\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$ and $(n-1)\hat{\sigma}^2/\sigma^2 = \sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{\sigma}\right)^2 \sim \chi^2_{n-1}$. Hence, the t-statistic $\frac{\bar{x}-\mu}{\hat{\sigma}/\sqrt{n}}=\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}}\right)\Big/\sqrt{\hat{\sigma}^2/\sigma^2}$ follows $t_{n-1}$, since $\frac{X}{\sqrt{Y/K}} \sim t_K$ whenever $X \sim N(0,1)$, $Y \sim \chi^2_K$, and $X$ and $Y$ are independent (which, for $\bar{x}$ and $\hat{\sigma}^2$, holds exactly when the population is normal).
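
To sanity-check this derivation numerically, here is a small simulation sketch in Python (the sample size and parameter values are arbitrary): when the $x_i$ are drawn from a normal population, the simulated t-statistics should match the first two moments of the $t_{n-1}$ distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, mu, sigma = 10, 5000, 3.0, 2.0

# Draw `reps` independent samples of size n from N(mu, sigma^2).
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                # sample s.d. with the n-1 divisor
t_stats = (xbar - mu) / (s / np.sqrt(n))

# Under normal sampling, t_stats ~ t_{n-1}: mean 0, variance (n-1)/(n-3).
print(t_stats.mean(), t_stats.var(ddof=1))
```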

Ben
1

COOLSerdash's answer is correct.

I'll just add that you see a lot of statements like "the Z-test requires that your data be normally distributed", etc.

This is not actually correct. The Z-test requires only that your data-generating process come from a probability distribution with finite variance.

If that is the case, then the CLT gives asymptotic convergence of the Z statistic (i.e. the standardized sample mean) to the normal distribution. Thus the Z statistic is asymptotically normal even though the underlying data themselves do not have to be (the data could be Poisson, etc.).
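
As a quick illustration of this point (a sketch; the Poisson rate and sample sizes are arbitrary), the standardized mean of Poisson data is close to N(0, 1) even though the data are discrete and skewed:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 4.0, 500, 4000   # Poisson(lam): mean lam, variance lam

x = rng.poisson(lam, size=(reps, n))
# Z statistic: standardized sample mean with the known sigma = sqrt(lam).
z = (x.mean(axis=1) - lam) / (np.sqrt(lam) / np.sqrt(n))

# By the CLT, z should be close to N(0, 1) even though the data are Poisson.
print(z.mean(), z.var(ddof=1))
```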

As COOLSerdash pointed out, the t-test, on the other hand, does require that the data be drawn from a normal distribution, since the derivation of the t-distribution requires that the sample mean and variance be independent. It can be shown that this holds only under the assumption that the generating process for the underlying data is normal.
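
This independence claim is easy to probe by simulation (a sketch; sample sizes arbitrary): for normal data, the correlation between the sample mean and sample variance is essentially zero, while for skewed data such as the exponential it is clearly positive.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 20000

def mean_var_corr(samples):
    """Correlation between sample mean and sample variance across rows."""
    m = samples.mean(axis=1)
    v = samples.var(axis=1, ddof=1)
    return np.corrcoef(m, v)[0, 1]

# Normal data: xbar and S^2 are independent, so the correlation is near 0.
r_norm = mean_var_corr(rng.normal(size=(reps, n)))

# Exponential (skewed) data: large means tend to come with large
# variances, so the correlation is clearly positive.
r_exp = mean_var_corr(rng.exponential(size=(reps, n)))

print(r_norm, r_exp)
```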