I know that the sampling distribution of the mean can be assumed to be normal if N > 30, but does this have any implication for the data themselves (the 30 observations in the sample)?

I have three different time series with N = 500 each (well beyond 30), and I want to test whether their means are equal. Does the normality of the sampling distribution of the mean imply that I can assume my time series data are normal and use a parametric test, for example Welch's ANOVA?

janebanane
  • No, you *can't* assume normality for N > 30. See https://stats.stackexchange.com/questions/2541/what-references-should-be-cited-to-support-using-30-as-a-large-enough-sample-siz/2542#2542 – Tim Nov 14 '18 at 21:49
  • "30" is just a rule of thumb, and hardly anyone knows where it started or what it is based on. Forget about it. – Aksakal Nov 14 '18 at 22:28
  • Most of the answers at the page Tim points to are worth reading -- and other questions on site have answers that make similar points. It's *easy* to construct cases where n=300 or n=3000 (or any other number you like) are not nearly sufficient -- but for which the central limit theorem definitely applies. – Glen_b Nov 15 '18 at 00:24
  • Note also that approximate normality of the mean does *not* imply approximate normality of the data. – Glen_b Nov 15 '18 at 01:44

3 Answers

It is not correct to say that a sample size of 30 or so makes the central limit theorem apply. Take, for example, a sample of n = 50,000 from a log-normal distribution: the CLT, when used to construct a confidence interval for the unknown mean, yields very inaccurate limits.
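
A minimal Monte Carlo sketch (Python standard library only) can show this: it estimates how often the usual CLT interval, mean ± z·s/√n, actually contains the true log-normal mean. The parameter choices are illustrative assumptions, not from the answer: log X ~ N(0, σ²) with σ = 0.25 versus σ = 3 (echoing Glen_b's comment that the outcome depends on σ), and n = 1,000 rather than 50,000 so it runs quickly. The function name `clt_ci_coverage` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def clt_ci_coverage(sigma, n, reps, seed=0, level=0.95):
    """Fraction of normal-theory CIs (mean +/- z * s / sqrt(n)) that contain
    the true mean of a log-normal distribution with log X ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    true_mean = math.exp(sigma ** 2 / 2)  # E[X] when log X ~ N(0, sigma^2)
    hits = 0
    for _ in range(reps):
        x = [math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))
        half_width = z * s / math.sqrt(n)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps
```

For example, compare `clt_ci_coverage(0.25, n=1000, reps=500)` with `clt_ci_coverage(3.0, n=1000, reps=500)`. With σ = 0.25 the coverage should land near the nominal 95%; with σ = 3 it falls well short, because the mean and variance are driven by rare, huge values that most samples of this size never contain.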

Use a method that does not assume normality, e.g. a nonparametric test.
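
As one concrete example (my choice; the answer does not name a specific test), the Kruskal-Wallis test compares three or more groups using ranks instead of raw values. The sketch below is a stdlib-only implementation restricted to exactly three groups, matching the question's three series, because with k - 1 = 2 degrees of freedom the chi-square approximation to the p-value has the closed form exp(-H/2). In practice `scipy.stats.kruskal` handles any number of groups. Note that, like ANOVA, it assumes independent observations.

```python
import math


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test (with tie correction) for exactly three groups.

    The p-value uses the chi-square approximation with k - 1 = 2 degrees of
    freedom, whose survival function is exp(-H / 2).
    """
    k = len(groups)
    if k != 3:
        raise ValueError("this sketch only computes p-values for 3 groups")
    # pool (value, group index) pairs and sort by value
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    n_total = len(pooled)
    rank_sums = [0.0] * k
    tie_term = 0.0
    i = 0
    while i < n_total:
        j = i
        # extend j over a run of tied values
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for _, g in pooled[i:j + 1]:
            rank_sums[g] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:  # every observation identical: no evidence of difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)  # clamp tiny negative float noise
    return h, math.exp(-h / 2)
```

For instance, `kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives H = 7.2 and p = exp(-3.6), about 0.027.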

But note that none of this applies directly to time series when the multiple observations within a series are correlated.
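
To make the time-series caveat concrete: if a series behaves roughly like an AR(1) process with lag-1 autocorrelation ρ (an assumption; real series may need a fuller model), then for large n the variance of its mean is about σ²/n · (1 + ρ)/(1 - ρ). So the series carries roughly n(1 - ρ)/(1 + ρ) independent observations' worth of information about the mean. A minimal sketch, with hypothetical helper names:

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation r1 = sum(d_t * d_{t+1}) / sum(d_t ** 2),
    where d_t are deviations from the series mean."""
    n = len(x)
    m = sum(x) / n
    d = [xi - m for xi in x]
    num = sum(d[t] * d[t + 1] for t in range(n - 1))
    den = sum(di * di for di in d)
    return num / den


def ar1_effective_n(n, rho):
    """Approximate effective sample size for the mean of an AR(1) series:
    Var(mean) ~ sigma^2 / n * (1 + rho) / (1 - rho) for large n."""
    return n * (1 - rho) / (1 + rho)
```

With n = 500 and ρ = 0.8, for example, `ar1_effective_n` returns about 55.6, far fewer than 500.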

Frank Harrell
  • It depends on the coefficients in the lognormal; if $\sigma$ is very small a confidence interval for the mean may be pretty reasonable, but if $\sigma$ is not small it may be very far out. – Glen_b Nov 15 '18 at 01:46
  • The fact that the CLT needs many assumptions to work, assumptions we can't verify without large sample sizes, makes me never want to use it in practice. In the modern era of stats we don't need it anymore. The single biggest assumption the CLT makes that is often violated is that the standard deviation is the appropriate dispersion measure. For asymmetric distributions it definitely is not. – Frank Harrell Nov 15 '18 at 12:40

No. By the central limit theorem, the sampling distribution of the mean approaches normality regardless of the form of the parent distribution (with a few rare exceptions, such as distributions with infinite variance). Although you compute only one sample mean per group, that point estimate is one draw from the family of sample means you could compute with unlimited resources. It is that distribution, not the distribution of your data, that is approximately normal.
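
A small simulation illustrates the distinction (and Glen_b's comment under the question). Exponential(1) data have skewness 2 no matter how many you draw, while the mean of n such draws has skewness 2/√n, shrinking toward the normal's 0. The exponential parent, n = 50, and the helper names are illustrative assumptions.

```python
import random


def skewness(x):
    """Sample skewness g1 = m3 / m2 ** 1.5 (moment estimator, no bias correction)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((xi - m) ** 2 for xi in x) / n
    m3 = sum((xi - m) ** 3 for xi in x) / n
    return m3 / m2 ** 1.5


def data_vs_mean_skewness(n=50, reps=2000, seed=1):
    """Skewness of raw Exponential(1) draws vs. skewness of means of n draws."""
    rng = random.Random(seed)
    raw = [rng.expovariate(1.0) for _ in range(n * reps)]
    # split the raw draws into reps consecutive samples of size n
    means = [sum(raw[i * n:(i + 1) * n]) / n for i in range(reps)]
    return skewness(raw), skewness(means)
```

Expect roughly 2 for the raw data and roughly 2/√50 ≈ 0.28 for the sample means: the means look close to normal while the data clearly do not.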

HEITZ

Normality is something we commonly assume when we conduct a hypothesis test or fit a model. A common rule of thumb is that if your sample size is greater than 30, the central limit theorem probably applies and you can use a test that assumes normality. This is only a rule of thumb, and it often fails. Whenever a magic cut-off number appears in statistics, be wary of it (another common one is used when comparing the ratio of variances).

You can check the normality assumption of your data with all sorts of tests or graphical procedures (such as Q-Q plots). You should check the assumptions of the test you want to run (ANOVA in this case); if they are not violated, you can proceed and trust that your p-values, etc., are actually meaningful.
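
A normal Q-Q plot pairs the sorted data with the corresponding quantiles of a standard normal distribution; roughly normal data fall near a straight line. The stdlib sketch below computes those coordinates and, as a rough one-number summary (my addition, in the spirit of a probability-plot correlation), the Pearson correlation of the points. The plotting positions (i - 0.5)/n are one common convention among several, and the function names are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical quantile, sorted observation) pairs for a normal Q-Q plot,
    using plotting positions (i - 0.5) / n."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theo, xs))


def qq_correlation(data):
    """Pearson correlation of the Q-Q points; values near 1 mean the points
    lie close to a straight line, i.e. the data look roughly normal."""
    pts = normal_qq_points(data)
    n = len(pts)
    mt = sum(t for t, _ in pts) / n
    ms = sum(s for _, s in pts) / n
    cov = sum((t - mt) * (s - ms) for t, s in pts)
    vt = sum((t - mt) ** 2 for t, _ in pts)
    vs = sum((s - ms) ** 2 for _, s in pts)
    return cov / (vt * vs) ** 0.5
```

Plot the pairs from `normal_qq_points` with any plotting library; strongly skewed data bend away from the line and give a noticeably lower `qq_correlation`.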

This question has a list of the assumptions and an interesting discussion of which normality we are interested in (normality of the residuals or normality of the individual groups): "ANOVA assumption normality/normal distribution of residuals".
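
For a one-way ANOVA, the residuals are each observation minus its group mean, so checking their normality pools all groups after removing the differences between group means. A minimal sketch (the function name is hypothetical):

```python
def one_way_residuals(groups):
    """Residuals of a one-way ANOVA model: each observation minus its group mean."""
    residuals = []
    for grp in groups:
        m = sum(grp) / len(grp)
        residuals.extend(x - m for x in grp)
    return residuals
```

Pooled raw values from groups with different means can look bimodal even when each group is normal; the residuals remove that shift, which is why they are usually what gets checked (for example with a Q-Q plot).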

RAND