Can I base my statistical analysis on the central limit theorem? I need to perform a t-test, ANOVA, and multiple regression. My outcome variable is far from normally distributed (highly positively skewed), and my sample size is N = 115. I'd like to keep nonparametric tests as a last resort.

  • Please search for the *extensive* discussions about this on the site. Why do you want nonparametric to not be the *first* option? Why are results from $n \rightarrow \infty$ of interest to you? – Frank Harrell Mar 07 '14 at 22:27
  • Indeed, nonparametric and semiparametric statistics are of interest to most statisticians. Theoretical results for the t-test and ANOVA, which assume normality and equal variance, promise an advantage over nonparametric tests, but in practice parametric statistics face a lot of problems. – Chamberlain Mbah Mar 07 '14 at 23:05
  • I see. I have already used a lot of nonparametric tests in my work, but I need to perform multiple regression, and that is where the normality assumption became a problem for me. So can I rely on the central limit theorem to assume normality? – Mahmoud Ismael Mar 07 '14 at 23:18
  • Why don't you perform the regression and *see* just how much the residuals depart from normality? Then you can provide us much more specific and focused information to help you choose appropriate procedures. – whuber Mar 07 '14 at 23:45
  • Please indicate whether you have worked out the frequency polygon or whether you have continuous data. Moreover, specify whether your outcome values are negative, positive, or both. –  Mar 09 '14 at 09:59
  • @subhashc.davar Whether data are positive or negative is irrelevant to the applicability of the CLT or the choice between nonparametric and other procedures. – Nick Cox Mar 09 '14 at 10:49

1 Answer

(1) The CLT is a result about the limit as $n\to\infty$. There is no particular $n$ that is certain to be large enough; e.g., see here, which gives a method for constructing cases that require larger sample sizes than any $n$ you can nominate.
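
To make point (1) concrete, here is a small simulation sketch. It is not based on the asker's data: the lognormal distribution and the two `sigma` values are assumptions chosen only to contrast mild and heavy positive skew. It estimates the skewness of the sampling distribution of the mean at $n = 115$; a value near zero would be consistent with approximate normality.

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered**3).mean() / (centered**2).mean() ** 1.5)

def sample_mean_skewness(n, sigma, reps=20000, seed=0):
    """Skewness of the simulated sampling distribution of the mean.

    Draws `reps` samples of size `n` from a lognormal(0, sigma)
    population (an illustrative choice) and returns the skewness
    of the `reps` sample means.
    """
    rng = np.random.default_rng(seed)
    draws = rng.lognormal(mean=0.0, sigma=sigma, size=(reps, n))
    return skewness(draws.mean(axis=1))

# Same n, two different degrees of population skewness.
mild = sample_mean_skewness(n=115, sigma=0.5)
heavy = sample_mean_skewness(n=115, sigma=2.0)
print(f"skewness of sample means, mild population skew:  {mild:.2f}")
print(f"skewness of sample means, heavy population skew: {heavy:.2f}")
```

Whether $n = 115$ is "large enough" depends on how skewed the population is, which is why no single $n$ can be recommended in advance.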

(2) The central limit theorem on its own is not enough. The statistics you mention are ratios, and the CLT only helps with the numerator, so you also need something to hold for the denominator (for example, that it converges in probability to a constant, so that Slutsky's theorem applies). The exact t and F distributions also rely on independence of the numerator and denominator.
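
The independence issue in point (2) can be checked by simulation. The sketch below is only illustrative (the exponential distribution stands in for "a positively skewed outcome" and is an assumption, as are the helper names). It estimates the correlation between the sample mean, which drives the numerator of a one-sample t-statistic, and the sample standard deviation, which drives the denominator.

```python
import numpy as np

def mean_sd_correlation(sampler, n=115, reps=20000, seed=0):
    """Correlation between sample mean and sample SD across replicates.

    `sampler(rng, size)` must return an array of the given shape.
    """
    rng = np.random.default_rng(seed)
    draws = sampler(rng, (reps, n))
    means = draws.mean(axis=1)
    sds = draws.std(axis=1, ddof=1)  # ddof=1 gives the usual sample SD
    return float(np.corrcoef(means, sds)[0, 1])

# Normal data: mean and SD are independent, so the correlation is near 0.
normal_corr = mean_sd_correlation(lambda rng, size: rng.normal(size=size))

# Skewed data (exponential, as an example): mean and SD move together.
skewed_corr = mean_sd_correlation(lambda rng, size: rng.exponential(size=size))

print(f"corr(mean, sd), normal data:      {normal_corr:.3f}")
print(f"corr(mean, sd), exponential data: {skewed_corr:.3f}")
```

With skewed data the dependence does not vanish just because the numerator is close to normal, so the t-statistic can still depart from its nominal distribution.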

If their assumptions are reasonable, you might want to consider GLMs (with suitable software, regression via GLMs is almost as convenient as ordinary regression), though there are other alternatives.
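
As a rough illustration of what a GLM fit involves, here is a minimal sketch of a Gamma GLM with a log link, fitted by iteratively reweighted least squares using only NumPy. The Gamma family, the log link, the simulated data, and the function name `fit_gamma_log_glm` are all assumptions made for the example; the answer does not say which family suits the asker's data, and in practice you would use the GLM routine of your statistics package rather than hand-rolled code.

```python
import numpy as np

def fit_gamma_log_glm(X, y, n_iter=50, tol=1e-10):
    """Fit E[y] = exp(X @ beta) for a Gamma GLM with log link via IRLS.

    For this family/link pair the IRLS weights are constant, so each
    step reduces to ordinary least squares on the working response.
    Requires y > 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Starting values: least squares on log(y).
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta                 # linear predictor
        mu = np.exp(eta)               # fitted mean under the log link
        z = eta + (y - mu) / mu        # working response
        new_beta = np.linalg.lstsq(X, z, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Simulated, positively skewed outcome with n = 115 (illustrative only).
rng = np.random.default_rng(0)
n = 115
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([0.5, 0.8, -0.3])
shape = 2.0                                         # Gamma shape parameter
y = rng.gamma(shape, np.exp(X @ true_beta) / shape) # mean is exp(X @ true_beta)

beta_hat = fit_gamma_log_glm(X, y)
print("true coefficients:     ", true_beta)
print("estimated coefficients:", np.round(beta_hat, 3))
```

The point of the sketch is that the skewness is modelled directly through the response distribution, instead of relying on a normal approximation for the residuals.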

There are various other nonparametric, parametric and robust alternatives.

Glen_b