I have a data set of 600 observations divided into two groups. I am going to compare the central tendencies (e.g., the means) of these two groups. However, the data violate classical assumptions such as normality and equality of variances.

  • Can I use a parametric approach (specifically, the t-test) because the sample sizes are large, relying on the Central Limit Theorem, or do I have to use a non-parametric approach?
  • If I should use a non-parametric approach, which test (Mann-Whitney, median test, or Kolmogorov-Smirnov) is most appropriate?
Jeromy Anglim
  • See also http://stats.stackexchange.com/questions/15664/how-to-test-for-differences-between-two-group-means-when-the-data-is-not-normall – Jeromy Anglim May 16 '12 at 04:22
  • Whether or not the central limit theorem has effectively "kicked in" by a given sample size depends on the distribution of the data. In most cases that come up, $n = 300$ in each group would be enough. It's mainly unusual examples that "break" the CLT for finite samples. For example, if your data were all Bernoulli trials with success probability $p = 10^{-9}$, your sample of $n=300$ outcomes would almost certainly be all 0s, so the sample means are, of course, not approximately normal - a much larger sample size would be required. – Macro May 16 '12 at 12:35

2 Answers

With 600 observations divided between only two groups (assuming that they're divided fairly equally, not 598 and 2), you most likely have enough data to feel comfortable using the t-test. If the variances differ between the two groups, you would want to use the Welch-Satterthwaite correction for the effective degrees of freedom.
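As a rough illustration of the Welch-Satterthwaite correction, here is a minimal sketch using only the Python standard library. The function name `welch_t` and the simulated exponential data are hypothetical and are not taken from the question. The p-value uses a normal approximation, which is only reasonable because the effective degrees of freedom are in the hundreds here. In practice a statistics package (for example `scipy.stats.ttest_ind` with `equal_var=False`) would give the exact t-based p-value.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance t-test."""
    nx, ny = len(x), len(y)
    # Squared standard errors of the two sample means.
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite effective degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical skewed data with unequal variances (illustration only).
rng = random.Random(0)
group_a = [rng.expovariate(1.0) for _ in range(300)]
group_b = [rng.expovariate(0.5) for _ in range(300)]

t_stat, df = welch_t(group_a, group_b)
# With df this large the t distribution is close to the standard normal,
# so this two-sided p-value is a large-sample approximation.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"t = {t_stat:.3f}, df = {df:.1f}, p ~ {p_approx:.4g}")
```

The effective degrees of freedom always fall between the smaller group size minus one and the pooled value $n_1 + n_2 - 2$, so with 300 per group the correction costs very little.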

However, you might want to use the Mann-Whitney U-test anyway. The U-test can be more powerful than the t-test when the data are not normally distributed, particularly when the distributions are heavy-tailed or contain outliers. Best of luck with your project.
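For readers who want to see what the U-test computes, the following is a minimal standard-library sketch. The function name `mann_whitney_u` is hypothetical. It uses midranks for ties and a tie-corrected normal approximation without a continuity correction, which is an assumption suited to large samples, and it will fail if every observation is identical. A library routine such as `scipy.stats.mannwhitneyu` is the safer choice for real analyses.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) via midranks and a normal approximation."""
    nx, ny = len(x), len(y)
    n = nx + ny
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Find the block of tied values starting at position i.
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        count_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * count_x
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u_x = rank_sum_x - nx * (nx + 1) / 2
    mu = nx * ny / 2
    # Variance of U under the null hypothesis, corrected for ties.
    sigma = math.sqrt(nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_x - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


# Tiny made-up example: every value in the first group is below the second.
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, round(p, 4))
```

Note that the statistic depends only on the ranks, which is why a few extreme observations affect it far less than they affect a difference in means.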

gung - Reinstate Monica
  • Thanks for your reply. I thought the Mann-Whitney test was less powerful than the parametric test! Which test should I use when the sample sizes are not fairly equal, for example 200 and 400? My second question: if I use Mann-Whitney, do I have to report medians or mean ranks instead of means? – shervin asgari May 16 '12 at 04:36
  • 200 & 400 is not too unbalanced, I think, especially since you have so much data. However, your power will not be as great as if they were 300 & 300. MW is less powerful than t if the data are normal, but more powerful if they're not. You could always report both medians & means, or whatever you like. – gung - Reinstate Monica May 16 '12 at 14:40

I would like to add just one remark to gung's answer, which is of course correct in general.

The Central Limit Theorem will sooner or later "kick in" only for distributions with finite variance (that is, finite first and second moments). If your data were generated by a process whose distribution has infinite variance (for example, a power-law distribution such as the Pareto distribution with shape parameter $\alpha \le 2$), then the CLT will never apply, regardless of the size of your sample. So if you suspect that your data may not have finite variance, then I would say it is safer to use nonparametric tests (like the Mann-Whitney U test).
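A small simulation can make the heavy-tail point concrete. The sketch below, using only the Python standard library, compares how much sample means vary across repeated samples for a Pareto distribution with shape 3 (finite variance) and shape 1 (infinite mean and variance). The shape values, sample sizes, seed, and the helper name `iqr_of_sample_means` are illustrative assumptions, not anything from the question.

```python
import random
from statistics import mean, quantiles


def iqr_of_sample_means(alpha, n, reps, rng):
    """Interquartile range of `reps` sample means of n Pareto(alpha) draws."""
    means = [
        mean([rng.paretovariate(alpha) for _ in range(n)])
        for _ in range(reps)
    ]
    q1, _, q3 = quantiles(means, n=4)
    return q3 - q1


rng = random.Random(42)
spread = {}
for alpha in (3.0, 1.0):      # 3.0: finite variance; 1.0: infinite mean
    for n in (10, 1000):
        spread[(alpha, n)] = iqr_of_sample_means(alpha, n, reps=200, rng=rng)
        print(f"alpha={alpha}, n={n}: IQR of sample means = {spread[(alpha, n)]:.3f}")
```

With shape 3 the spread of the sample means should shrink roughly like $1/\sqrt{n}$, whereas with shape 1 it should not shrink as $n$ grows, so a t-test built on the sample mean has nothing stable to work with in that case.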

sztal