
I have playing-time data from 7 different groups, and the data are not normally distributed.

Let's say it's really not normal. Like, a lot.

The variances are also unequal: the data fail Bartlett's test by a landslide.
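Bartlett's test assumes that each group is normal, so with data this skewed a rejection mixes up unequal variances and non-normality. A minimal sketch, assuming SciPy and using hypothetical lognormal samples (not the actual playing times), of running it next to the Brown-Forsythe variant of Levene's test, which uses deviations from the group medians and is less sensitive to non-normality:

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed "playing time" samples, NOT the real data.
rng = np.random.default_rng(0)
g1 = rng.lognormal(mean=3.0, sigma=1.0, size=1000)
g2 = rng.lognormal(mean=3.0, sigma=1.2, size=1200)
g3 = rng.lognormal(mean=3.2, sigma=1.0, size=800)

# Bartlett's test: assumes normality within each group.
bartlett_stat, bartlett_p = stats.bartlett(g1, g2, g3)

# Brown-Forsythe test: Levene's test on absolute deviations from the medians.
levene_stat, levene_p = stats.levene(g1, g2, g3, center="median")

print(f"Bartlett:       stat={bartlett_stat:.1f}, p={bartlett_p:.3g}")
print(f"Brown-Forsythe: stat={levene_stat:.1f}, p={levene_p:.3g}")
```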

The sample size is quite large (around 7000), and the groups differ in size.

Am I better off with the Kruskal-Wallis test, which does not require normally distributed data, or with Welch's ANOVA, which does not require equal variances?
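For reference, both candidate tests can be run side by side. A sketch assuming NumPy and SciPy; `welch_anova` is a hypothetical helper written out from the standard Welch formulas (not a library function), and the three groups are simulated lognormal stand-ins for the real playing times:

```python
import numpy as np
from scipy import stats


def welch_anova(*groups):
    """Welch's one-way ANOVA, which does not assume equal variances.

    Hypothetical helper written from the textbook Welch formulas.
    Returns (F, df1, df2, p_value).
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                      # weight = 1 / (variance of the group mean)
    w_sum = w.sum()
    weighted_mean = np.sum(w * means) / w_sum

    between = np.sum(w * (means - weighted_mean) ** 2) / (k - 1)
    lam = np.sum((1 - w / w_sum) ** 2 / (n - 1))
    correction = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = between / correction
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)         # Welch's approximate denominator df
    return f_stat, df1, df2, stats.f.sf(f_stat, df1, df2)


# Simulated stand-ins for three groups of playing times (right-skewed).
rng = np.random.default_rng(0)
groups = [
    rng.lognormal(mean=3.0, sigma=1.0, size=1000),
    rng.lognormal(mean=3.0, sigma=1.2, size=1200),
    rng.lognormal(mean=3.2, sigma=1.0, size=800),
]

f_stat, df1, df2, welch_p = welch_anova(*groups)
kw_stat, kw_p = stats.kruskal(*groups)
print(f"Welch ANOVA:    F={f_stat:.2f}, df=({df1}, {df2:.1f}), p={welch_p:.3g}")
print(f"Kruskal-Wallis: H={kw_stat:.2f}, p={kw_p:.3g}")
```

With two groups, `welch_anova` reduces to Welch's t-test (F equals t squared, same p-value), which is a handy sanity check. Keep in mind that the two tests answer different questions: Welch's ANOVA compares means, while Kruskal-Wallis works on ranks.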

I should add that I am interested in shifts of the mean or median: the question is whether players in group 1 play longer than players in group 2.
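For the specific "does group 1 play longer than group 2" question, a pairwise comparison can report a mean difference, a median difference, and the probability that a random player from one group plays longer than one from the other. That last quantity is what the Mann-Whitney U test (the two-group case of Kruskal-Wallis) is really about; it only reduces to a median shift if the two distributions have the same shape. A sketch assuming SciPy >= 1.7 (where `mannwhitneyu` returns U for the first sample); `compare_two_groups` is a hypothetical helper:

```python
import numpy as np
from scipy import stats


def compare_two_groups(x, y):
    """Hypothetical helper: compare playing times of group x against group y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Welch's t-test: difference in means without assuming equal variances.
    welch = stats.ttest_ind(x, y, equal_var=False)

    # Mann-Whitney U; with SciPy >= 1.7 the statistic is U for x, i.e. the
    # number of pairs with x_i > y_j, counting ties as 1/2.
    mw = stats.mannwhitneyu(x, y, alternative="two-sided")

    # U / (n_x * n_y) estimates P(X > Y) + 0.5 * P(X = Y).
    prob_superiority = mw.statistic / (len(x) * len(y))

    return {
        "mean_diff": x.mean() - y.mean(),
        "median_diff": np.median(x) - np.median(y),
        "welch_p": welch.pvalue,
        "mw_p": mw.pvalue,
        "prob_superiority": prob_superiority,
    }


# Tiny made-up example.
result = compare_two_groups([1.0, 2.0, 3.0, 4.0], [0.0, 0.5, 2.5])
print(result)
```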

Is either of the two even halfway valid? Are there better alternatives?

Here are some plots of the data:

[Q-Q plot]

[Histogram]

From the Q-Q plot and the histogram, all the groups look fairly similar.

The skewness values are all well above 1, as are the kurtosis values, and every normality test rejects normality (although with this sample size, most real data would probably be rejected too).
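A quick way to see what the skewness and kurtosis numbers mean, and how easily a normality test rejects at this sample size. A sketch with SciPy on a simulated lognormal sample of 7000 (hypothetical, not the real data); note that `scipy.stats.kurtosis` reports excess kurtosis by default, where a normal distribution scores 0 rather than 3:

```python
import numpy as np
from scipy import stats

# Hypothetical skewed sample standing in for one group of playing times.
rng = np.random.default_rng(1)
times = rng.lognormal(mean=3.0, sigma=1.0, size=7000)

skewness = stats.skew(times)
# fisher=True (the default) gives excess kurtosis: 0 for a normal distribution.
excess_kurtosis = stats.kurtosis(times, fisher=True)

# D'Agostino-Pearson test, which combines skewness and kurtosis.
k2, p_normal = stats.normaltest(times)

# The log of lognormal data is normal, so its skewness should be near 0.
log_skewness = stats.skew(np.log(times))

print(f"skewness={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}, p={p_normal:.3g}")
print(f"skewness after log transform={log_skewness:.3f}")
```

If the log of the real times also looks roughly symmetric (as it does here by construction), a log or Box-Cox transformation may be worth trying.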

I don't have the experience to look at data and judge whether it is normal enough.

Edited by Glorfindel; asked by Jan Veit.
  • Personally I'd say "probably neither" and be considering a GLM. – Glen_b Nov 08 '14 at 16:11
  • It's hard to say from what you have here. Be aware that with such a large sample, tests of normality and homogeneity will reject the null for very trivial deviations. In what way are the data non-normal and heterogeneous? Are they skewed, e.g.? Will a transformation help? BTW, since your data are durations, you could try a survival analysis like the Cox proportional hazards model. Do you have any censoring? Can you provide more information in general about your data and your goals? – gung - Reinstate Monica Nov 08 '14 at 16:15
  • You may also find some food for thought in my answer here: [Alternatives to one-way ANOVA for heteroscedastic data](http://stats.stackexchange.com/a/91881/7290). – gung - Reinstate Monica Nov 08 '14 at 16:17
  • I edited my post with some pics of the data, thanks for your answers! – Jan Veit Nov 08 '14 at 16:40
  • Just my two cents, but perhaps look into [Box-Cox](http://onlinestatbook.com/2/transformations/box-cox.html) transformations, or permutation testing. – Chris C Nov 08 '14 at 17:04
  • Do you have any censoring? – gung - Reinstate Monica Nov 08 '14 at 18:29
  • If I understand correctly what that is, then yes: I only know the number of sessions longer than 30 minutes, so I spread them uniformly over 30-90 minutes. The rest of the data was created similarly, i.e. I know the number of sessions between 10 and 30 seconds and spread them uniformly over 10-30 seconds. – Jan Veit Nov 09 '14 at 07:34
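Glen_b's GLM suggestion could look something like the following. A gamma family with a log link is one common choice for positive, right-skewed durations (an assumption here, not something confirmed in the thread). `gamma_glm_log` is a hypothetical hand-written IRLS fit using only NumPy, and the tiny data set is made up. With a log link, exp(coefficient) is the ratio of a group's mean playing time to the reference group's mean:

```python
import numpy as np


def gamma_glm_log(y, groups, max_iter=50, tol=1e-10):
    """Hypothetical IRLS fit of a Gamma GLM with log link and group dummies.

    Returns (levels, beta, se); levels[0] is the reference group and
    beta[j] (j >= 1) is log(mean of levels[j] / mean of levels[0]).
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)  # sorted; first level is the reference
    # Design matrix: intercept plus one indicator column per other level.
    X = np.column_stack(
        [np.ones_like(y)] + [(groups == lev).astype(float) for lev in levels[1:]]
    )

    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the overall mean
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # Gamma: V(mu) = mu^2; log link: d mu / d eta = mu. The IRLS weights
        # (d mu / d eta)^2 / V(mu) are therefore all 1, so each step is
        # ordinary least squares on the working response z.
        z = eta + (y - mu) / mu
        new_beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break

    mu = np.exp(X @ beta)
    n, p = X.shape
    phi = np.sum(((y - mu) / mu) ** 2) / (n - p)  # Pearson dispersion estimate
    cov = phi * np.linalg.inv(X.T @ X)            # unit weights, so X'WX = X'X
    return levels, beta, np.sqrt(np.diag(cov))


times = [1, 2, 3, 4, 2, 4, 6, 8, 10]   # made-up durations
group = ["a"] * 4 + ["b"] * 5
levels, beta, se = gamma_glm_log(times, group)
ratio_b_vs_a = np.exp(beta[1])          # mean(b) / mean(a) = 6 / 2.5 = 2.4
wald_z = beta[1] / se[1]
print(levels, ratio_b_vs_a, wald_z)
```

In practice a maintained implementation such as statsmodels' `GLM` with a Gamma family would be the safer choice; the hand-rolled version only makes the mechanics visible.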
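On the survival-analysis idea: sessions known only to last more than 30 minutes are right-censored at 30 minutes, so instead of spreading them uniformly over 30-90 minutes they can be recorded as lasting 30 minutes with an "observed = False" flag. A minimal NumPy sketch of the Kaplan-Meier estimator, a simpler relative of the Cox model gung mentions (not the Cox model itself); `kaplan_meier` and the toy data are hypothetical:

```python
import numpy as np


def kaplan_meier(durations, observed):
    """Hypothetical Kaplan-Meier estimator for right-censored durations.

    durations: session lengths; observed: True if the session ended at that
    time, False if it is only known to have lasted at least that long.
    Returns (event_times, survival) with survival[i] = S(event_times[i]).
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    event_times = np.unique(durations[observed])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)             # still playing just before t
        ended = np.sum((durations == t) & observed)  # sessions that ended at t
        s *= 1.0 - ended / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Minutes; the two sessions that lasted "more than 30" are censored at 30.
durations = [5, 10, 10, 20, 30, 30]
observed = [True, True, True, True, False, False]
event_times, surv = kaplan_meier(durations, observed)
# event_times -> [5, 10, 20], surv -> [5/6, 1/2, 1/3]
```

Comparing groups would then use a log-rank test or a Cox model (for example `CoxPHFitter` in the lifelines library). The 10-30 second bins are interval-censored, which this sketch does not handle.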
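Chris C's two suggestions can be combined: estimate one Box-Cox λ on the pooled data, then compare groups with a permutation test on the transformed scale. A sketch assuming NumPy and SciPy; `permutation_test_mean_diff` is a hypothetical helper and the samples are simulated:

```python
import numpy as np
from scipy import stats


def permutation_test_mean_diff(x, y, n_resamples=10_000, seed=0):
    """Hypothetical two-sided permutation test for mean(x) - mean(y)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()

    count = 0
    for _ in range(n_resamples):
        perm = rng.permutation(pooled)  # shuffle group labels
        diff = perm[: len(x)].mean() - perm[len(x):].mean()
        # Small tolerance so floating-point noise does not hide ties.
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # +1 in numerator and denominator keeps the p-value away from exactly 0.
    return observed, (count + 1) / (n_resamples + 1)


rng = np.random.default_rng(2)
x = rng.lognormal(mean=3.0, sigma=1.0, size=1500)  # simulated group 1
y = rng.lognormal(mean=3.2, sigma=1.0, size=1500)  # simulated group 2

pooled = np.concatenate([x, y])
pooled_bc, lam = stats.boxcox(pooled)   # one lambda for all data (needs y > 0)
x_bc = stats.boxcox(x, lmbda=lam)
y_bc = stats.boxcox(y, lmbda=lam)
skew_before = stats.skew(pooled)
skew_after = stats.skew(pooled_bc)

diff_bc, p_bc = permutation_test_mean_diff(x_bc, y_bc, n_resamples=2000)
print(f"lambda={lam:.3f}, skew {skew_before:.2f} -> {skew_after:.2f}, p={p_bc:.3g}")
```

A difference in means on the Box-Cox scale is not a difference in minutes, so results need to be interpreted (or back-transformed) with care.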
