I'm looking to test for equality of means across samples of different sizes, but I know that the data are non-normal and heteroscedastic. Can anyone suggest anything?
1 Answer
Depending on how many samples you're testing, the relatively obvious options would be the Mann–Whitney U test (for two samples) and the Kruskal–Wallis procedure (for more than two). These don't assume normal distributions, but they do assume identically shaped distributions, so heteroscedasticity may still be a problem for them. Also, they don't test equality of means per se: they test whether values from one group tend to exceed values from another, and the location shift usually reported alongside them is the Hodges–Lehmann estimate rather than the difference in means. This may be sufficient for your purposes, and may even be equivalent to a test of means under strict assumptions (e.g., distributions that differ only by a location shift). For more info, see "Do we need to report the median or the Mean when using a Kruskal-Wallis test?" and "Difference Between ANOVA and Kruskal-Wallis test".
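As a rough illustration of the rank-based options, here's a minimal sketch using SciPy. The three samples are made-up skewed data with unequal sizes and spreads (purely an assumption for the demo), and the last two lines contrast the Hodges–Lehmann shift estimate (the median of all pairwise differences) with the plain difference in means.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical samples: skewed, unequal sizes, unequal spreads (illustration only)
a = rng.exponential(scale=1.0, size=30)
b = rng.exponential(scale=2.0, size=50)
c = rng.exponential(scale=4.0, size=20)

# Two samples: Mann-Whitney U test (two-sided)
u_stat, p_mw = stats.mannwhitneyu(a, b, alternative="two-sided")

# More than two samples: Kruskal-Wallis procedure
h_stat, p_kw = stats.kruskal(a, b, c)

# Hodges-Lehmann shift estimate for b vs. a: median of all pairwise differences
hl_shift = np.median(np.subtract.outer(b, a))

# The quantity the question actually asks about
mean_diff = b.mean() - a.mean()

print(f"Mann-Whitney p = {p_mw:.4f}, Kruskal-Wallis p = {p_kw:.4f}")
print(f"Hodges-Lehmann shift = {hl_shift:.3f}, difference in means = {mean_diff:.3f}")
```

With skewed data like these, the two location summaries generally won't agree, which is the practical reason to be careful about reading a rank test as a test of means.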
You might also consider permutation tests, but these assume exchangeability under the null hypothesis, and thus equal variances too. In noting this, Wikipedia proposes bootstrap tests:
Good (2005) explains the difference between permutation tests and bootstrap tests the following way: "Permutations test hypotheses concerning distributions; bootstraps test hypotheses concerning parameters. As a result, the bootstrap entails less-stringent assumptions." Of course, bootstrap tests are not exact.
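To make the permutation idea concrete, here's a minimal Monte Carlo sketch for the difference in means between two samples. The function name and defaults are my own (hypothetical), and note that it inherits the exchangeability assumption mentioned above, so it isn't a fix for heteroscedasticity by itself.

```python
import numpy as np

def permutation_test_mean_diff(x, y, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Assumes the pooled observations are exchangeable under the null.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()

    count = 0
    for _ in range(n_perm):
        # Shuffle group labels by permuting the pooled data
        perm = rng.permutation(pooled)
        diff = perm[:x.size].mean() - perm[x.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add-one correction so the Monte Carlo p-value is never exactly zero
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `permutation_test_mean_diff(x, y)` returns the observed difference and an approximate two-sided p-value; increasing `n_perm` reduces the Monte Carlo error.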
The bootstrapping page adds:
Although for most problems it is impossible to know the true confidence interval, bootstrap is asymptotically more accurate than the standard intervals obtained using sample variance and assumptions of normality (DiCiccio & Efron, 1996).
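For a bootstrap test of equal means that doesn't pool the variances, one common recipe (sketched here under my own naming, not taken from the pages quoted above) is to shift each sample to a shared mean so the null holds, resample within each group at its own size, and compare a Welch-type studentized statistic against its bootstrap distribution.

```python
import numpy as np

def welch_t(x, y):
    """Studentized mean difference with separate variance estimates."""
    se = np.sqrt(x.var(ddof=1) / x.size + y.var(ddof=1) / y.size)
    return (x.mean() - y.mean()) / se

def bootstrap_mean_test(x, y, n_boot=10_000, seed=0):
    """Two-sided bootstrap test of equal means; variances may differ."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t_obs = welch_t(x, y)

    # Impose the null: give both samples the same mean, keep their own spread
    grand_mean = np.concatenate([x, y]).mean()
    x0 = x - x.mean() + grand_mean
    y0 = y - y.mean() + grand_mean

    count = 0
    for _ in range(n_boot):
        # Resample within each group, keeping the original group sizes
        xb = rng.choice(x0, size=x0.size, replace=True)
        yb = rng.choice(y0, size=y0.size, replace=True)
        if abs(welch_t(xb, yb)) >= abs(t_obs):
            count += 1

    return t_obs, (count + 1) / (n_boot + 1)
```

Because each group is resampled separately, the groups never have to share a variance or a sample size. With very small samples a resample can have zero variance, so treat this as a sketch for moderately sized groups.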
To handle heteroscedasticity via bootstrap testing, you may want to check out the wild bootstrap (Wu, 1986; Mammen, 1993; Davidson & Flachaire, 2008). It handles heteroscedastic samples well, though mainly in the context of regression.
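Here's a tentative sketch of how the wild bootstrap might be adapted to a two-group comparison, treating it as a regression on a group indicator. This adaptation, the function name, and the choice to impose the null via residuals from the grand mean are my assumptions; the Rademacher (random sign) weights follow what Davidson & Flachaire (2008) recommend.

```python
import numpy as np

def wild_bootstrap_two_group(x, y, n_boot=10_000, seed=0):
    """Two-sided wild bootstrap test of equal means (null imposed).

    Each observation keeps its own residual magnitude; only its sign is
    randomized, so unequal variances across groups are preserved.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x = x.size
    values = np.concatenate([x, y])

    def t_stat(v):
        # Welch-type statistic: mean difference over a variance-robust SE
        a, b = v[:n_x], v[n_x:]
        se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
        return (a.mean() - b.mean()) / se

    t_obs = t_stat(values)

    # Restricted fit under H0: a single common mean for both groups
    grand_mean = values.mean()
    resid = values - grand_mean

    count = 0
    for _ in range(n_boot):
        # Rademacher weights: +1 or -1 with equal probability
        signs = rng.choice([-1.0, 1.0], size=values.size)
        if abs(t_stat(grand_mean + resid * signs)) >= abs(t_obs):
            count += 1

    return t_obs, (count + 1) / (n_boot + 1)
```

Unlike an ordinary bootstrap, nothing is resampled across observations here, which is why the heteroscedastic structure survives in every bootstrap data set.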
References
- Davidson, R., & Flachaire, E. (2008). The wild bootstrap, tamed at last. Journal of Econometrics, 146(1), 162–169. Retrieved from http://eprints.lse.ac.uk/6560/1/The_Wild_Bootstrap,_Tamed_at_Last.pdf.
- DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11(3), 189–212. Retrieved from http://staff.ustc.edu.cn/~zwp/teach/Stat-Comp/Efron_Bootstrap_CIs.pdf.
- Good, P. (2005). Permutation, parametric and bootstrap tests of hypotheses (3rd ed.). Springer.
- Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 21(1), 255–285.
- Wu, C. F. J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics, 14(4), 1261–1295. Retrieved from http://business.clemson.edu/Economic/faculty/wilson/courses/bcn/papers/14_1261-1295.pdf.

[tag:[tag:_tag_content_here_]] You can do it in comments too, but (as you see $\leftarrow$ here) this just produces a hyperlink, so it's not as cool. I find the in-answer tags helpful for providing easy access to the wiki excerpts, but I'm a little frustrated by the disruption of line spacing... – Nick Stauner Mar 04 '14 at 00:24