
Suppose I have a data set with three or more groups. After some exploratory data analysis, I find that:

  • The groups do not come from a normal distribution

  • The variances within the groups are not equal and violate the rule of thumb that the largest variance should be no more than 4 times the smallest.

If I'm interested in a statistical test for the difference in means across these groups, what tests can I use?

From what I understand, ANOVA requires normality (although it tends to be robust against deviations from normality) and homoscedasticity. The Kruskal-Wallis test and the Fisher-Pitman permutation test can deal with the non-normality, but I believe both require homoscedasticity. Using Welch's ANOVA will help with the unequal variances, but it requires normality.
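
For reference, all three alternatives mentioned above are available in R. A minimal sketch, assuming a data frame yourdata with a numeric outcome and a factor group (placeholder names, not from the thread):

kruskal.test(outcome ~ group, data = yourdata)  # Kruskal-Wallis rank test

# Welch's ANOVA: compares means without assuming equal variances
oneway.test(outcome ~ group, data = yourdata, var.equal = FALSE)

# Fisher-Pitman permutation test, via the coin package
library(coin)
oneway_test(outcome ~ group, data = yourdata)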

I'm interested in how this community would suggest moving forward with a difference in means analysis.

  • You could try bootstrapping? See https://stats.stackexchange.com/questions/56971/alternative-to-one-way-anova-unequal-variance. Could you show us a plot of the data (or share a link)? – kjetil b halvorsen Mar 24 '18 at 14:39
  • If the variances are different you have already shown the groups are different so perhaps you need to specify why you also need to test the means? – mdewey Mar 24 '18 at 15:15
  • Clarification: do you want to say a difference exists and attach a p-value to this claim, or do you want to actually locate differences between specific groups? (Curious as to what the long-run research question is here.) – Gregg H Mar 24 '18 at 16:05
  • @kjetilbhalvorsen: I didn't have particular data in mind. I was hoping for some general approaches that others have used in this situation or one similar. – stats_curious Mar 24 '18 at 17:39
  • @GreggH: I had in mind a p-value type test to refute a claim of equal means, but I'm open to other statistical approaches that communicate a similar idea. – stats_curious Mar 24 '18 at 17:43
  • also...one other point of clarification...the question title says "non-normal" but the scenario you describe (bullet 1) says the data is "normal" – Gregg H Mar 24 '18 at 19:14
  • A simple thing to do: dummify the category id, regress the variable of interest on these dummies, use a robust variance-covariance matrix (e.g. White's) for Wald's test, there you go. No normality, no homoskedasticity, no undergrad shit that is used purely for psychological comfort. A better thing to do: run an ordinary LS, get the residuals, estimate the residual variance in each group, then apply weighted LS with weights reciprocal to these variances. Voilà, you have the **best** linear unbiased estimator (BLUE). You do not even need non-parametric estimation... It all relies on asymptotics. – Andreï Kostyrka Mar 24 '18 at 22:54
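
A minimal sketch of the two procedures described in the comment above, using the sandwich and lmtest packages (the package choices, and the placeholder names yourdata, outcome, and group, are my assumptions, not the commenter's own code):

library(sandwich)  # heteroskedasticity-consistent (White) covariance estimators
library(lmtest)    # waldtest()

# (1) OLS on the group dummies, then a Wald test of equal means
#     using White's robust variance-covariance matrix
fit0 <- lm(outcome ~ 1, data = yourdata)      # null model: one common mean
fit1 <- lm(outcome ~ group, data = yourdata)  # one mean per group
waldtest(fit0, fit1, vcov = function(m) vcovHC(m, type = "HC0"))

# (2) Feasible weighted LS: estimate the residual variance in each group,
#     then reweight each observation by its reciprocal
v <- tapply(residuals(fit1), yourdata$group, var)  # one variance per group level
fit_wls <- lm(outcome ~ group, data = yourdata,
              weights = 1 / v[yourdata$group])     # indexing by the factor recycles per row
summary(fit_wls)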

1 Answer


This is an interesting thread, so I thought I would add my own thoughts to it. If I understand things correctly, you have a situation where you are interested in testing differences between k population means (where k > 2) when the populations are normal but have different spreads.

As a first resort, you could try applying a variance-stabilizing transformation to the data you collected on the outcome variable (e.g., log transformation if these data are strictly positive). If the transformation works, you can then apply the standard ANOVA to the transformed outcome data (presuming the other assumptions underlying ANOVA still hold). In general, the more complicated the transformation, the more complicated the ensuing interpretation. Also, in some situations, you may be hard-pressed to find an adequate transformation.
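
As a small illustration, assuming strictly positive outcomes and the same placeholder data frame yourdata used below, one might check whether a log transformation stabilizes the group spreads before running the standard ANOVA:

# Do the group SDs look roughly equal on the log scale?
tapply(log(yourdata$outcome), yourdata$group, sd)

# If so, standard one-way ANOVA on the transformed outcome
summary(aov(log(outcome) ~ group, data = yourdata))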

A better approach, in my view, would be to use generalized least squares (gls) regression to analyze the untransformed outcome data. In R, the gls approach is implemented via the gls() function in the nlme package:

# install.packages("nlme")  # if not already installed
library(nlme)

# One-way model with a separate residual variance per group:
# varIdent(form = ~ 1 | group) lets the error SD differ across groups
model <- gls(outcome ~ group,
             data = yourdata,
             weights = varIdent(form = ~ 1 | group))

summary(model)  # coefficients and estimated variance ratios

anova(model)    # overall test of the group effect

The gls approach will help you relate the outcome variable (expressed on its original scale) to the group variable, while allowing for the possibility that the error variability is different across levels of the group variable. The approach will estimate the error variability in each group for you, under the assumption of normality of the errors within each group. More importantly, the gls approach will provide you with a flexible framework for testing a priori contrasts or performing various types of post-hoc multiple comparisons. Checking model diagnostics is also straightforward.
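
As one possible illustration (the package choice is my assumption, not part of the answer), the emmeans package accepts gls fits, so a priori contrasts and post-hoc comparisons might look like:

library(emmeans)

emm <- emmeans(model, ~ group)  # estimated marginal mean for each group
pairs(emm)                      # all pairwise comparisons, Tukey-adjusted

# An a priori contrast, e.g. group 1 vs the average of groups 2 and 3
# (the weights are purely illustrative and assume three groups)
contrast(emm, list(g1_vs_rest = c(1, -0.5, -0.5)))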

The bootstrapping approach is also a possibility, though I think it's not as easy to implement as the gls approach, especially when you consider the need to test a priori contrasts or perform post-hoc multiple comparisons.
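
For completeness, here is a minimal sketch of a stratified (within-group) percentile bootstrap for one pairwise difference in means, again with the placeholder names and assuming every group has more than one observation:

set.seed(123)
boot_diffs <- replicate(5000, {
  # resample rows within each group, preserving the group sizes
  idx <- unlist(lapply(split(seq_len(nrow(yourdata)), yourdata$group),
                       sample, replace = TRUE))
  m <- tapply(yourdata$outcome[idx], yourdata$group[idx], mean)
  m[[1]] - m[[2]]  # difference between the first two group means
})
quantile(boot_diffs, c(0.025, 0.975))  # 95% percentile interval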

This question has been asked before and addressed here, for example: Explicitly modelling unequal variances in a linear model to get better conf/pred intervals?

Isabella Ghement
  • Thanks Isabella. Does this approach still require normal distributions among the groups? – stats_curious Mar 24 '18 at 18:08
  • Yes, the gls approach requires normal distributions for the values of the outcome variable in each group, but the level of spread of these distributions can depend on the group variable. If the normality assumption is violated, then you would have to resort to gls + bootstrap, I guess. (In your question, you mentioned that you do have normality of the data in each group, hence my recommendation to use gls.) – Isabella Ghement Mar 24 '18 at 19:03
  • @stats_curious In no way does (G)LS assume normality of the error terms! The inference will be asymptotically valid with White's variance-covariance matrix estimate, since the (G)LS estimators are asymptotically normal even if the model error comes from an unknown distribution. In the case of heteroskedasticity of the most general form, Robinson's (1987) estimator, `E[XX'/σ²(X)]^(-1) E[XY/σ²(X)]`, will asymptotically beat **any** GLS estimator; in the discrete-regressor case, like here, the estimator of `σ²(X)` is just `E[ε²|X]` measured for each discrete X. Please correct your answer. – Andreï Kostyrka Mar 24 '18 at 22:46
  • The White estimator may not perform well in finite samples (so it may need a finite sample correction). It is also biased, so it may require a bias correction. No solution is perfect! – Isabella Ghement Mar 24 '18 at 23:09
  • Isabella, I couldn't find any aspect of @Andreï's comment that is uncivil, especially taking into account that the 500-character limit practically forces people to dispense with the usual niceties and get right to the point. I agree that his request to "correct" the answer may be asking a lot and you're right that it's too much to expect every answer to be "optimal," but one thing this site has going for it is that it encourages us to *improve* answers. When everyone cooperates toward that end, it works well. – whuber Mar 24 '18 at 23:16
  • @IsabellaGhement I am very sorry if it seemed too harsh; I am just allergic to distributional assumptions where they are not needed. In no way did I want to offend you. You wrote: `Yes, the gls approach requires normal distributions for the values of the outcome variable in each group`. But this is just wrong! It does not. As for the bias of White's matrix, one may use Davidson–MacKinnon's or Cribari-Neto's finite-sample corrections; asymptotically they are all the same, though. The OP asked for differences in means, but in a more general setting, Mann–Whitney's test with Bonferroni... – Andreï Kostyrka Mar 24 '18 at 23:30
  • Thank you, @whuber. I am new to the site and get rattled more easily, I guess. I deleted two of my comments after reading your response, but I still think no solution is perfect and the one proposed by the commenter has its own drawbacks. I won't edit my answer in view of this. – Isabella Ghement Mar 24 '18 at 23:30
  • ...correction for multiple comparisons is the way to go. I guess, instead of criticising your answer, I should have prepared a rigorous separate answer... – Andreï Kostyrka Mar 24 '18 at 23:33
  • Sorry, I did not see that there are 3 or more groups. In this case, Dunn's test after the Kruskal–Wallis ANOVA on ranks is the right way to do post hoc comparisons. – Andreï Kostyrka Mar 24 '18 at 23:37
  • I think I understand where you are coming from, Andrei. I had in mind the gls() function in R, which uses either maximum likelihood or restricted maximum likelihood for estimation, and most likely relies on normality for subsequent inferences/predictions based on the model. I believe gls() uses the Gaussian family by default (if not, how could one derive the likelihood?). – Isabella Ghement Mar 24 '18 at 23:54
  • Thank you for elaborating on your comments, Andrei (not sure how to get the accent on the i to spell your name correctly). I think your point, in general, is (?) that normality is not needed for obtaining gls estimates. However, I had presumed that the asker was interested in performing tests of hypotheses and constructing confidence intervals, in which case normality of the errors in each group would come in handy. – Isabella Ghement Mar 24 '18 at 23:59
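
To make the suggestions in the closing comments concrete, a hedged sketch: HC3 stands in for the finite-sample corrections mentioned above, and the dunn.test package is one of several implementations of Dunn's post hoc test (both package choices are my assumptions, not the commenters' own code).

library(sandwich)   # vcovHC() implements the HC0-HC5 family of corrections
library(lmtest)     # coeftest()
library(dunn.test)  # Dunn's post hoc test

# Robust coefficient tests with the HC3 finite-sample correction
fit <- lm(outcome ~ group, data = yourdata)
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))

# Rank-based route: Kruskal-Wallis, then Dunn's test with Bonferroni adjustment
kruskal.test(outcome ~ group, data = yourdata)
dunn.test(yourdata$outcome, yourdata$group, method = "bonferroni")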