Unpaired t-test: large samples, unequal variances, unequal sample sizes

Question

I have two independent samples, one with more than 700 observations and the other with more than 500. I first used the Cramér-von Mises test to check how non-normal the distributions of the datasets are. It returned a p-value equal to 0. Thus, I have to rely on the CLT (hoping that I have enough observations, or should I somehow "test" that?). I then used the Conover test to test equality of variances, and it again returned a p-value of almost zero. Therefore, I want to compute the unequal-variance t-test (Welch test), but I am wondering whether it would be better to use the standard Student's t-test, because as I understand it, due to the Lindeberg CLT we can disregard the inequality of variances. So, generally, which test should I use? And was the above-mentioned testing of the data reasonable?

A goodness of fit test (C-vM or otherwise) doesn't tell you how non-normal a distribution is -- a p-value is not an effect size. Further, how much impact a given amount of non-normality has on a t-test decreases with sample size, while the probability you'll reject normality at a given amount of non-normality *increases* with sample size ... so you become most likely to reject just when it matters least. Further, choosing a test on the basis of a formal test of assumptions affects the properties of the second test (sometimes badly); a number of authors recommend against the practice. — Glen_b, Apr 18 '15 at 06:06
Thanks, @Glen_b. I heard that it is better to use the Welch test directly unless we have strong reasons to believe that the variances are equal. Is that what you would recommend? And do you think that my sample sizes are large enough to rely on the CLT even in the case of "strongly" non-normal distributions (I know that C-vM did not tell me that they are "strongly" non-normal)? And is it true that, due to the Lindeberg CLT, it is meaningless to test the variances? — virusdotcom, Apr 18 '15 at 06:18
See [this answer](http://stats.stackexchange.com/questions/97098/practically-speaking-how-do-people-handle-anova-when-the-data-doesnt-quite-mee/97120#97120) and [this answer](http://stats.stackexchange.com/questions/100934/does-testing-for-assumptions-affect-type-i-error/100941#100941) and my first comment [here](http://stats.stackexchange.com/questions/125738/how-can-you-test-homogeneity-of-variance-of-two-groups-with-different-sample-siz/125778#125778) and more broadly, ... (ctd) — Glen_b, Apr 18 '15 at 06:41
(ctd)... [this](http://stats.stackexchange.com/questions/121852/how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sampl/123389#123389) (there are many more). I can't guess whether your data are such that you can reasonably treat them as normal or not, since I have no basis for considering the type nor extent of the non-normality at hand -- but for moderate amounts of skewness or heavy-tailedness or discreteness you should be just fine. It could be much more skew than exponential without problems but it's easy to find cases where a much larger n isn't enough. — Glen_b, Apr 18 '15 at 06:44
Thanks, @Glen_b. I went through all the texts you posted and my conclusion is this: I will not check for normality (maybe only at the end of the work, I will just provide the p-value of the C-vM test) and will assume that the CLT suffices for normality (1st sample: skew=4.6, kurt=34; 2nd sample: skew=2.2, kurt=13.2). Then I will directly use the Welch test (arguing that the variances are probably unequal, if only because the sample sizes are not equal), also disregarding the Lindeberg CLT (for un/equality of variances). Is that a good approach? — virusdotcom, Apr 18 '15 at 08:27
That's pretty heavy-tailed/skew ... but with 500 observations, there's very unlikely to be much of an issue with the null distribution (which is to say, your significance level should be quite close to what you think it is). However, you *might* have an issue with lower power than some alternative choices of analysis. An alternative might be to look at a GLM (perhaps a gamma or an inverse Gaussian GLM) to do the comparison. There are other possibilities depending on what inference you're focusing on. — Glen_b, Apr 18 '15 at 08:47
@Glen_b, so would you disregard the Lindeberg CLT and not mention it at all? And would you report the p-values from the Conover test and the C-vM test somewhere in the text? And what should the GLM be an alternative to? Sorry for the misunderstanding. Thanks a lot for your time. — virusdotcom, Apr 18 '15 at 08:56
(1) The CLT applies when $n\to\infty$. It doesn't imply that a mean of 500 terms is normal, and it doesn't imply that a Welch statistic has a t-distribution. If you want to mention a result, possibly the Berry-Esseen theorem might be more relevant (but it still won't necessarily tell you about the distribution of the Welch statistic, nor about power properties in particular). (2) What did you understand the first two sentences of my previous comment to be saying? (everything before "An alternative... ") -- I may have been unclear. — Glen_b, Apr 18 '15 at 09:01
To be specific, I was trying to suggest that the Welch t-test *should be fine*, unless there's some feature of the distributions I can't guess at from what you've told me so far; the only issue might be that higher power might be available using some other procedures (but power may not be an issue anyway). So then I suggested something that might well have better power while still being inference about means (but there's not really enough information to be totally sure). Oh, I missed a part earlier -- I would *not* report p-values from tests of assumptions; I might show a qq-plot of log-data. — Glen_b, Apr 18 '15 at 09:12
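As a rough illustration of the Welch comparison discussed in the comments above, here is a minimal sketch using only the Python standard library. The function name `welch_test` and the log-normal data are hypothetical stand-ins, not the asker's data. The p-value uses a normal approximation to the t reference distribution, which is an assumption that is only reasonable here because the Welch-Satterthwaite degrees of freedom are in the hundreds.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_test(x, y):
    """Return (t statistic, Welch-Satterthwaite df, approximate two-sided p)."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (no pooled variance).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    # Assumption: df is large, so the normal cdf stands in for the t cdf.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


rng = random.Random(0)
# Hypothetical skewed samples with unequal sizes and unequal spreads.
x = [rng.lognormvariate(0.0, 1.0) for _ in range(700)]
y = [rng.lognormvariate(0.2, 0.7) for _ in range(500)]

t, df, p = welch_test(x, y)
print(f"t = {t:.3f}, df = {df:.1f}, approx. two-sided p = {p:.4f}")
```

For small samples the normal approximation in the last step would not be appropriate; a library routine that evaluates the t cdf with the Welch degrees of freedom would be the safer choice there.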
OK. I understand that the distribution is not guaranteed to be normal even if the sample size is >500. But I need arguments for what I am doing, so I need to support the assumption that the CLT will be OK with such a sample size (by doing some additional analysis or...). I hope I now also understand that you proposed a GLM as an alternative to the Welch t-test because it could possibly have higher power than the Welch test. Again, if I use the Welch t-test, I will have to gather as many arguments as possible in favour of doing that. — virusdotcom, Apr 18 '15 at 09:21
You could say something like "when taking the average of so many values, the distribution generally tends to be very close to normal" (and indeed, simulation or bootstrapping could support that); the Berry-Esseen theorem gives explicit bounds on how far from the standard normal the cdf of the standardized mean can be in terms of $E(|Z_i|^3)$ where $Z_i$ is a standardized $X_i$ (specifically, the cdfs will be less than $0.4748E(|Z_i|^3)/\sqrt{n}$ apart). [Of course you can only estimate that expectation, you don't know it.] — Glen_b, Apr 18 '15 at 09:33
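The two checks mentioned in the comment above (the Berry-Esseen bound and a simulation) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `berry_esseen_bound` and `max_cdf_gap` are hypothetical, the data are exponential draws rather than the asker's samples, and $E(|Z_i|^3)$ is replaced by a plug-in estimate from one sample. The bound concerns the standardized mean with known standard deviation, not the Welch statistic itself.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def berry_esseen_bound(sample, c=0.4748):
    """Plug-in estimate of c * E|Z|^3 / sqrt(n) from one sample."""
    n = len(sample)
    m, s = fmean(sample), pstdev(sample)
    # Estimated third absolute moment of the standardized observations.
    rho = fmean([abs((v - m) / s) ** 3 for v in sample])
    return c * rho / math.sqrt(n)


def max_cdf_gap(z_values):
    """Largest gap between the empirical cdf and the standard normal cdf."""
    phi = NormalDist().cdf
    zs = sorted(z_values)
    k = len(zs)
    # Check the empirical cdf just before and just after each jump.
    return max(
        max(abs((i + 1) / k - phi(z)), abs(i / k - phi(z)))
        for i, z in enumerate(zs)
    )


rng = random.Random(1)
n, reps = 500, 1000

# Hypothetical skewed data: Exponential(1) has mean 1 and sd 1.
sample = [rng.expovariate(1.0) for _ in range(n)]
bound = berry_esseen_bound(sample)

# Simulate standardized means of n draws, using the known mean and sd.
z_means = [
    (fmean([rng.expovariate(1.0) for _ in range(n)]) - 1.0) * math.sqrt(n)
    for _ in range(reps)
]
gap = max_cdf_gap(z_means)

print(f"estimated Berry-Esseen bound: {bound:.4f}")
print(f"simulated max cdf gap:        {gap:.4f}")
```

The simulated gap also contains Monte Carlo noise from using a finite number of replications, so it is a rough indication rather than an exact distance. With real data, resampling from the observed values (bootstrapping) would replace the exponential draws.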
@Glen_b, thanks a lot! I think you have now given me enough information to do what I want. I would be glad to accept your comments as an answer, since you really helped me! :) — virusdotcom, Apr 22 '15 at 11:19
