
I know that the null hypothesis used by ANOVA is that the means in all the groups are the same and that, if the p-value is small, we reject the null hypothesis, which means we believe that not all the means are the same (that is, there are groups with different means).

However, it is not clear to me whether we also assume, within the null hypothesis, that the standard deviation in all the groups is the same. What about higher moments (like skewness and kurtosis): do we assume them to be equal? Do we assume, within the null hypothesis, that the distributions of all the groups are the same? Do we assume that the distributions are normal?

Roman

4 Answers


ANOVA assumes all group distributions under consideration to be normal with the same variance. Consequently, the only way they can differ is in their means.
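Formally, with $k$ groups, observation $j$ in group $i$ is modeled as independent $x_{ij} \sim \mathsf{Norm}(\mu_i, \sigma)$, and the null hypothesis is $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$.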

However, you allude to the fact that the test could pick up on other differences, and using a test as a surrogate for something it does not nominally test is common in statistics. The best example I know is the Wilcoxon Mann-Whitney U test, which tests mean equality only under strict assumptions, yet is a decent test of mean equality (good power, not too many false positives) even when those assumptions do not hold.

You could do a simulation study to see how well ANOVA detects differences in, say, standard deviation when the means are all equal. I would expect such a test to have awful power, barely (if at all) above the $\alpha$-level of the test, which would explain why we use ANOVA to detect mean differences; but perhaps someone could write a simulation that surprises me.
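A minimal sketch of such a simulation in R (the group sizes, SDs, and replicate count are arbitrary choices: three balanced groups with equal means but SDs of 1, 2, and 3):

set.seed(123)
pv = replicate(10^4, {
  x = c(rnorm(50, 0, 1), rnorm(50, 0, 2), rnorm(50, 0, 3))  # equal means, unequal SDs
  g = as.factor(rep(1:3, each = 50))
  anova(lm(x ~ g))[1, "Pr(>F)"]                             # ANOVA p-value
})
mean(pv <= 0.05)                                            # rejection rate at the 5% level

In a balanced design like this, the F test is known to be fairly robust to unequal variances, so I would expect the rejection rate to land not far above 0.05; BruceET's answer below shows that unbalanced designs behave much worse.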

Dave
  • 28,473
  • 4
  • 52
  • 104

I think, "assume within the null hypothesis" is not a valid statement. We do not assume nothing in null hypothesis. We check if hull hypothesis can be rejected. And to be able to check it, we make some assumptions. These assumptions help us to develop formulas for test statistic and p-value.

So:

if we also assume that the standard deviation in all the groups is the same

Yes, we assume this. It means that all the formulas used in the testing procedure are valid only if the standard deviation in all the groups is the same. If that is not true, we should use some other test (probably Welch's ANOVA) that does not assume it.
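For instance, in R, Welch's ANOVA is available as oneway.test; a minimal sketch with made-up data (the group sizes and SDs are purely illustrative):

x = c(rnorm(20, 0, 1), rnorm(20, 0, 3))  # two groups, equal means, unequal SDs
g = as.factor(rep(1:2, each = 20))
oneway.test(x ~ g)                       # Welch-type test; var.equal = FALSE is the default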

What about higher moments (like skewness and kurtosis), do we assume them to be equal?

No.

Do we assume that the distributions are normal?

Yes, just like with SDs. It means that all the formulas used in the testing procedure are valid only if the distributions are normal. If that is not true, we should use some other test (probably the Kruskal-Wallis test) that does not assume it.
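Again in R, and again with made-up data, the rank-based alternative is kruskal.test:

x = c(rexp(20, 1), rexp(20, 1))     # clearly non-normal (skewed) data
g = as.factor(rep(1:2, each = 20))
kruskal.test(x ~ g)                 # no normality assumption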

Do we assume that the distributions of all the groups are the same?

If we assumed that, we wouldn't need any test, because if all the distributions are the same, then all the means are the same too. In case you meant "Do we assume that the distributions of all the groups are from the same family (like all normal or all exponential)?", see the previous paragraph.

Łukasz Deryło
  • When I say "do we assume this or that in the null hypothesis", I am actually asking "what is our null hypothesis". From your answer I understood that our null hypothesis is that all the observations come from one normal distribution. – Roman Dec 29 '21 at 14:19
  • In other words, in order to accept or reject a (null) hypothesis we need to know what it is (what our hypothesis is) or, in other words, what it assumes. – Roman Dec 29 '21 at 14:20
  • Assumptions are statements that we know (or believe) to be true. A hypothesis is a statement that we wish to validate. Here, the assumptions are: the distribution is normal with the same SD in each group. The hypothesis is: the means are also the same. – Łukasz Deryło Dec 30 '21 at 07:40
  • So, running this test we say: "OK, we know/believe the distribution is normal with the same SD in each group; let's check if the means are the same too". – Łukasz Deryło Dec 30 '21 at 07:40
  • For me "assumption" and "hypothesis" mean the same. We assume that all the groups have the same normal distribution and then we prove or disprove this assumption. But this is more a "linguistic question" which does not really matter to me. – Roman Dec 30 '21 at 10:00
  • I am not sure that we ever use the model in which each group has the same SD but a different mean. I guess that we assume that all the groups have the same normal distribution and then we show that the observed data are very unlikely under this assumption. So we say that the "null hypothesis" is NOT the reality, but we do not say what the reality IS. – Roman Dec 30 '21 at 10:02
  • You're right: this turned out to be a linguistic question. You may say that for you "assumption" and "hypothesis" mean the same, but they do not. English is nowadays the lingua franca, but for most people it is not their first language. So it would be, in my opinion, beneficial to stick to the exact definitions of "assumption" and "hypothesis". Otherwise, we can quickly stop understanding each other. – Łukasz Deryło Dec 30 '21 at 10:23
  • Just a detail: skewness and kurtosis are fixed in the normal distribution (skewness is always 0, kurtosis is always 3), so you are implicitly making assumptions about higher moments when choosing the normal distribution. – BelwarDissengulp Dec 30 '21 at 11:26

The pooled 2-sample t test is equivalent to a one-way ANOVA with two levels of the factor.

Suppose one level has 10 replications from $\mathsf{Norm}(\mu_1=1, \sigma_1=10)$ and the other level has 100 replications from $\mathsf{Norm}(\mu_2=1,\sigma_2=1).$ In the example below, the pooled t test strongly rejects the null hypothesis (nominally that $\mu_1=\mu_2$) with P-value $0.001$. So the t test must be 'unofficially noticing' the large difference in variances. [Using R.]

set.seed(2021)
x1 = rnorm(10,1,10);  x2 = rnorm(100,1,1)
t.test(x1,x2, var.eq=T)$p.val
[1] 0.0009002001

Here is the same test in ANOVA format, showing the same P-value as above.

x = c(x1, x2)
g = as.factor(rep(1:2, c(10,100)))
anova(lm(x ~ g))

Analysis of Variance Table

Response: x
           Df Sum Sq Mean Sq F value    Pr(>F)    
g           1  98.02  98.018  11.661 0.0009002 ***
Residuals 108 907.84   8.406                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

A brief simulation in R shows that the result above is not a one-time fluke. The 'power' of the pooled 2-sample t test, with similar data, is around 50%. (I'm not sure whether @Dave will find that surprising.)

set.seed(1229)
pv = replicate(10^5, t.test(rnorm(10,1,10),
      rnorm(100,1,1), var.eq=T)$p.val)
mean(pv <= .05)  
[1] 0.53772   # 'power' of test at 5% sig level

The unbalanced pooled t test is notorious for rejecting with equal means and unequal variances when the sample with the larger variance is much smaller. The Welch 2-sample t test is designed to mitigate this kind of bad behavior, rejecting at the 5% level about 5% of the time when means are equal, even when variances are not.

set.seed(1229)
pv = replicate(10^5, t.test(rnorm(10,1,10),
      rnorm(100,1,1))$p.val)
mean(pv <= .05)
[1] 0.0493

The bad behavior continues for unbalanced one-way ANOVAs when the means are equal and one level has a much larger variance along with a small sample size. One instance with three levels is shown below; the null hypothesis is rejected even though the three population means are equal.

set.seed(2021)
x1 = rnorm(10, 1, 10)
x2 = rnorm(100, 1, 1)
x3 = rnorm(100, 1, 1)
x = c(x1,x2,x3)
g = as.factor(rep(1:3, c(10,100,100)))
anova(lm(x~g))

Analysis of Variance Table

Response: x
           Df  Sum Sq Mean Sq F value    Pr(>F)    
g           2   98.15  49.074  10.027 6.975e-05  ***
Residuals 207 1013.08   4.894                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

However, the procedure oneway.test in R does not require equal variances; it reduces the denominator degrees of freedom, Welch-style, to the extent that the sample variances are unequal. This test does not find significant differences among the three means.

oneway.test(x~g)

        One-way analysis of means 
        (not assuming equal variances)

data:  x and g
F = 1.9742, num df = 2.000, denom df = 22.981, p-value = 0.1617
BruceET

Assumptions are hypothetical but they are not part of the null hypothesis.

When I say "do we assume this or that in the null hypothesis", I am actually asking "what is our null hypothesis". From your answer I understood that our null hypothesis is that all the observations come from one normal distribution.

If the null hypothesis and also the additional hypotheses/assumptions are true, then indeed the observations come from one single normal distribution (an additional assumption is that each observation is independent).

However, the statement 'all observations come from one single normal distribution' is not the hypothesis that is being tested with ANOVA.

The null hypothesis refers to the hypothesis that a certain effect is null/absent. The null hypothesis is not about any additional assumptions.

What happens if the assumptions are false

If the additional assumptions of ANOVA, like normally distributed data with equal variance, are wrong, then the test still works to some extent.

if p-value is small, we reject the null hypothesis which means that we believe that not all the means are the same (that is, there are groups with different means).

The p-value is an indirect measure of the effect that the means are not the same. More directly, we measure the F-statistic, a ratio of variances.

The F-statistic remains a measure of the difference between the group means: the larger the F-statistic, the more the group means differ. What the p-value does is provide a statistical interpretation of the observed effect (the magnitude of the F-statistic) in terms of statistical significance.
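For reference, with $k$ groups, $n_i$ observations in group $i$, and $N$ observations in total, the one-way F-statistic is
$$F = \frac{\sum_i n_i (\bar{x}_i - \bar{x})^2 / (k-1)}{\sum_i \sum_j (x_{ij} - \bar{x}_i)^2 / (N-k)},$$
the between-group mean square over the within-group mean square, which grows as the group means $\bar{x}_i$ spread out around the grand mean $\bar{x}$.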

The only problem with false assumptions is that this statistical inference is wrong. Based on the assumptions, we can compute a sampling distribution for the F-statistic (it follows an F distribution if the null hypothesis is correct and a non-central F distribution if it is not). That assumed sampling distribution would now be wrong, and the computed p-values will be wrong.

What remains is that the further the actual situation is from the null hypothesis, the more likely we are to observe a large F-statistic, and the F-statistic remains a measure of the difference of the means.

If the distributions are extremely different from a normal distribution, and/or the variance is not the same among the different populations, then one could still estimate the distribution of the F-statistic by bootstrapping (using the observed values to estimate the sampling distribution of the F-statistic). Alternatively, the variances of the groups can be made more alike by transforming the data.
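One concrete resampling variant of this idea in R is a permutation test: shuffle the group labels to build a reference distribution for the F-statistic without any normality assumption. A minimal sketch with made-up skewed data:

set.seed(1)
x = c(rexp(30, 1), rexp(30, 1), rexp(30, 1))    # clearly non-normal data
g = as.factor(rep(1:3, each = 30))
F.obs = anova(lm(x ~ g))[1, "F value"]          # observed F-statistic
F.null = replicate(2000, anova(lm(sample(x) ~ g))[1, "F value"])
mean(F.null >= F.obs)                           # resampling-based p-value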

Sextus Empiricus