\begin{array} {|r|r|r|}\hline A & B & C \\ \hline 24 & 21 & 16 \\ \hline 18 & 26 & 22 \\ \hline 27 & 32 & 19 \\ \hline 28 & 25 & 17 \\ \hline \end{array}

At the significance level of $\alpha = 5\%$, compare the average number of tomatoes produced when these types of fertilizer are used.

I am using a one-way ANOVA for this problem, but I am stuck on checking the model assumptions, in particular equal variances across the populations. I see two courses of action. First, I could test the equality of variances for each pair of populations. However, this defeats the purpose of ANOVA, which is to avoid multiple pairwise testing: if we are going to test every pair anyway, it is time-consuming and inefficient, and we might as well use pairwise t tests. Second, I could just compute the sample standard deviation of each group and decide empirically whether the differences are acceptable. Is there a customary criterion for such a rough judgment, or a more advanced technique for testing the equality of variances across populations?

1 Answer

Data

a = c(24,18,27,23)
b = c(21,26,32,25)
c = c(16,22,19,17)

With only four observations per group, there is not enough data for a formal test of equal variances to be useful, but I agree that there may be some doubt. So I would use oneway.test in R, which does not assume equal variances. (The adjustment for possibly unequal variances is similar to the adjustment in a Welch 2-sample t test, which is used as a substitute for a pooled 2-sample t test.)

sd(a); sd(b); sd(c)
[1] 3.741657
[1] 4.546061
[1] 2.645751
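
As a rough check on the sample SDs above, here is a minimal sketch of two common informal approaches. A widely quoted rule of thumb (not a formal test) tolerates equal-variance ANOVA when the largest sample SD is less than about twice the smallest. Bartlett's test, available as bartlett.test in base R, is a formal alternative, but it assumes normality and has very little power with four observations per group, so a large P-value here is weak evidence at best. The group labels "A", "B", "C" and the names s, sd_ratio and bt are introduced only for this illustration.

```r
# Data as entered in this answer, stacked into one vector with a grouping factor
x <- c(24, 18, 27, 23,  21, 26, 32, 25,  16, 22, 19, 17)
g <- factor(rep(c("A", "B", "C"), each = 4))

# Sample SD for each fertilizer and the ratio of largest to smallest
s <- tapply(x, g, sd)
sd_ratio <- max(s) / min(s)   # rule of thumb: under about 2 is usually tolerated
sd_ratio

# Bartlett's test of equal variances (assumes normal data; low power here)
bt <- bartlett.test(x ~ g)
bt$p.value
```

The ratio is based on the same three SDs shown above, so it adds no new information; it only puts them on the scale that the rule of thumb uses.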

The null hypothesis that all three population means are equal is not rejected at the 5% level of significance.

x = c(a,b,c)
g = rep(1:3, each=4)
oneway.test(x ~ g)

        One-way analysis of means (not assuming equal variances)

data:  x and g
F = 4.2711, num df = 2.0000, denom df = 5.6994, p-value = 0.07355
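
To make the adjustment for unequal variances concrete, the following sketch recomputes the Welch statistic by hand: each group mean is weighted by n/s^2, and the denominator degrees of freedom are estimated from the data rather than fixed at N - k. This is my reading of the standard Welch one-way formula, which should agree with oneway.test; the variable names (w, mw, lam, F_welch, df2, p_welch) are chosen for this illustration only.

```r
# Data and groups as in this answer
x <- c(24, 18, 27, 23,  21, 26, 32, 25,  16, 22, 19, 17)
g <- factor(rep(1:3, each = 4))

n <- tapply(x, g, length)   # group sizes
m <- tapply(x, g, mean)     # group means
v <- tapply(x, g, var)      # group variances
k <- length(n)              # number of groups

w  <- n / v                 # precision weights: noisier groups count less
mw <- sum(w * m) / sum(w)   # weighted grand mean

# Correction term that shrinks F and sets the denominator df
lam <- sum((1 - w / sum(w))^2 / (n - 1)) / (k^2 - 1)

F_welch <- (sum(w * (m - mw)^2) / (k - 1)) / (1 + 2 * (k - 2) * lam)
df2     <- 1 / (3 * lam)
p_welch <- pf(F_welch, k - 1, df2, lower.tail = FALSE)

c(F = F_welch, df2 = df2, p = p_welch)
```

The estimated denominator df is what replaces the 9 residual df of the standard ANOVA.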

The standard ANOVA is also not quite significant at the 5% level.

Notes: (1) You must declare the grouping variable as a factor with as.factor(g); otherwise lm treats g as a numeric predictor and you get output for a regression, which is inappropriate here. (2) oneway.test has denominator DF $=5.7\ne 9,$ so some allowance has been made for unequal sample variances.

anova(lm(x~as.factor(g)))
Analysis of Variance Table

Response: x
             Df Sum Sq Mean Sq F value  Pr(>F)  
as.factor(g)  2    114  57.000   4.104 0.05411 .
Residuals     9    125  13.889                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
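
As a check on the table, the F ratio can be rebuilt from the sums of squares: the between-group sum of squares divided by its 2 df, over the within-group sum of squares divided by its 9 df. A minimal sketch, with variable names (ss_between, ss_within, F_stat, p_val) chosen for this illustration:

```r
# Data and groups as in this answer
x <- c(24, 18, 27, 23,  21, 26, 32, 25,  16, 22, 19, 17)
g <- factor(rep(1:3, each = 4))

n <- tapply(x, g, length)   # group sizes
m <- tapply(x, g, mean)     # group means

ss_between <- sum(n * (m - mean(x))^2)   # 114, the factor Sum Sq
ss_within  <- sum((x - ave(x, g))^2)     # 125, the residual Sum Sq

df1 <- nlevels(g) - 1                    # 2
df2 <- length(x) - nlevels(g)            # 9

F_stat <- (ss_between / df1) / (ss_within / df2)
p_val  <- pf(F_stat, df1, df2, lower.tail = FALSE)

c(F = F_stat, p = p_val)
```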

Thus, because neither ANOVA rejects the null hypothesis, no post hoc comparisons are appropriate. [For the record, just looking at B and C, and ignoring the ANOVA, a Welch two-sample t test would (barely) find significance with P-value 0.037. But this would be P-hacking and risks false discovery.]
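
For completeness only, this is a sketch of how the B versus C comparison mentioned above could be reproduced. In R, t.test uses the Welch version by default (var.equal = FALSE). The name cc is used here instead of c simply to avoid reusing the name of R's c() function, and alpha_bonf is introduced for illustration.

```r
# Groups B and C as entered in this answer
b  <- c(21, 26, 32, 25)
cc <- c(16, 22, 19, 17)

# Welch two-sample t test (the default in t.test)
tt <- t.test(b, cc)
tt$p.value

# With three possible pairs, a Bonferroni-adjusted threshold would be 0.05 / 3
alpha_bonf <- 0.05 / 3
tt$p.value < alpha_bonf
```

Picking the most extreme pair after seeing the data and testing it at the unadjusted 5% level is exactly the practice that inflates the false discovery risk.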

There is not enough data for a formal test of normality of the residuals, but a normal probability plot of the residuals raises no alarms.

r = lm(x~as.factor(g))$resid
qqnorm(r);  qqline(r, col="green2")

[Figure: normal probability plot of the ANOVA residuals, with reference line]

BruceET