I have an unbalanced dataset. I am testing how temperature and carcass size affect the development rate of maggots. The duration is the time a maggot spends in a particular development stage. I found that the higher the temperature and the larger the carcass, the faster the development (shorter duration).
My response variable is Duration of Eggs (Eggs for short in coding) and my two factors are Temperature (4 levels = 15, 20, 25, 30) and Size (2 levels = small and large). The majority of the sample sizes are 4; however, one group is 7.
I intend to examine how Duration of Eggs varies with Temperature and Size.
I want to use ANOVA and after much reading I think two-way unbalanced ANOVA should be used.
I imported my data set (anova.data). One function I have tried is:
anova(lm(Eggs ~ Temperature * Size, anova.data))
This gave me:
Analysis of Variance Table
Response: Eggs
                 Df  Sum Sq Mean Sq F value    Pr(>F)
Temperature       1 1828.37 1828.37 71.3971 1.521e-09 ***
Size              1    1.71    1.71  0.0669    0.7977
Temperature:Size  1    1.02    1.02  0.0399    0.8429
Residuals        31  793.86   25.61
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
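A note on the table above: Temperature has 1 Df, whereas a factor with 4 levels would have 3. That suggests R is treating Temperature as a numeric covariate rather than as a factor. Below is a minimal sketch of how this could be checked, assuming the data frame has the columns Eggs, Temperature and Size used in the formula. The data are rebuilt from the listing in this post, and the names TemperatureF, numeric_fit and factor_fit are made up for illustration.

```r
# Compact reconstruction of the data (values taken from the listing in this post).
anova.data <- data.frame(
  Eggs = c(rep(43.0, 4), rep(40.5, 4), rep(24.0, 3), rep(23.5, 4),
           rep(24.0, 4), rep(20.0, 16)),
  Temperature = c(rep(15, 8), rep(20, 11), rep(25, 8), rep(30, 8)),
  Size = c(rep("Small", 4), rep("Large", 4), rep("Small", 7), rep("Large", 4),
           rep(c("Small", "Large"), each = 4, times = 2))
)

# Inspect how each column is stored (numeric vs. factor/character).
str(anova.data)

# Fit with Temperature as stored (numeric): a single slope, hence 1 Df.
numeric_fit <- lm(Eggs ~ Temperature * Size, data = anova.data)

# Hypothetical extra column: Temperature converted to a 4-level factor.
anova.data$TemperatureF <- factor(anova.data$Temperature)
factor_fit <- lm(Eggs ~ TemperatureF * Size, data = anova.data)

# Compare the degrees of freedom attributed to temperature in each fit.
df_numeric <- anova(numeric_fit)["Temperature", "Df"]
df_factor <- anova(factor_fit)["TemperatureF", "Df"]
print(c(numeric = df_numeric, factor = df_factor))
```

Whether Temperature should be modelled as a factor or as a continuous predictor is a design decision; the sketch only shows how to tell which one R is using.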
However, I am uncertain whether this takes into account that the design is unbalanced.
After further reading I found that the function Anova() [in the car package] can be used to compute a two-way ANOVA for unbalanced designs. Of the three fundamentally different ways to run an ANOVA in an unbalanced design, I read that the recommended method is Type III sums of squares. (I am not sure why, though.)
So (after install.packages("car")), I tried a second function:
library(car)
my_anova <- aov(Eggs ~ Temperature * Size, data = anova.data)
Anova(my_anova, type = "III")
Anova Table (Type III tests)
Response: Eggs
                  Sum Sq Df  F value    Pr(>F)
(Intercept)      2875.68  1 112.2941 7.883e-12 ***
Temperature       858.05  1  33.5065 2.243e-06 ***
Size                0.45  1   0.0178    0.8948
Temperature:Size    1.02  1   0.0399    0.8429
Residuals         793.86 31
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However, this function gives different values and this second function has an additional Intercept value, which I am not familiar with. Which function is correct to use? Or is there an alternative function?
Also, how do I know whether to use Type I, II or III sums of squares? I have done some reading but I am still unsure. I do not know if there is an interaction between Temperature and Size.
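For context on the Type I question: anova() on an lm fit reports sequential (Type I) sums of squares, so in an unbalanced design the result for a main effect can depend on the order in which the terms enter the formula. The following is a minimal sketch of that order dependence, using only base R. The data are rebuilt from the listing in this post, Temperature is left numeric as it appears to be in the output above (1 Df), and the object names are made up for illustration.

```r
# Compact reconstruction of the data (values taken from the listing in this post).
anova.data <- data.frame(
  Eggs = c(rep(43.0, 4), rep(40.5, 4), rep(24.0, 3), rep(23.5, 4),
           rep(24.0, 4), rep(20.0, 16)),
  Temperature = c(rep(15, 8), rep(20, 11), rep(25, 8), rep(30, 8)),
  Size = c(rep("Small", 4), rep("Large", 4), rep("Small", 7), rep("Large", 4),
           rep(c("Small", "Large"), each = 4, times = 2))
)

# Same model, two term orders. anova() tests each term after those listed before it.
temp_first <- anova(lm(Eggs ~ Temperature * Size, data = anova.data))
size_first <- anova(lm(Eggs ~ Size * Temperature, data = anova.data))

# Sum of squares for Size when it enters after Temperature vs. when it enters first.
ss_size_after_temp <- temp_first["Size", "Sum Sq"]
ss_size_entered_first <- size_first["Size", "Sum Sq"]
print(c(after_temperature = ss_size_after_temp,
        entered_first = ss_size_entered_first))
```

If the two values differ, the Type I result for Size depends on the term order, which is the usual reason for looking at Type II or Type III tests in unbalanced designs.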
This is my dataset:
15°C Small: 43.0, 43.0, 43.0, 43.0
15°C Large: 40.5, 40.5, 40.5, 40.5
20°C Small: 24.0, 24.0, 24.0, 23.5, 23.5, 23.5, 23.5
20°C Large: 24.0, 24.0, 24.0, 24.0
25°C Small: 20.0, 20.0, 20.0, 20.0
25°C Large: 20.0, 20.0, 20.0, 20.0
30°C Small: 20.0, 20.0, 20.0, 20.0
30°C Large: 20.0, 20.0, 20.0, 20.0
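For anyone who wants to reproduce this, the listing above can be turned into a data frame roughly as follows. This is a sketch: the column names Eggs, Temperature and Size match the formula used in the question, while the helper list cells and the numeric coding of Temperature are assumptions about how the original file was laid out.

```r
# Each entry is one Temperature x Size group from the listing above.
cells <- list(
  list(temp = 15, size = "Small", eggs = c(43.0, 43.0, 43.0, 43.0)),
  list(temp = 15, size = "Large", eggs = c(40.5, 40.5, 40.5, 40.5)),
  list(temp = 20, size = "Small", eggs = c(24.0, 24.0, 24.0, 23.5, 23.5, 23.5, 23.5)),
  list(temp = 20, size = "Large", eggs = c(24.0, 24.0, 24.0, 24.0)),
  list(temp = 25, size = "Small", eggs = c(20.0, 20.0, 20.0, 20.0)),
  list(temp = 25, size = "Large", eggs = c(20.0, 20.0, 20.0, 20.0)),
  list(temp = 30, size = "Small", eggs = c(20.0, 20.0, 20.0, 20.0)),
  list(temp = 30, size = "Large", eggs = c(20.0, 20.0, 20.0, 20.0))
)

# Stack the groups into one long data frame; the scalar temp and size
# are recycled to the length of each group's eggs vector.
anova.data <- do.call(rbind, lapply(cells, function(cell) {
  data.frame(Eggs = cell$eggs, Temperature = cell$temp, Size = cell$size)
}))

# Cell counts: shows which group makes the design unbalanced.
cell_counts <- table(anova.data$Temperature, anova.data$Size)
print(cell_counts)
```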