I'm new to statistics and I'm trying to understand what to do with my data!
I have two factors: tree genotype (10 levels) and soil type (3 levels). For one genotype, I have only 2 replicates in soil1 and soil2 but 3 replicates in soil3. For 4 other genotypes, I have 3 replicates in soil1 and soil2 but 4 replicates in soil3.
Can I analyse the interaction between the two factors using gls (nlme package) with these data? Or should I remove the fourth replicate to make the design balanced? And for the genotype with missing replicates, should I remove the entire genotype, or just the two soils that are missing a replicate?
I tried removing nothing and it worked. Can I trust these results? I thought it wouldn't work since I only have two replicates for some treatments, but it didn't seem to be a problem...
Here's my code and output:
> my_model <- gls(variable ~ Soil_type + Genotype + Soil_type:Genotype, data = my_data, na.action = na.omit)
> shapiro.test(resid(my_model, type = "normalized"))
Shapiro-Wilk normality test
data: resid(my_model, type = "normalized")
W = 0.99199, p-value = 0.8568
> bartlett.test(resid(my_model, type = "normalized") ~ fitted(my_model, type = "normalized"))
Bartlett test of homogeneity of variances
data: resid(my_model, type = "normalized") by fitted(my_model, type = "normalized")
Bartlett's K-squared = 29.118, df = 29, p-value = 0.4589
> anova(my_model)
Denom. DF: 62
                   numDF   F-value p-value
(Intercept)            1 1664.3700  <.0001
Soil_type              2  121.1435  <.0001
Genotype               9    3.9401  0.0005
Soil_type:Genotype    18    1.3449  0.1930
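To make the design concrete, the replicate counts per cell can be tabulated as in the sketch below. It is only an illustration on a made-up data frame: the genotype labels `G1` to `G10` are hypothetical, the values of `variable` are random placeholders rather than real measurements, and the five genotypes not described above are assumed to have 3 replicates in every soil. That assumption is at least consistent with the output shown, since it gives 92 observations and 30 cell means, hence the 62 denominator DF reported by `anova`.

```r
# Hypothetical reconstruction of the design; `variable` is random noise, not real data.
set.seed(1)
genotypes <- paste0("G", 1:10)   # placeholder genotype labels
soils <- paste0("soil", 1:3)

# Replicates per cell: 3 everywhere by default (an assumption for the
# five genotypes whose replication is not described in the question).
n_rep <- matrix(3L, nrow = 10, ncol = 3, dimnames = list(genotypes, soils))
n_rep["G1", c("soil1", "soil2")] <- 2L   # one genotype with 2 replicates in soil1 and soil2
n_rep[paste0("G", 2:5), "soil3"] <- 4L   # four genotypes with a 4th replicate in soil3

# Expand each Genotype x Soil_type cell into its number of replicate rows.
cells <- expand.grid(Genotype = genotypes, Soil_type = soils, stringsAsFactors = TRUE)
reps <- n_rep[cbind(as.character(cells$Genotype), as.character(cells$Soil_type))]
my_data <- cells[rep(seq_len(nrow(cells)), times = reps), ]
my_data$variable <- rnorm(nrow(my_data))

# Replicates per cell: every cell has at least 2 observations,
# so all 30 cell means (and therefore the interaction) can be estimated.
cell_counts <- xtabs(~ Genotype + Soil_type, data = my_data)
print(cell_counts)

# Residual degrees of freedom of the full interaction model:
# observations minus one parameter per cell.
n_obs <- nrow(my_data)
n_coef <- nlevels(my_data$Genotype) * nlevels(my_data$Soil_type)
n_obs - n_coef   # 62 under the assumptions above
```

On the real data, the same `xtabs(~ Genotype + Soil_type, data = my_data)` call shows directly which cells have 2, 3 or 4 replicates.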