The experiment I am working on has the following design:
A B C D E F
B A D E F C
A B E F C D
B A F C D E
- Each Letter represents a different level of the single factor called “system” analyzed in this experiment. The dataset contains eight years and the dependent variable we are analyzing is yield.
A and B can be grouped together, as well as C to F according to their system type. I am aware of the missing randomization between groups AB and CDEF, which was necessary due to regulations, as well of the missing randomization within these two Groups, which has simply not been made, sadly. - I am investigating if there are sigificant differnces in yield between the systes (A-F)
My data looks like this:
> str(data)
'data.frame': 192 obs. of 6 variables:
$ year : Factor w/ 8 levels "2012","2013",..: 1 1 1 1 1 1 1 1 1 1 ...
$ type : Factor w/ 2 levels "org","pest": 1 1 1 1 1 1 1 1 1 1 ...
$ system: Factor w/ 6 levels "dgst_org","cc_pest",..: 3 3 3 3 5 5 5 5 6 6 ...
$ row : Factor w/ 4 levels "row_1","row_2",..: 1 2 3 4 2 3 4 1 3 4 ...
$ column: Factor w/ 6 levels "column_1","column_2",..: 6 5 4 3 6 5 4 3 6 5 ...
$ yield : num 26.2 41.4 43.4 45 40.8 52.3 47.1 47.2 40.1 42.4 ...
> summary(data)
year type system row column yield
2012 :24 org :128 dgst_org :32 row_1:48 column_1:32 Min. : 26.20
2013 :24 pest: 64 cc_pest :32 row_2:48 column_2:32 1st Qu.: 52.30
2014 :24 cc_org :32 row_3:48 column_3:32 Median : 62.95
2015 :24 manure_pest:32 row_4:48 column_4:32 Mean : 73.79
2016 :24 manure_org :32 column_5:32 3rd Qu.:103.83
2017 :24 fmyd_org :32 column_6:32 Max. :127.10
> head(data,20)
year type system row column yield
377 2012 org cc_org row_1 column_6 26.2
378 2012 org cc_org row_2 column_5 41.4
379 2012 org cc_org row_3 column_4 43.4
380 2012 org cc_org row_4 column_3 45.0
417 2012 org manure_org row_2 column_6 40.8
418 2012 org manure_org row_3 column_5 52.3
419 2012 org manure_org row_4 column_4 47.1
420 2012 org manure_org row_1 column_3 47.2
461 2012 org fmyd_org row_3 column_6 40.1
462 2012 org fmyd_org row_4 column_5 42.4
463 2012 org fmyd_org row_1 column_4 39.5
464 2012 org fmyd_org row_2 column_3 35.7
505 2012 org dgst_org row_4 column_6 57.8
506 2012 org dgst_org row_1 column_5 48.8
507 2012 org dgst_org row_2 column_4 52.3
508 2012 org dgst_org row_3 column_3 64.1
537 2013 org cc_org row_1 column_6 41.2
538 2013 org cc_org row_2 column_5 43.3
539 2013 org cc_org row_3 column_4 57.2
540 2013 org cc_org row_4 column_3 51.1
I tried to come up with a proper linear mixed effect model but ran into some Problems because of the poor experiment design.
The yield showed a bimodal distribution, which was as expected an effect of the system type.
I know understand that this is no Problem as long as the residuals of the model are normally distributed, whitch they are
> m1 <- lmer(yield ~ system + (1|year) + (1|year:system) + (1|year:column) + (1|year:row), data = data)
> summary(m1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: yield ~ system + (1 | year) + (1 | year:system) + (1 | year:column) + (1 | year:row)
Data: data
REML criterion at convergence: 1262.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.2604 -0.4993 0.0596 0.5585 2.3880
Random effects:
Groups Name Variance Std.Dev.
year:column (Intercept) 0.01384 0.1176
year:system (Intercept) 43.85302 6.6222
year:row (Intercept) 2.27887 1.5096
year (Intercept) 22.30702 4.7230
Residual 26.42919 5.1409
Number of obs: 192, groups: year:column, 48; year:system, 48; year:row, 32; year, 8
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 62.981 3.028 27.986 20.801 < 2e-16 ***
systemcc_pest 46.566 3.552 34.309 13.110 6.42e-15 ***
systemcc_org -9.744 3.552 33.574 -2.743 0.00969 **
systemmanure_pest 47.147 3.552 34.309 13.274 4.49e-15 ***
systemmanure_org -8.369 3.552 33.574 -2.356 0.02444 *
systemfmyd_org -10.722 3.552 33.574 -3.019 0.00482 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) systmcc_p systmcc_r systmmnr_p systmmnr_r
systmcc_pst -0.587
systemcc_rg -0.587 0.500
systmmnr_ps -0.587 0.500 0.500
systmmnr_rg -0.587 0.500 0.500 0.500
systmfmyd_r -0.587 0.500 0.500 0.500 0.500
- My first idea was then to separate the whole dataset into two datasets (AB and CDEF) with each one having normally distributed data and checking for significant differences between the system, at first separately and then together.
My lmer model for the group CDEF was:
m1 <- lmer(yield ~ system + (1|year) + (1|year:system) + (1|year:row) + (1|year:column))
I tried to add an additional random effect accounting for the interaction between row and column+(1|row:column)
but got an error message:boundary (singular) fit: see ?isSingular
The Model for the Group AB was:
m2 <- lmer(yield ~ system + (1|year) + (1|year:system) + (1|year:row))
since only the rows where single replicates. I checked with the emmeans package if there are significant differences between the groups and found ones between F, with a higher yield, and CDE with lower yield. No differences were found between system A and B. After that I didn’t know how to continue and compare the two groups.
- My second idea was to add a grouping variable taking account for the system type and creating a model that could compare the whole experiment at once.
The lmer model I came up with was:
m3 <- lmer(yield ~ type + system + (1|year) + (1|year:system) + (1|year:type) + (1|year:row))
again I ran into some Problems, I didn’t know how to properly nest my fixed effects, since they clearly are nested and how to take account for the columns.
As mentioned from Russ Lenth in the comments it does not make sense to split the Population since it’s an effect from the treatment
My Questions therefore are:
Should I divide my dataset and analyze the two system types (AB and CDEF) separately, if so how do I include columns in the AB model and what possibility do I have to compare AB and CDEF afterwards ?Or should I make one model to rule them all and create a new grouping variable for system type and nest them properly and ignore the random effect for column ?
Or do you have any other Idea how this design could be handled ?
New Models
> m1 <- lmer(yield ~ system + (1|year) + (1|year:system) + (1|year:row), data = data)
> summary(m1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: yield ~ system + (1 | year) + (1 | year:system) + (1 | year:row)
Data: data
REML criterion at convergence: 1262.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.2609 -0.4988 0.0592 0.5590 2.3885
Random effects:
Groups Name Variance Std.Dev.
year:system (Intercept) 43.868 6.623
year:row (Intercept) 2.276 1.509
year (Intercept) 22.305 4.723
Residual 26.442 5.142
Number of obs: 192, groups: year:system, 48; year:row, 32; year, 8
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 62.981 3.028 28.260 20.799 < 2e-16 ***
systemcc_pest 46.566 3.552 35.000 13.108 4.6e-15 ***
systemcc_org -9.744 3.552 35.000 -2.743 0.00954 **
systemmanure_pest 47.147 3.552 35.000 13.272 3.2e-15 ***
systemmanure_org -8.369 3.552 35.000 -2.356 0.02421 *
systemfmyd_org -10.722 3.552 35.000 -3.018 0.00472 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) systmcc_p systmcc_r systmmnr_p systmmnr_r
systmcc_pst -0.587
systemcc_rg -0.587 0.500
systmmnr_ps -0.587 0.500 0.500
systmmnr_rg -0.587 0.500 0.500 0.500
systmfmyd_r -0.587 0.500 0.500 0.500 0.500
> m2 <- lmer(yield ~ system + (1|year) + (1|year:row) + (1|year:column), data = data)
> summary(m2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: yield ~ system + (1 | year) + (1 | year:row) + (1 | year:column)
Data: data
REML criterion at convergence: 1302.3
Scaled residuals:
Min 1Q Median 3Q Max
-3.0617 -0.5748 0.1023 0.5824 2.7636
Random effects:
Groups Name Variance Std.Dev.
year:column (Intercept) 27.2467 5.2198
year:row (Intercept) 0.2432 0.4932
year (Intercept) 25.0757 5.0076
Residual 38.6421 6.2163
Number of obs: 192, groups: year:column, 48; year:row, 32; year, 8
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 62.981 2.281 12.319 27.616 1.87e-12 ***
systemcc_pest 46.566 2.229 75.612 20.889 < 2e-16 ***
systemcc_org -9.744 1.554 116.002 -6.270 6.39e-09 ***
systemmanure_pest 47.147 2.229 75.612 21.149 < 2e-16 ***
systemmanure_org -8.369 1.554 116.002 -5.385 3.84e-07 ***
systemfmyd_org -10.722 1.554 116.002 -6.899 2.93e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) systmcc_p systmcc_r systmmnr_p systmmnr_r
systmcc_pst -0.405
systemcc_rg -0.341 0.349
systmmnr_ps -0.405 0.757 0.349
systmmnr_rg -0.341 0.349 0.500 0.349
systmfmyd_r -0.341 0.349 0.500 0.349 0.500