I'm relatively new to GLMMs and have so far only worked with relative (compositional) data. Now I'm trying to model whether the abundance of a taxon differs between disease states (condition), while accounting for random effects such as repeated sampling (ID), day, and cage. The abundance data (taxon) are absolute counts of the bacterium, derived by qPCR against a standard curve.
df
ID cage day condition taxon
1 100 4 10 healthy 291137072
2 100 4 23 healthy 49370511
3 100 4 27 disease 0
4 100 4 4 healthy 0
5 101 5 10 healthy 255823819
6 101 5 25 healthy 0
7 101 5 26 disease 0
8 101 5 4 healthy 491783963
9 102 6 10 healthy 827020812
10 102 6 29 healthy 1942357597
11 102 6 31 healthy 353989613
12 102 6 4 healthy 265950135
13 103 1 10 healthy 683672045
14 103 1 20 healthy 248822130
15 103 1 20 healthy 330402708
16 103 1 24 disease 72356773
17 103 1 24 disease 0
18 103 1 25 disease 0
19 103 1 4 healthy 758199011
20 104 2 10 healthy 391127953
21 104 2 14 healthy 121840666
22 104 2 24 healthy 428233074
23 104 2 26 healthy 0
24 104 2 26 disease 0
25 104 2 4 healthy 856582541
26 56 1 10 healthy 328956034
27 56 1 20 healthy 243055521
28 56 1 25 healthy 274206537
29 56 1 31 healthy 327404926
30 56 1 4 healthy 0
31 57 2 10 healthy 250707674
32 57 2 14 healthy 105076869
33 57 2 24 healthy 0
34 57 2 26 healthy 25117434
35 57 2 29 healthy 307745763
36 57 2 30 healthy 35323060
37 57 2 30 disease 26061862
38 57 2 31 healthy 236960027
39 57 2 4 healthy 0
40 57 2 7 healthy 548242526
41 58 3 23 healthy 132782429
42 58 3 27 healthy 53354564
43 58 3 28 healthy 109248499
44 58 3 28 disease 172993809
45 58 3 28 disease 71076639
46 58 3 29 healthy 136523472
47 58 3 31 healthy 107758937
48 58 3 4 healthy 418700327
49 59 4 10 healthy 240420771
50 59 4 23 healthy 177600588
51 59 4 27 healthy 0
52 59 4 31 healthy 71662711
53 59 4 4 healthy 0
54 81 1 10 healthy 235286007
55 81 1 19 healthy 0
56 81 1 20 disease 0
57 81 1 4 healthy 0
58 82 2 14 healthy 162675954
59 82 2 14 disease 0
60 82 2 4 healthy 434068852
61 83 3 10 healthy 0
62 83 3 28 healthy 115583049
63 83 3 31 healthy 0
64 83 3 4 healthy 0
65 84 4 10 healthy 0
66 84 4 17 healthy 0
67 84 4 21 healthy 120404542
68 84 4 4 healthy 0
69 85 5 10 healthy 121380422
70 85 5 26 healthy 398575728
71 85 5 31 healthy 49424593
72 85 5 4 healthy 0
73 86 6 10 healthy 589647622
74 86 6 29 disease 88311287
75 86 6 29 disease 131933744
76 86 6 4 healthy 0
77 87 1 10 healthy 46217816
78 87 1 25 healthy 0
79 87 1 31 healthy 319339186
80 87 1 4 healthy 0
81 90 3 10 healthy 277742799
82 90 3 28 disease 164413227
83 90 3 31 healthy 283547803
84 90 3 4 healthy 1331380313
85 91 4 10 healthy 0
86 91 4 23 healthy 0
87 91 4 27 healthy 476684613
88 91 4 31 healthy 384647198
89 91 4 4 healthy 533681749
90 92 5 10 healthy 1495897788
91 92 5 31 healthy 806317700
92 92 5 4 healthy 1700020953
93 93 6 10 healthy 632004192
94 93 6 29 healthy 708512232
95 93 6 31 healthy 2031512652
96 93 6 4 healthy 596088438
97 98 2 10 healthy 328751027
98 98 2 23 healthy 0
99 98 2 24 healthy 0
100 98 2 25 disease 0
101 98 2 26 disease 0
102 98 2 26 healthy 0
103 98 2 27 disease 0
104 98 2 27 disease 0
105 98 2 27 disease 0
106 98 2 29 disease 78682114
107 98 2 29 disease 0
108 98 2 30 disease 0
109 98 2 30 disease 69581995
110 98 2 4 healthy 1619240477
111 99 3 10 healthy 188962800
112 99 3 28 healthy 481068698
113 99 3 31 healthy 572791136
114 99 3 4 healthy 999854899
min(df$taxon) # 0
max(df$taxon) # 2031512652
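As a quick sanity check (a minimal sketch, with a few toy values standing in for the full df$taxon column), I confirmed the response consists of non-negative whole numbers, as the Poisson and negative binomial families require:

```r
## Pre-fit sanity check; toy values stand in for df$taxon.
taxon <- c(291137072, 49370511, 0, 2031512652)
stopifnot(is.numeric(taxon),
          all(taxon >= 0),               # no negative values
          all(taxon == round(taxon)))    # whole-number counts
range(taxon)                             # 0 to roughly 2e9
```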
The counts span a very large range, from 0 up to roughly 2 × 10^9. When I run a GLMM with a Poisson or negative binomial family, as recommended for count data, I get several warnings:
fit_nb <- lme4::glmer.nb(data = df,
                         taxon ~ condition +
                           (1 | day) + (1 | cage) + (1 | ID))

fit_poisson <- lme4::glmer(data = df,
                           taxon ~ condition +
                             (1 | day) + (1 | cage) + (1 | ID),
                           family = "poisson")
summary(fit_nb)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: Negative Binomial(871275696) ( log )
Formula: taxon ~ condition + (1 | day) + (1 | cage) + (1 | ID)
Data: df
AIC BIC logLik deviance df.resid
13191597893 13191597909 -6595798940 13191597881 108
Scaled residuals:
Min 1Q Median 3Q Max
-16507 -7242 -3110 5548 45807
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 19.0464 4.3642
day (Intercept) 51.7582 7.1943
cage (Intercept) 0.4132 0.6428
Number of obs: 114, groups: ID, 22; day, 17; cage, 6
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.562e+01 1.309e-02 1193 <2e-16 ***
conditionhealthy 2.059e+00 4.295e-05 47935 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
condtnhlthy 0.000
convergence code: 0
unable to evaluate scaled gradient
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.121249 (tol = 0.002, component 1)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
3: In theta.ml(Y, mu, weights = object@resp$weights, limit = limit, :
iteration limit reached
4: In sqrt(1/i) : NaNs produced
5: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.279021 (tol = 0.002, component 1)
6: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
7: In optTheta(g1, interval = interval, tol = tol, verbose = verbose, :
Model failed to converge with max|grad| = 0.0490774 (tol = 0.002, component 1)
8: In optTheta(g1, interval = interval, tol = tol, verbose = verbose, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
I suspect this is caused by the huge magnitude of my response variable. But if I center and scale the abundance, as the warnings recommend, I get negative values, which are not allowed in GLMMs with Poisson or negative binomial families.
df$taxon_scaled <- scale(df$taxon)
df
ID cage day condition taxon taxon_scaled
1 100 4 10 healthy 291137072 0.039439167
2 100 4 23 healthy 49370511 -0.549483158
3 100 4 27 disease 0 -0.669745432
4 100 4 4 healthy 0 -0.669745432
5 101 5 10 healthy 255823819 -0.046580847
6 101 5 25 healthy 0 -0.669745432
7 101 5 26 disease 0 -0.669745432
8 101 5 4 healthy 491783963 0.528197553
9 102 6 10 healthy 827020812 1.344805374
10 102 6 29 healthy 1942357597 4.061668828
11 102 6 31 healthy 353989613 0.192542494
12 102 6 4 healthy 265950135 -0.021914021
13 103 1 10 healthy 683672045 0.995620238
14 103 1 20 healthy 248822130 -0.063636353
15 103 1 20 healthy 330402708 0.135086843
16 103 1 24 disease 72356773 -0.493490624
17 103 1 24 disease 0 -0.669745432
18 103 1 25 disease 0 -0.669745432
19 103 1 4 healthy 758199011 1.177161449
20 104 2 10 healthy 391127953 0.283008261
21 104 2 14 healthy 121840666 -0.372952160
22 104 2 24 healthy 428233074 0.373393111
23 104 2 26 healthy 0 -0.669745432
24 104 2 26 disease 0 -0.669745432
25 104 2 4 healthy 856582541 1.416815176
26 56 1 10 healthy 328956034 0.131562872
27 56 1 20 healthy 243055521 -0.077683310
28 56 1 25 healthy 274206537 -0.001802143
29 56 1 31 healthy 327404926 0.127784507
30 56 1 4 healthy 0 -0.669745432
31 57 2 10 healthy 250707674 -0.059043331
32 57 2 14 healthy 105076869 -0.413787312
33 57 2 24 healthy 0 -0.669745432
34 57 2 26 healthy 25117434 -0.608561546
35 57 2 29 healthy 307745763 0.079896496
36 57 2 30 healthy 35323060 -0.583701528
37 57 2 30 disease 26061862 -0.606261001
38 57 2 31 healthy 236960027 -0.092531405
39 57 2 4 healthy 0 -0.669745432
40 57 2 7 healthy 548242526 0.665725703
41 58 3 23 healthy 132782429 -0.346298976
42 58 3 27 healthy 53354564 -0.539778352
43 58 3 28 healthy 109248499 -0.403625584
44 58 3 28 disease 172993809 -0.248347550
45 58 3 28 disease 71076639 -0.496608917
46 58 3 29 healthy 136523472 -0.337186121
47 58 3 31 healthy 107758937 -0.407254027
48 58 3 4 healthy 418700327 0.350172169
49 59 4 10 healthy 240420771 -0.084101333
50 59 4 23 healthy 177600588 -0.237125838
51 59 4 27 healthy 0 -0.669745432
52 59 4 31 healthy 71662711 -0.495181298
53 59 4 4 healthy 0 -0.669745432
54 81 1 10 healthy 235286007 -0.096609171
55 81 1 19 healthy 0 -0.669745432
56 81 1 20 disease 0 -0.669745432
57 81 1 4 healthy 0 -0.669745432
58 82 2 14 healthy 162675954 -0.273480948
59 82 2 14 disease 0 -0.669745432
60 82 2 4 healthy 434068852 0.387608558
61 83 3 10 healthy 0 -0.669745432
62 83 3 28 healthy 115583049 -0.388195171
63 83 3 31 healthy 0 -0.669745432
64 83 3 4 healthy 0 -0.669745432
65 84 4 10 healthy 0 -0.669745432
66 84 4 17 healthy 0 -0.669745432
67 84 4 21 healthy 120404542 -0.376450435
68 84 4 4 healthy 0 -0.669745432
69 85 5 10 healthy 121380422 -0.374073274
70 85 5 26 healthy 398575728 0.301150394
71 85 5 31 healthy 49424593 -0.549351421
72 85 5 4 healthy 0 -0.669745432
73 86 6 10 healthy 589647622 0.766584917
74 86 6 29 disease 88311287 -0.454626812
75 86 6 29 disease 131933744 -0.348366299
76 86 6 4 healthy 0 -0.669745432
77 87 1 10 healthy 46217816 -0.557162849
78 87 1 25 healthy 0 -0.669745432
79 87 1 31 healthy 319339186 0.108137066
80 87 1 4 healthy 0 -0.669745432
81 90 3 10 healthy 277742799 0.006811884
82 90 3 28 disease 164413227 -0.269249102
83 90 3 31 healthy 283547803 0.020952368
84 90 3 4 healthy 1331380313 2.573381277
85 91 4 10 healthy 0 -0.669745432
86 91 4 23 healthy 0 -0.669745432
87 91 4 27 healthy 476684613 0.491416848
88 91 4 31 healthy 384647198 0.267221706
89 91 4 4 healthy 533681749 0.630256918
90 92 5 10 healthy 1495897788 2.974131544
91 92 5 31 healthy 806317700 1.294374394
92 92 5 4 healthy 1700020953 3.471357829
93 93 6 10 healthy 632004192 0.869761840
94 93 6 29 healthy 708512232 1.056128776
95 93 6 31 healthy 2031512652 4.278842791
96 93 6 4 healthy 596088438 0.782274187
97 98 2 10 healthy 328751027 0.131063493
98 98 2 23 healthy 0 -0.669745432
99 98 2 24 healthy 0 -0.669745432
100 98 2 25 disease 0 -0.669745432
101 98 2 26 disease 0 -0.669745432
102 98 2 26 healthy 0 -0.669745432
103 98 2 27 disease 0 -0.669745432
104 98 2 27 disease 0 -0.669745432
105 98 2 27 disease 0 -0.669745432
106 98 2 29 disease 78682114 -0.478082642
107 98 2 29 disease 0 -0.669745432
108 98 2 30 disease 0 -0.669745432
109 98 2 30 disease 69581995 -0.500249740
110 98 2 4 healthy 1619240477 3.274583613
111 99 3 10 healthy 188962800 -0.209448478
112 99 3 28 healthy 481068698 0.502096099
113 99 3 31 healthy 572791136 0.725523984
114 99 3 4 healthy 999854899 1.765814186
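To confirm this, here is a minimal illustration (toy values, not the full data set) showing that a Poisson fit rejects the centred and scaled response outright because it contains negatives:

```r
## Toy demonstration: scaling a count vector produces negatives,
## which the Poisson family refuses at initialization.
y <- c(0, 49370511, 291137072, 2031512652)
y_scaled <- as.numeric(scale(y))                      # contains negative values
fit <- try(glm(y_scaled ~ 1, family = poisson), silent = TRUE)
inherits(fit, "try-error")                            # TRUE: the fit fails
```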
Question: What is the best way to improve a model whose count response spans such a large range of magnitudes?
I would appreciate any help, as I don't think I can rely on the models so far.
Thanks in advance!