I am modelling visitor counts to a sample of sites in a forest in order to predict the number of visitors to the rest of the forest.
My predictor variables are time of day (categorical), day of week (categorical), distance of site from nearest access point (m) and number of households around access points (in three different buffer distances i.e. 'bands').
I am fitting a GLMM with Poisson error in lme4. I am using site as a random effect because sites were visited more than once (~ 4 times).
When I use the predict function in lme4, I have to set re.form = NA because I am predicting for out-of-sample sites, so the new data points do not have a site ID. I read that this sets the random effects to zero and predicts only at the population level.
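As far as I understand it, this can be written out as follows. The notation is my own assumption and does not come from lme4: $y_{ij}$ is the count for visit $j$ at site $i$, $x_{ij}$ the fixed-effect predictors, $b_i$ the site intercept and $\sigma^2$ its variance.

$$
y_{ij} \mid b_i \sim \mathrm{Poisson}(\mu_{ij}), \qquad \log \mu_{ij} = x_{ij}^\top \beta + b_i, \qquad b_i \sim N(0, \sigma^2)
$$

With re.form = NA, and assuming the predictions are requested on the response scale, the site term is dropped, which is the prediction for a site whose effect is exactly zero:

$$
\hat{\mu}_{\text{re.form = NA}} = \exp\left(x^\top \hat{\beta}\right)
$$

The expected count averaged over all sites is a different quantity, because the exponential of a normal variable has mean $\exp(\sigma^2 / 2)$ rather than 1:

$$
E\left[\mu\right] = \exp\left(x^\top \beta\right) E\left[e^{b_i}\right] = \exp\left(x^\top \beta + \tfrac{\sigma^2}{2}\right)
$$

So, under these assumptions, the population-level prediction describes a "typical" site (the median site on the count scale), not the average over sites.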
My question is this:
I have found that a GLM (i.e. one without site in the model) gives predictions about twice as high: the mean prediction is 0.2 visits from the GLM compared with 0.1 from the GLMM.
Why would this be? The coefficients for the fixed effects in the GLMM are not hugely different from the coefficients in the GLM.
The model outputs are as follows (sorry I can't provide data to reproduce these models):
GLM
    Call:
    glm(formula = dog.walkers.count ~ day5code + dog.time + wght_dist +
        band1to4 + band5to9 + band10to11, family = "poisson", data = dog.data2)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.9657  -0.6832  -0.4969  -0.3070   6.4859

    Coefficients:
                      Estimate Std. Error z value Pr(>|z|)
    (Intercept)      -1.914835   0.084291 -22.717  < 2e-16 ***
    day5codeMon      -0.232330   0.110504  -2.102   0.0355 *
    day5codeSa-BH     0.646091   0.083244   7.761 8.40e-15 ***
    day5codeTue      -0.081767   0.097487  -0.839   0.4016
    day5codeWe-Th    -0.152445   0.088925  -1.714   0.0865 .
    dog.time12/17    -0.535558   0.100841  -5.311 1.09e-07 ***
    dog.time6        -0.192202   0.112955  -1.702   0.0888 .
    dog.time7/10/15   0.122536   0.065583   1.868   0.0617 .
    dog.time8         0.124831   0.094472   1.321   0.1864
    dog.time9         0.399156   0.076175   5.240 1.61e-07 ***
    wght_dist        -0.720439   0.036177 -19.914  < 2e-16 ***
    band1to4          0.278427   0.016545  16.829  < 2e-16 ***
    band5to9          0.162052   0.026452   6.126 8.99e-10 ***
    band10to11       -0.006501   0.028706  -0.226   0.8208
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 6531.8  on 6837  degrees of freedom
    Residual deviance: 5387.2  on 6824  degrees of freedom
    AIC: 7713.9

    Number of Fisher Scoring iterations: 6
GLMM
    Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
     Family: poisson  ( log )
    Formula: dog.walkers.count ~ day5code + dog.time + wght_dist + band1to4 + band5to9 + band10to11 + (1 | SiteID)
       Data: dog.data2
    Control: glmerControl(optimizer = "bobyqa")

         AIC      BIC   logLik deviance df.resid
      6707.6   6810.1  -3338.8   6677.6     6823

    Scaled residuals:
        Min      1Q  Median      3Q     Max
    -1.7131 -0.3860 -0.1928 -0.1154  7.2060

    Random effects:
     Groups Name        Variance Std.Dev.
     SiteID (Intercept) 1.991    1.411
    Number of obs: 6838, groups:  SiteID, 339

    Fixed effects:
                     Estimate Std. Error z value Pr(>|z|)
    (Intercept)      -2.76661    0.14187 -19.501  < 2e-16 ***
    day5codeMon      -0.03030    0.12248  -0.247  0.80464
    day5codeSa-BH     0.54844    0.09099   6.027 1.67e-09 ***
    day5codeTue      -0.06656    0.10631  -0.626  0.53126
    day5codeWe-Th    -0.22075    0.09758  -2.262  0.02368 *
    dog.time12/17    -0.48614    0.10772  -4.513 6.39e-06 ***
    dog.time6        -0.11158    0.12296  -0.908  0.36414
    dog.time7/10/15   0.18912    0.07199   2.627  0.00861 **
    dog.time8         0.29850    0.10667   2.798  0.00514 **
    dog.time9         0.47665    0.08761   5.441 5.31e-08 ***
    wght_dist        -0.67480    0.10080  -6.695 2.16e-11 ***
    band1to4          0.39614    0.08732   4.537 5.72e-06 ***
    band5to9          0.16443    0.09713   1.693  0.09046 .
    band10to11        0.03160    0.10094   0.313  0.75423
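
One thing that can be checked without the data is how the two intercepts compare once the site variance is taken into account. The sketch below is only that: a base-R calculation using the intercepts and the SiteID variance printed in the two summaries, and it assumes a log link with a normally distributed site intercept. The variable names are made up for illustration, and it compares only the baseline level (reference day and time, all other predictors at zero), not full predictions.

```r
# Estimates copied from the two model summaries above.
glm_intercept  <- -1.914835   # GLM (Intercept)
glmm_intercept <- -2.76661    # GLMM (Intercept)
site_variance  <- 1.991       # GLMM SiteID (Intercept) variance

# Baseline expected counts on the response scale.
glm_mean         <- exp(glm_intercept)
# Site effect fixed at zero, which is what re.form = NA corresponds to.
glmm_conditional <- exp(glmm_intercept)
# Averaged over a normal site effect: exp(eta + variance / 2).
glmm_marginal    <- exp(glmm_intercept + site_variance / 2)

round(c(GLM = glm_mean,
        GLMM_conditional = glmm_conditional,
        GLMM_marginal = glmm_marginal), 3)

# Multiplicative gap between the marginal and conditional GLMM means.
exp(site_variance / 2)
```

With these numbers the baseline values come out at roughly 0.15 for the GLM, 0.06 for the GLMM with the site effect set to zero, and 0.17 for the GLMM averaged over sites. The factor exp(1.991 / 2) is about 2.7, which is of the same order as the twofold gap in the predictions. This ignores the smaller differences in the other coefficients, so it is a plausibility check rather than a full explanation.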