5

I have a question about nested mixed effect model. For example I have species A with different populations; these populations belong to two kinds of habitat types (with or without predators). So I have population nested within habitat type, and I measured the body size of species A. I want to know if habitat type have a significant effect on body size or not. Thus, I got one model like this:

m1<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type/population))
m2<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type)+(1|habitat_type:population))

From http://conjugateprior.org/2013/01/formulae-in-r-anova/, I think m1 equals m2. But I was wodering why we regard habitat_type as both fixed effect and random effect? Does this make sense? Or I should just use the following model (remove habitat_type as random effect):

m3<-lmer(body_size~habitat_type+(1|year)+(1|habitat_type:population))  

outputs for models:

   > summary(m2)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type/population)
   Data: exuv

REML criterion at convergence: 4219.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5214 -0.6482  0.0054  0.6369  3.2242 

Random effects:
 Groups                  Name        Variance Std.Dev.
 population:habitat_type (Intercept) 0.20850  0.4566  
 year                    (Intercept) 0.06942  0.2635  
 habitat_type            (Intercept) 0.08963  0.2994  
 Residual                            0.73031  0.8546  
Number of obs: 1625, groups:  
population:habitat_type, 50; year, 19; habitat_type, 2

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3916     0.3275   1.196
habitat_typeinv  -0.6325     0.4509  -1.403

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.699  

    > summary(m3)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type:population)
   Data: exuv

REML criterion at convergence: 4219.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5214 -0.6482  0.0054  0.6369  3.2242 

Random effects:
 Groups                  Name        Variance Std.Dev.
 habitat_type:population (Intercept) 0.20850  0.4566  
 year                    (Intercept) 0.06942  0.2635  
 Residual                            0.73031  0.8546  
Number of obs: 1625, groups:  habitat_type:population, 50; year, 19

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3916     0.1329   2.947
habitat_typeinv  -0.6325     0.1549  -4.082

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.661


    > summary(m4)
Linear mixed model fit by REML ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | population)
   Data: exuv

REML criterion at convergence: 4219.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5214 -0.6482  0.0054  0.6369  3.2242 

Random effects:
 Groups     Name        Variance Std.Dev.
 population (Intercept) 0.20850  0.4566  
 year       (Intercept) 0.06942  0.2635  
 Residual               0.73031  0.8546  
Number of obs: 1625, groups:  population, 50; year, 19

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3916     0.1329   2.947
habitat_typeinv  -0.6325     0.1549  -4.082

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.661

    > summary(m2ml)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: body_size~ habitat_type + (1 | year) + (1 | habitat_type/population)
   Data: exuv

     AIC      BIC   logLik deviance df.resid 
  4226.6   4259.0  -2107.3   4214.6     1619 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5215 -0.6493  0.0047  0.6360  3.2245 

Random effects:
 Groups                  Name        Variance  Std.Dev. 
 population:habitat_type (Intercept) 1.993e-01 4.464e-01
 year                    (Intercept) 6.363e-02 2.523e-01
 habitat_type            (Intercept) 4.365e-18 2.089e-09
 Residual                            7.305e-01 8.547e-01
Number of obs: 1625, groups:  
population:habitat_type, 50; year, 19; habitat_type, 2

Fixed effects:
                Estimate Std. Error t value
(Intercept)       0.3889     0.1297   3.000
habitat_typeinv  -0.6299     0.1519  -4.147

Correlation of Fixed Effects:
            (Intr)
habtt_typnv -0.664
Bin
  • 51
  • 5
  • 1
    I found some answers with similar questions. It is very helpful. See http://stats.stackexchange.com/questions/70556/have-i-correctly-specified-my-model-in-lmer. – Bin Apr 12 '17 at 12:53
  • 1
    So what is the output when you fit these models? If m2 estimates zero variance for `(1|habitat_type)`, then it's equal to m3. That's what I would expect (because you are right: if there is fixed term for `habitat_type` then there is nothing left for the random term to capture). – amoeba Apr 12 '17 at 15:09
  • Also, if your `population` is coded such that different habitats have different population ids, then you can simply write `(1|population)` instead of `(1|habitat_type:population)`. – amoeba Apr 12 '17 at 15:12
  • 1
    You @amoeba got a good point. I tried with m2, m3 and another new model m4 (with random factor `(1|population)`, `m4 – Bin Apr 18 '17 at 17:00
  • You should definitely not use habitat_type as a random effect if you include it as a fixed one. And `(1|population)` and `(1|habitat_type:population)` are mathematically equivalent if your population id codes are distinct in different habitats (as I guess they are). Bottomline: use `m4`. – amoeba Apr 18 '17 at 18:08
  • Regarding `m2`, I don't know why `habitat_type` comes out with non-zero variance. To my understanding, it should be exactly zero. That's strange. But you should use `m4` in any case. – amoeba Apr 18 '17 at 19:22
  • 2
    How about using maximum likelihood, i.e., setting `REML = FALSE` for `lmer`? Can you provide the complete output summary of all the models you tried? – Randel Apr 18 '17 at 21:45
  • 2
    Thank you very much @JiebiaoWang. I followed your advice and use the setting `REML=F`. Then for m2 the variance=4.3e-18, Std.Dev.=2.1e-09. Almost zero @amoeba. This is amazing. But this further makes me confused about how to choose between REML and ML. I checked some information about REML and ML differences. In my mind, usually REML is preferred. – Bin Apr 19 '17 at 13:16
  • @JiebiaoWang Cool. I would also like to know why using ML yields zero (up to machine precision) variance of `habitat_type` whereas using REML yields some non-zero value. – amoeba Apr 19 '17 at 14:09
  • @amoeba Sorry for forgetting to mention `year` is also a random effect in my model. I also tried model without random effect `(1|year)`. In this situation, the `habitat_type` as random effect variance equals to 0, just like you mentioned. But when there is `(1|year)`, then `habitat_type` as random effect variance equals to 0.08963. Maybe the complexity of random effects is a reason. – Bin Apr 19 '17 at 14:11
  • 1
    As @amoeba pointed out, [here](https://stats.stackexchange.com/questions/71587/can-a-variable-be-included-in-a-mixed-model-as-a-fixed-effect-and-as-a-random-ef/71589#71589) is a simplified but more extreme example. In my understanding, REML and ML should not differ so much in this case. There may be some numerical issues for REML. Given that the estimates rely on iterations, I will think more about this when I have more time. Any comments are welcome. – Randel Apr 19 '17 at 19:36

0 Answers0