1

I am analysing a dataset that has 20 sampling sites, each sampled five times (T1-T5) to measure the continuous response variable ('resp'). There is a continuous predictor ('covar') that was measured in each session, but the value was 0 for every site in sessions T1 and T2.

# Make the dataset
set.seed(345)
mydf = data.frame(resp = runif(100, 0, 50),
                  covar = runif(100, 30, 90),
                  time = as.factor(rep(c("T1", "T2", "T3", "T4", "T5"), each = 20)),
                  site = as.factor(rep(1:20, times = 5)))
mydf[1:40,2] = 0

> str(mydf)
'data.frame':   100 obs. of  4 variables:
 $ resp : num  10.8 13.7 19.5 32.8 21.8 ...
 $ covar: num  0 0 0 0 0 0 0 0 0 0 ...
 $ time : Factor w/ 5 levels "T1","T2","T3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ site : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...

I am interested in how the effect of covar on resp varies between sessions, so I fitted the following model in lme4:

> m1 = lmer(resp ~ covar * time + (1|site), data = mydf)
fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients
boundary (singular) fit: see ?isSingular

Two terms are excluded due to rank deficiency (covar:timeT2 and covar:timeT5). I believe this is due to the values of covar all being 0 for T1 and T2, but I don't understand why covar:timeT5 in particular is excluded (rather than covar:T4 for example).

I haven't been able to find a post that matches this situation, so will be grateful if anyone can help explain what is happening here. Many thanks in advance.

> summary(m1)
Linear mixed model fit by REML ['lmerMod']
Formula: resp ~ covar * time + (1 | site)
   Data: mydf

REML criterion at convergence: 796.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.59299 -0.88562 -0.06054  0.79504  1.92953 

Random effects:
 Groups   Name        Variance Std.Dev.
 site     (Intercept)   0.0     0.00   
 Residual             216.4    14.71   
Number of obs: 100, groups:  site, 20

Fixed effects:
             Estimate Std. Error t value
(Intercept)   26.6204     3.2891   8.094
covar         -0.4122     0.1860  -2.216
timeT2         0.8271     4.6515   0.178
timeT3        -9.6063    14.0368  -0.684
timeT4        -8.1798    12.8894  -0.635
timeT5        24.4145    12.1093   2.016
covar:timeT3   0.4942     0.2936   1.683
covar:timeT4   0.4736     0.2656   1.783

(P.S. there is also a boundary (singular) fit warning, which occurs because the random effect accounts for 0 variance)

1 Answers1

0

You can inspect the model matrix using the model.matrix() command, which is often very useful to understand singularities. model.matrix(m1) will give you the model matrix for the fitted model m1, with the singular columns already removed, so that is not what we want. We want

model.matrix(resp ~ covar * time,data=mydf)

Here are the first few lines of the output:

    (Intercept)    covar timeT2 timeT3 timeT4 timeT5 covar:timeT2 covar:timeT3 covar:timeT4 covar:timeT5
1             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
2             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
3             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
4             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
5             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000
6             1  0.00000      0      0      0      0            0      0.00000      0.00000      0.00000

Actually, to see the interesting parts, you have to look at the entire matrix. Your columns are linearly dependent, specifically:

covar = covar:timeT3 + covar:timeT4 + covar:timeT5

In cases like these, R will drop columns from the right (and columns are ordered first by interactions, then alphabetically) until the remaining matrix is not singular any more.

You may find it more informative to drop the covar main effect and only keep the interaction terms, by subtracting it from the formula (R will still drop the covar:timeT2 column, which is a constant zero):

summary(lmer(resp ~ covar * time - covar + (1|site), data = mydf))

A completely equivalent specification is to specify the interaction only by using : (* specifies interactions and all lower-order interactions and main effects):

summary(lmer(resp ~ time + covar : time + (1|site), data = mydf))
Stephan Kolassa
  • 95,027
  • 13
  • 197
  • 357