3

Suppose that I have the following dataset in R:

library(lme4)
data(Machines,package='nlme')
mydata <- Machines[Machines$Machine!='C',]

There are 3 levels with the factor Machine, so the last line above is meant to limit to only two levels (A and B) of the factor Machine. With the following model:

print(lmer(score ~ 1 + (1|Worker/Machine), data=mydata), ranef.comp="Var")

I have the variance components in the output as shown below:

Random effects:
Groups         Name        Variance
Machine:Worker (Intercept) 46.00   
Worker         (Intercept) 13.84   
Residual                    1.16 

I have trouble understanding exactly what the first two components are: Machine:Worker and Worker. Specifically,

1) What is the variance for Worker corresponding to the base (or reference) level of the factor Machine? If so, what is the base level: the first level A in the dataset or alphabetically the first level A (it happens to be the same in this particular dataset)?

2) What is the variance for Machine:Worker? Is it the variance for the second level of the factor Machine, or the extra variance relative to the variance for Worker?

Furthermore, for the model with the original dataset in which the factor Machine has 3 levels,

print(lmer(score ~ 1 + (1|Worker/Machine), data=Machines), ranef.comp="Var")

what is the variance for Machine:Worker in the following result since there are 3 levels involved in the factor Machine?

Random effects:
Groups         Name        Variance
Machine:Worker (Intercept) 60.2972 
Worker         (Intercept)  7.3959 
Residual                    0.9246
Stefan
  • 4,977
  • 1
  • 18
  • 38
bluepole
  • 2,376
  • 3
  • 25
  • 34
  • Isn't using a nesting structure where machine is nested within worker contrary to the information extracted from `?Machines`? That help indicates that worker is crossed with machine, not nested within. Further, is it reasonable to consider the machine brand as a random effect? – Ashe Jan 09 '17 at 18:23
  • @Ashe If anything, I think `Machines` should be fixed in order to figure out which of the three machine brands work best for maximizing the productivity score given the population of workers (represented by six randomly chosen workers) for that specific company. Besides `+(1|Worker/Machine)`, one could also try `+(1|Worker)` alone, or even adding a random slope for machines with `+(Machine|Worker)`. – Stefan Jan 10 '17 at 02:33
  • @Stefan I agree that `Machines` should be fixed. The OP should also consider the purpose of nesting. Nesting isn't a consequence of the model of interest, it's a consequence of the experimental design. A model using machine nested within worker (`(1|Worker/Machine)`) is contradictory to the experimental setup described in `?Machines`, and consequently an inappropriate model. It states 6 workers are assigned to **each** of 3 machines repeated 3 times. That is workers crossed with machines, not machines nested within workers. Using a nested model would give a false understanding of the variance. – Ashe Jan 10 '17 at 14:19
  • @Ashe I agree, but I I think it would also be valid to account for the potential variation (interdependence) of worker and machine, i.e. `+(1|Machine:Worker)`. Since we agreed that `Worker` should be random and `+ (1|Worker/Machine)` expands to `+(1|Worker) + (1|Machine:Worker)`, this could be a model of interest, no? See also [here](https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q2/000999.html). – Stefan Jan 10 '17 at 15:10
  • @Stefan I agree and can see how this would be valid. In the OP's original model however, `(1|Worker/Machine)` conflates the interaction variance with the `Machine` variance. Given the crossed design of the data itself, this conflation seems unnecessary. Adding a fixed `Machine` term would address all the concerns though, as you mentioned originally. – Ashe Jan 10 '17 at 15:31

1 Answers1

3

This is a random intercept model with Machines nested in Workers. The data description can be found by ?Machines:

Data on an experiment to compare three brands of machines used in an industrial process are presented in Milliken and Johnson (p. 285, 1992). Six workers were chosen randomly among the employees of a factory to operate each machine three times. The response is an overall productivity score taking into account the number and quality of components produced.

I am assuming the goal is to figure out the level of variation on the productivity score (some quality control measure) that can be attributed to the Worker or the combinations of Workers and Machines, i.e. Machine:Worker interdependence. The : indicates a grouping of two variables and, I think, cannot be interpreted the same way as an interaction in the fixed effects statements of an ANOVA or mixed effects model. So in the random statement you are not comparing to "base" or "reference" levels as you would do with the estimates in the fixed effects output. Instead, you are asking for the variance of the outcome with respect to the grouping variables. See also here.

Note that + (1|Worker/Machine) expands to +(1|Worker) + (1|Machine:Worker) and is identical (see also here under REML for GLMMs?).

The variance of Worker on the productivity score is much lower than the variance of Machine:Worker. So there is a considerable amount of interdependence between Machine and Worker. A next step should be to figure out which machine is ideal given the companies productivity score and their population of workers by adding Machine as a fixed effect to the model.

Addressing comment below:

To extract the individual variances for each random intercept:

m1 <- lmer(score ~ 1 + (1|Worker/Machine), data=Machines)
r1 <- ranef(m1, condVar=T) ## RANDOM INTERCEPTS AND THE CONDITIONAL VARIANCES

r1
$`Machine:Worker`
    (Intercept)
A:6 -10.3657583
A:2  -6.6044150
A:4  -8.3546072
A:1  -7.3172286
A:3  -1.8417378
A:5  -9.0603434
B:6 -13.5163208
B:2   0.3599863
B:4   3.0869093
B:1   2.8972267
B:3   6.6150352
B:5   4.5699849
C:6   4.0605017
C:2   2.6151258
C:4   5.1099021
C:1   7.1753590
C:3   9.3676320
C:5  11.2027481

$Worker
  (Intercept)
6 -2.43125701
2 -0.44515973
4 -0.01935477
1  0.33796408
3  1.73448525
5  0.82332218

with conditional variances for “Machine:Worker” “Worker” 

To extract those variances:

attr(r1[[1]], which ="postVar") ## Machine:Worker
attr(r1[[2]], which ="postVar") ## Worker

Or you can plot them with the sjp.lmer() function in the sjPlot package:

sjp.lmer(m1) ## PRODUCES TWO PLOTS

enter image description here enter image description here

Stefan
  • 4,977
  • 1
  • 18
  • 38
  • Thanks for the answer. Suppose that I want the variance for each of the two Machines `A` and `B` with `mydata` (or `A`, `B`, and `C` in the full dataset `Machines`) separately. How can I obtain that with the current model? Or do I have to specify a different random-effects structure? – bluepole Jan 06 '17 at 22:51
  • @bluepole is that what you are after? – Stefan Jan 07 '17 at 00:04
  • Yes, the variance for each machine, not the random effect for each worker . I know that I can get that by limiting the data on each machine, but I'm wondering whether there is a way to obtain it in a model with the full dataset. – bluepole Jan 07 '17 at 00:53
  • @bluepole I am not sure what you mean... "limiting the data on each machine"? "in a model with the full dataset"? – Stefan Jan 07 '17 at 00:56
  • @bluepole reading your first comment again, yes if you want random intercepts (and variances) for `Machine`, then your random statement should contain `+(1|Machine)`. Is this what your are asking? – Stefan Jan 07 '17 at 01:02
  • @bluepole I just remembered I asked a similar question once regarding the "interaction" in the random part of the model. I linked this now to my answer. – Stefan Jan 10 '17 at 01:54