I am trying to calculate $R^2$ (variance explained) for a set of data using GLMMs. Here's some dummy data.
set.seed(7127)
# Number of people sampled
n = 10000
# Average height (leg + torso + head = 75 + 75 + 30)
m = 180
# Simulate the three components of height
leg = rnorm(n, 75, 5)
torso = rnorm(n, 75, 5)
head = rnorm(n, 30, 1)
# Resample each component independently and sum to give total height
df = data.frame(1:n, sample(leg, n, replace = T), sample(torso, n, replace = T), sample(head, n, replace = T))
df$height = rowSums(df[,2:4])
colnames(df) = c("person", "leg", "torso", "head", "height")
# Randomly assign each person to group 1 or 2 for the random effect
df$random = factor(sample(1:2, n, replace = T))
This is height data for 10,000 people, measured as three components of height (leg, torso, and head) which sum to give total height. I want an $R^2$ value for each of the three components, where $R^2$ is how much of the variance in height is explained by leg length, torso length, and head length individually.
To do this I have fitted three models, each with one component as a fixed effect and a random intercept for a randomly assigned group (1 or 2; this could be analogous to sex, male or female), plus a random-effects-only null model (order in the code is null, leg, torso, head).
library(lme4)

mod0 = lmer(height ~ 1 + (1|random), data = df)
mod0
modL = lmer(height ~ leg + (1|random), data = df)
modL
modT = lmer(height ~ torso + (1|random), data = df)
modT
modH = lmer(height ~ head + (1|random), data = df)
modH
Following the protocol of Nakagawa and Schielzeth (2013) I have tried to calculate Marginal and Conditional $R^2$ values for each component of height (order in code is leg, torso, head).
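As I understand the paper, for a model with a single random intercept the two quantities are

$$R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}, \qquad R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}$$

where $\sigma^2_f$ is the variance of the fixed-effect component, $\sigma^2_\alpha$ the random-intercept variance, and $\sigma^2_\varepsilon$ the residual variance.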
# Marginal R squared: variance explained by the fixed effect alone
# Var_fixed = variance of the fitted values from the fixed effect
VarFixedL = var(fixef(modL)[2] * getME(modL, "X")[,2])
R2M_L = VarFixedL / (VarFixedL + VarCorr(modL)$random[1] + attr(VarCorr(modL), "sc")^2)
R2M_L
VarFixedT = var(fixef(modT)[2] * getME(modT, "X")[,2])
R2M_T = VarFixedT / (VarFixedT + VarCorr(modT)$random[1] + attr(VarCorr(modT), "sc")^2)
R2M_T
VarFixedH = var(fixef(modH)[2] * getME(modH, "X")[,2])
R2M_H = VarFixedH / (VarFixedH + VarCorr(modH)$random[1] + attr(VarCorr(modH), "sc")^2)
R2M_H
# Conditional R squared: variance explained by fixed and random effects together
R2C_L = (VarFixedL + VarCorr(modL)$random[1]) / (VarFixedL + VarCorr(modL)$random[1] + attr(VarCorr(modL), "sc")^2)
R2C_L
R2C_T = (VarFixedT + VarCorr(modT)$random[1]) / (VarFixedT + VarCorr(modT)$random[1] + attr(VarCorr(modT), "sc")^2)
R2C_T
R2C_H = (VarFixedH + VarCorr(modH)$random[1]) / (VarFixedH + VarCorr(modH)$random[1] + attr(VarCorr(modH), "sc")^2)
R2C_H
The results suggest that the variance in height is largely explained by leg and torso length (~49% each). (Edit: I have tested this code on the sample data provided with the Nakagawa & Schielzeth paper and reproduced their results, but I would appreciate feedback on whether my application here is correct; a cross-check using the MuMIn package is sketched after the output below.)
> R2M_L
[1] 0.4869814
> R2M_T
[1] 0.4909379
> R2M_H
[1] 0.02097508
> R2C_L
[1] 0.4871594
> R2C_T
[1] 0.4909379
> R2C_H
[1] 0.02097508
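As a cross-check (a sketch only, assuming the MuMIn package is installed; I have not confirmed that it matches my hand calculation exactly), r.squaredGLMM() in MuMIn implements the same Nakagawa & Schielzeth estimators:

# Cross-check against MuMIn's implementation of the Nakagawa & Schielzeth R2
library(MuMIn)
# Returns the marginal (R2m) and conditional (R2c) R2 for each model
r.squaredGLMM(modL)
r.squaredGLMM(modT)
r.squaredGLMM(modH)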
Here the two types of $R^2$ are virtually identical. If I introduce variance attributable to the random group, e.g.

df$height = ifelse(df$random == "2", df$height, df$height + rnorm(n, 10, 1))

I get differences between the two types. Could someone explain the difference between the marginal and conditional $R^2$? When would one be more appropriate than the other, and why do they react differently when random or unmeasured variance components are added?
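For concreteness, this is a minimal sketch of what I mean by introducing group-level variance and refitting (the names df2, modL2, etc. are just mine for illustration):

# Sketch: shift group 1 upwards so the grouping factor explains real variance
df2 = df
df2$height = ifelse(df2$random == "2", df2$height, df2$height + rnorm(n, 10, 1))
modL2 = lmer(height ~ leg + (1|random), data = df2)
# Recompute the variance components for the leg model
VarFixedL2 = var(fixef(modL2)[2] * getME(modL2, "X")[,2])
VarRandomL2 = VarCorr(modL2)$random[1]
VarResidL2 = attr(VarCorr(modL2), "sc")^2
# Marginal and conditional R2 now differ because the random-intercept variance is no longer ~0
VarFixedL2 / (VarFixedL2 + VarRandomL2 + VarResidL2)                 # marginal
(VarFixedL2 + VarRandomL2) / (VarFixedL2 + VarRandomL2 + VarResidL2) # conditional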