I am trying to calculate $R^2$ (variance explained) for a set of data using GLMMs. Here's some dummy data.

set.seed(7127)
# Number sampled
n = 10000
# Average height
m = 180 

leg     = rnorm(n, 75, 5)
torso   = rnorm(n, 75, 5)
head    = rnorm(n, 30, 1)

df = data.frame(1:n, sample(leg, n, replace = T), sample(torso, n, replace = T), sample(head, n, replace = T))
df$height = rowSums(df[,2:4])
colnames(df) = c("person", "leg", "torso", "head", "height")

This is height data for 10,000 people (n = 10000 in the code above), measured as three components of height (leg, torso, and head) which sum to give total height. I want an $R^2$ value for each of the three components, where $R^2$ is the proportion of the variance in height explained by leg length, torso length, and head length individually.

To do this I have fitted three models, each with one component as a fixed effect and a random intercept for group (each person is randomly assigned to group 1 or 2, which could be analogous to sex [male or female]), as well as a null model containing only the random effect. The order in the code is null, leg, torso, head.

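library(lme4)

# Assumed: the post does not show how df$random was created; this assigns
# each person to group 1 or 2 at random, as described above
df$random = factor(sample(1:2, n, replace = TRUE))

mod0 = lmer(df$height ~ 1  + (1|df$random))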
mod0

modL = lmer(df$height ~ df$leg   + (1|df$random))
modL 

modT = lmer(df$height ~ df$torso     + (1|df$random))
modT

modH = lmer(df$height ~ df$head  + (1|df$random))
modH

Following the protocol of Nakagawa and Schielzeth (2013) I have tried to calculate Marginal and Conditional $R^2$ values for each component of height (order in code is leg, torso, head).
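
For reference, this is how I understand the two quantities for a Gaussian model with a single random intercept. The symbols are my own shorthand, not necessarily the paper's notation: $\sigma^2_f$ is the variance of the fixed-effect predictions (`VarFixed` in the code), $\sigma^2_\alpha$ is the random-intercept variance (`VarCorr(mod)$df[1]`), and $\sigma^2_\varepsilon$ is the residual variance (`attr(VarCorr(mod),"sc")^2`).

```latex
R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

The two share a denominator and differ only in whether $\sigma^2_\alpha$ is counted in the numerator.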

# Marginal R squares
VarFixedL = var(fixef(modL)[2]*getME(modL,"X")[,2])
R2M_L = VarFixedL / (VarFixedL + VarCorr(modL)$df[1] + attr(VarCorr(modL),"sc")^2)
R2M_L

VarFixedT = var(fixef(modT)[2]*getME(modT,"X")[,2])
R2M_T = VarFixedT / (VarFixedT + VarCorr(modT)$df[1] + attr(VarCorr(modT),"sc")^2)
R2M_T

VarFixedH = var(fixef(modH)[2]*getME(modH,"X")[,2])
R2M_H = VarFixedH / (VarFixedH + VarCorr(modH)$df[1] + attr(VarCorr(modH),"sc")^2)
R2M_H

# Conditional R squares
R2C_L = (VarFixedL + VarCorr(modL)$df[1]) / (VarFixedL + VarCorr(modL)$df[1] + attr(VarCorr(modL),"sc")^2)
R2C_L

R2C_T = (VarFixedT + VarCorr(modT)$df[1]) / (VarFixedT + VarCorr(modT)$df[1] + attr(VarCorr(modT),"sc")^2)
R2C_T

R2C_H = (VarFixedH + VarCorr(modH)$df[1]) / (VarFixedH + VarCorr(modH)$df[1] + attr(VarCorr(modH),"sc")^2)
R2C_H
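
The repeated calculations above could probably be wrapped in one function. The following is only a sketch: `r2_lmm` is a hypothetical helper name of my own, it assumes a Gaussian `lmer` fit with random intercepts only, and the simulated `demo` data is there purely to make the block self-contained.

```r
library(lme4)

# Hypothetical helper (not part of the original post): marginal and
# conditional R^2 for a Gaussian lmer fit with random intercepts only
r2_lmm <- function(mod) {
  # Variance of the fixed-effect predictions X %*% beta
  var_fixed <- var(as.vector(getME(mod, "X") %*% fixef(mod)))
  # One row per variance component; the residual row has grp == "Residual"
  vc <- as.data.frame(VarCorr(mod))
  var_random <- sum(vc$vcov[vc$grp != "Residual"])
  var_resid <- sigma(mod)^2
  total <- var_fixed + var_random + var_resid
  c(marginal = var_fixed / total,
    conditional = (var_fixed + var_random) / total)
}

# Small simulated example with the same structure as the dummy data
set.seed(1)
n_demo <- 5000
demo <- data.frame(
  leg   = rnorm(n_demo, 75, 5),
  torso = rnorm(n_demo, 75, 5),
  head  = rnorm(n_demo, 30, 1),
  group = factor(sample(1:2, n_demo, replace = TRUE))
)
demo$height <- demo$leg + demo$torso + demo$head

fit_leg <- lmer(height ~ leg + (1 | group), data = demo)
r2_leg <- r2_lmm(fit_leg)
print(r2_leg)
```

Because the groups in this sketch carry no real effect, the random-intercept variance is estimated at or near zero (lme4 may report a singular fit), so the two values should be nearly equal.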

The results suggest that the variance in height is largely explained by leg and torso length (~49% each). (Edit: I have tested this on the sample data provided by the Nakagawa & Schielzeth paper and reproduced their results, but would appreciate feedback on whether this is correct).

> R2M_L
[1] 0.4869814
> R2M_T
[1] 0.4909379
> R2M_H
[1] 0.02097508

> R2C_L
[1] 0.4871594
> R2C_T
[1] 0.4909379
> R2C_H
[1] 0.02097508
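
As a rough sanity check on these numbers: the three components are simulated independently with standard deviations 5, 5, and 1, so (assuming independence and ignoring sampling noise) the expected share of the variance in height for each component works out as follows.

```latex
\begin{aligned}
\operatorname{Var}(\text{height}) &= 5^2 + 5^2 + 1^2 = 51 \\
R^2_{\text{leg}} = R^2_{\text{torso}} &\approx \frac{25}{51} \approx 0.490 \\
R^2_{\text{head}} &\approx \frac{1}{51} \approx 0.020
\end{aligned}
```

These are close to the marginal values reported above, which is what I would expect when the random group contributes essentially no variance.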

Here the two types of $R^2$ are (virtually) identical. If I introduce variance due to the random group (e.g. df$height = ifelse(df$random == "2", df$height, df$height+rnorm(n, 10, 1))), the two types differ. Could someone explain the difference between the marginal and conditional $R^2$? Why would one be better than the other, or when would one be more appropriate? Why do they react differently to the addition of random or unmeasured variance components?
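
A self-contained sketch of that experiment, in case it helps anyone reproduce the divergence. The data frame `demo`, the sample size, and the variable names are my own choices for illustration; the group shift follows the `ifelse()` call above.

```r
library(lme4)

set.seed(7127)
n_demo <- 5000
demo <- data.frame(leg = rnorm(n_demo, 75, 5),
                   torso = rnorm(n_demo, 75, 5),
                   head = rnorm(n_demo, 30, 1))
demo$random <- factor(sample(1:2, n_demo, replace = TRUE))
demo$height <- demo$leg + demo$torso + demo$head
# Shift group 1 by roughly 10, as in the ifelse() call above
demo$height <- ifelse(demo$random == "2", demo$height,
                      demo$height + rnorm(n_demo, 10, 1))

fit <- lmer(height ~ leg + (1 | random), data = demo)

var_fixed <- var(as.vector(getME(fit, "X") %*% fixef(fit)))
var_group <- as.data.frame(VarCorr(fit))$vcov[1]  # random-intercept variance
var_resid <- sigma(fit)^2
total <- var_fixed + var_group + var_resid

r2_marginal <- var_fixed / total
r2_conditional <- (var_fixed + var_group) / total
print(c(marginal = r2_marginal, conditional = r2_conditional))
```

In this sketch the group variance appears in the denominator of both values but in the numerator of the conditional value only, so the marginal value drops while the conditional value rises. With only two groups the random-intercept variance is poorly estimated, so the exact numbers should be treated with caution.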

rg255
  • The Nakagawa and Schielzeth (2013) method is already implemented in the `MuMIn` library, so you could use that version. See: http://stats.stackexchange.com/questions/43709/r2-for-mixed-models-with-multiple-fixed-and-random-effects and http://stats.stackexchange.com/questions/111150/calculating-r2-in-mixed-models-using-nakagawa-schielzeths-2013-r2glmm-me – Tim Mar 24 '15 at 08:01
  • @Tim Thanks, I will edit this question to make it more focussed on the Conditional vs Marginal question – rg255 Mar 24 '15 at 08:09
  • See also: http://stats.stackexchange.com/questions/8630/principal-component-analysis-backwards-how-much-variance-of-the-data-is-expla and http://stats.stackexchange.com/questions/52828/variance-of-the-data-explained-by-a-single-variable since it is not really clear if you need a LMM for this problem. – Tim Mar 24 '15 at 08:15
  • In the data I want to apply these principles to, there are two random effects (sex and block) and nine components (like leg, torso...). This is just a simplified set of reproducible dummy data – rg255 Mar 24 '15 at 08:33
  • But if *total* height is just a sum of components included in the model, then the remaining (random) factors will be equal to 0 and the residual variance will be equal to 0, so the model will not make much sense. (However, I understand that this is a simplified example and my comment may not apply to the actual data you have.) – Tim Mar 24 '15 at 08:43
