
I'm trying to understand partial R-squared as computed by the rsq package. See the reproducible example below, using the standard esoph data set:

require(rsq)
data(esoph)
model1 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp,
              data = esoph, family = binomial)

attach(esoph)
rsq(model1)
# [1] 0.826124
rsq.partial(model1)
#$variable
#[1] "agegp"       "tobgp"       "alcgp"       "tobgp:alcgp"
#
#$partial.rsq
#[1]  0.6548246836256900182960 -0.0000000000000006661338  0.0000000000000000000000  0.1456091052940046148834
rsq.partial(model1, adj = TRUE)
#$adjustment
#[1] TRUE
#
#$variable
#[1] "agegp"       "tobgp"       "alcgp"       "tobgp:alcgp"
#
#$partial.rsq
#[1]  0.6290653316574579267950 -0.0000000000000006661338  0.0000000000000000000000  0.0308401791394680158120
detach(esoph)

Here, the partial R-squared for the first variable, agegp, seems too high. Just look at the analysis-of-deviance table:

anova(model1)
#             Df Deviance Resid. Df Resid. Dev
# NULL                           87    227.241
# agegp        5   88.128        82    139.112
# tobgp        3   19.085        79    120.028
# alcgp        3   66.054        76     53.973
# tobgp:alcgp  9    6.489        67     47.484

cat("pseudo R2 agegp: ", 1-139.112/model1$null.deviance, "\n")
# pseudo R2 agegp:  0.3878206

So from the deviance table, the pseudo R-squared corresponding to the agegp variable is 0.3878. This also matches what rsq() and rsq.partial() return for a model with only the agegp variable:

model1 <- glm(cbind(ncases, ncontrols) ~ agegp, data = esoph, family = binomial)
attach(esoph)
rsq(model1)
# [1] 0.3791795
rsq.partial(model1)
#$adjustment
#[1] FALSE
#
#$variable
#[1] "agegp"
#
#$partial.rsq
#[1] 0.3791795
detach(esoph)

So why is the partial R-squared coefficient claimed to be 0.655 (or 0.629 adjusted)?

Tomas

1 Answer


A partial R-squared compares two models: a full model and a reduced model. You compute the pseudo R-squared twice, but you change both the full model and the reduced model each time. It is hard to compare two R-squareds when four models are involved.

There are several definitions of the pseudo R-squared. Let's use Efron's definition, since it is easy to compute and, on this problem, it gives similarly "unintuitive" results to the measure rsq uses by default. Efron's pseudo R-squared, in its partial form, is 1 - SSE.full / SSE.reduced. When you change both the full model and the reduced model, you change both the numerator and the denominator of that ratio. The computation is below, but in short: agegp explains about 35% of the total variability, yet about 62% of the variability left unexplained after taking tobgp and alcgp into account.
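To preview the arithmetic: for SSE- or deviance-based measures, 1 - SSE.full / SSE.reduced rearranges to (R2.full - R2.reduced) / (1 - R2.reduced). Plugging in the overall R-squareds quoted in this thread (0.826 for the model with agegp, from the question; 0.496 for the model without agegp, from the comment thread below) lands roughly on the figure that puzzled you:

(0.826 - 0.496) / (1 - 0.496)
#> [1] 0.6547619

which is essentially the 0.655 that rsq.partial reports for agegp.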

More about pseudo R-squares: https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/.

First let's explicitly define the full and reduced models that are compared in your question.

library("rsq")

data(esoph)
attach(esoph)

# Full model with all three predictors.
# (Technically, three main effects plus the tobgp:alcgp interaction.)
fit.x1x2x3 <- glm(
  cbind(ncases, ncontrols) ~ agegp + tobgp * alcgp,
  family = binomial
)
# Reduced model with only agegp.
fit.x1 <- glm(
  cbind(ncases, ncontrols) ~ agegp,
  family = binomial
)
# Reduced model with agegp removed.
fit.x2x3 <- glm(
  cbind(ncases, ncontrols) ~ tobgp * alcgp,
  family = binomial
)
# Null model with no predictors.
fit.null <- glm(
  cbind(ncases, ncontrols) ~ 1,
  family = binomial
)
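
# For orientation, the overall R-squared of each model with the package's
# default type "v". (The values for the full and agegp-only models are
# reported in the question; the value for fit.x2x3 is the one quoted in
# the comment thread below.)
rsq(fit.x1x2x3)  # ~ 0.826
rsq(fit.x2x3)    # ~ 0.496
rsq(fit.x1)      # ~ 0.379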

# The `rsq` function has 5 methods for computing the R-squared.
# The default is "v" (variance-function-based R-squared)
# but I will use "sse" since it is easy to compute.
# It gives similar results in this case.
type <- "sse"

# Computation 1: 
# What is the "value" of adding agegp to a model which includes all other predictors?

rsq.partial(fit.x1x2x3, type = type)$partial.rsq
#> [1] 0.6193845 0.0000000 0.0000000 0.1541909
rsq.partial(fit.x1x2x3, fit.x2x3, type = type)$partial.rsq
#> [1] 0.6193845

# So the partial R-square measures the additional variation explained by agegp,
# after adjustment for the other predictors.

# Note: Now it is clear why the partial R-squareds for tobgp and alcgp are 0.
# By default, `rsq` builds each reduced model by dropping one predictor
# and leaving the others in. However, it is a bit silly to remove the main
# effect for tobgp or alcgp while leaving their interaction in.
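# As a sanity check (fit.x1x3 is my own addition, not part of the output
# above): drop tobgp together with its interaction, so that the reduced
# model keeps only agegp and alcgp.
fit.x1x3 <- glm(
  cbind(ncases, ncontrols) ~ agegp + alcgp,
  family = binomial
)
rsq.partial(fit.x1x2x3, fit.x1x3, type = type)$partial.rsq
# This comparison is no longer forced to zero, because the reduced model
# now actually differs from the full one.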

# Computation 2: 
# What is the "value" of agegp for predicting the response given no other information?

rsq.partial(fit.x1, type = type)$partial.rsq
#> [1] 0.3515373
rsq.partial(fit.x1, fit.null, type = type)$partial.rsq
#> [1] 0.3515373
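
# Added for illustration: the deviance-based analogue of the same two
# comparisons, which is the formulation that comes up in the comments below.
1 - deviance(fit.x1x2x3) / deviance(fit.x2x3)  # "value" of adding agegp
1 - deviance(fit.x1) / deviance(fit.null)      # ~0.388, the figure computed
                                               # from the anova table in the question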

Created on 2019-11-03 by the reprex package (v0.3.0)

And let's compute Efron's R-squared explicitly, to show that agegp explains about 35% of the total variability but about 62% of the variability left unexplained after taking tobgp and alcgp into account.

library("modelr")
library("tidyverse")

list(
  x1x2x3 = fit.x1x2x3,
  x2x3 = fit.x2x3,
  x1 = fit.x1,
  null = fit.null
) %>%
  map_dfr(~ add_predictions(esoph, ., var = "yhat", type = "response"),
          .id = "model") %>%
  mutate(
    n = ncases + ncontrols,
    y = ncases / n
  ) %>%
  group_by(model) %>%
  summarise(
    SSE = sum(n * (y - yhat)^2)
  )
#> # A tibble: 4 x 2
#>   model    SSE
#>   <chr>  <dbl>
#> 1 null   27.5 
#> 2 x1     17.8 
#> 3 x1x2x3  5.10
#> 4 x2x3   13.4

# 1 - SSE.x1x2x3 / SSE.x2x3
1 - 5.10 / 13.4
#> [1] 0.619403
# 1 - SSE.x1 / SSE.null
1 - 17.8 / 27.5
#> [1] 0.3527273

Created on 2019-11-03 by the reprex package (v0.3.0)
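Finally, putting the SSEs on the R-squared scale makes the summary concrete: agegp alone explains roughly 35% of the total variability, while adding it to tobgp * alcgp takes the model from roughly 51% to 81%, i.e. it removes about 62% of the variability those two predictors leave unexplained.

# Efron R-squared of each model relative to the null model,
# computed from the (rounded) SSE values in the table above
1 - c(x1 = 17.8, x2x3 = 13.4, x1x2x3 = 5.10) / 27.5
#>        x1      x2x3    x1x2x3 
#> 0.3527273 0.5127273 0.8145455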

dipetkov
  • Thanks for the answer! This explains something. But my main question still stands: how can "the value of adding agegp" be 0.65? That seems too high, considering that the total variability explained by `agegp` is 0.379 (last two commands). `agegp` raises the rsq from `rsq(fit.x2x3)` = 0.496 up to `rsq(fit.x1x2x3)` = 0.826, so how can the value of adding it be 0.65? I must be missing something key here... – Tomas Nov 03 '19 at 19:20
  • It is true that the total variability explained by `agegp` is ~35%. The confusion is about the meaning of the ~65% -- it is the variability which `agegp` explains after adjusting for the other two variables. So there is no contradiction. – dipetkov Nov 03 '19 at 21:38
  • Thanks, so for me the essential understanding of how the 0.65 is possible is this: `(rsq(fit.x1x2x3) - rsq(fit.x2x3))/(1 - rsq(fit.x2x3))`, which corresponds to the definition `1 - deviance(fit.x1x2x3)/deviance(fit.x2x3)` (just to illustrate the principle; `rsq` uses a different measure of variance). This can be converted to `%incMSE` from randomForest and also to the partial correlation coefficient. Thanks for the help! – Tomas Nov 03 '19 at 22:20
  • Yes, I think you are right that it should be similar to `%incMSE` since `%incMSE` is computed by permuting, i.e. "dropping" each variable, in turn. – dipetkov Nov 03 '19 at 22:39