I’m wondering how to calculate the standard errors and p-values of interaction effects in logistic models.
For illustration, consider this toy example, where we think some treatment effect will differ between men and women (but not by education):
set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)
m <- glm(dv ~ condition +
gender +
education +
condition:gender,
family = "binomial")
Which gives us this:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.07849 0.11065 -0.709 0.4781
conditiontreat 0.22031 0.12658 1.740 0.0818 .
gendermale -0.04788 0.12558 -0.381 0.7030
educationpost_grad 0.02815 0.10965 0.257 0.7974
educationundergrad 0.04922 0.10981 0.448 0.6540
conditiontreat:gendermale -0.13205 0.17921 -0.737 0.4612
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here, the treatment effect for female
(the reference category) is estimated at 0.22031
, with an SE of 0.12658
and a p-value of 0.0818
.
The treatment effect for male
, by contrast, would be
beta <- 0.22031 - 0.13205 # 0.08826
But what is the SE and p-value for that estimate? I believe it's not the ones given on the final row of the output, since they speak to the difference in effect between male
and female
.
1. Calculating the SE
From this and this, I gather that the SE should instead be calculated as follows:
$\sqrt{var(\hat{\beta_1}) + var(\hat{\beta_5}) + 2*cov(\hat{\beta_1}\hat{\beta_5})}$
Applied to the case above, that would give us the following
beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2*vcov(m)[2,6]) # 0.1268722
2. Calculating the p-value
Once I have the SE, I gather from this that I would then simply calculate the p-value like this:
pnorm(-abs(beta)/beta_se) * 2 # 0.4866415
Have I understood all of this correctly?
EDIT: Full code in one block:
set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)
m <- glm(dv ~ condition +
gender +
education +
condition:gender,
family = "binomial")
summary(m)
beta <- 0.22031 - 0.13205 # 0.08826
beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2*vcov(m)[2,6]) # 0.1268722
pnorm(-abs(beta)/beta_se) * 2 # 0.4866415
library(multcomp)
summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))
Which gives:
Simultaneous Tests for General Linear Hypotheses
Fit: glm(formula = dv ~ condition + gender + education + condition:gender,
family = "binomial")
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
conditiontreat + conditiontreat:gendermale == 0 0.08825 0.12687 0.696 0.487
(Adjusted p values reported -- single-step method)