Standard errors and p-values for interaction effects with GLM

Question

I’m wondering how to calculate the standard errors and p-values of interaction effects in logistic models.

For illustration, consider this toy example, where we think some treatment effect will differ between men and women (but not by education):

set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)

m <- glm(dv ~ condition +
             gender +
             education +
             condition:gender,
           family = "binomial")

Which gives us this:

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)  
(Intercept)               -0.07849    0.11065  -0.709   0.4781  
conditiontreat             0.22031    0.12658   1.740   0.0818 .
gendermale                -0.04788    0.12558  -0.381   0.7030  
educationpost_grad         0.02815    0.10965   0.257   0.7974  
educationundergrad         0.04922    0.10981   0.448   0.6540  
conditiontreat:gendermale -0.13205    0.17921  -0.737   0.4612  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here, the treatment effect for female (the reference category) is estimated at 0.22031, with an SE of 0.12658 and a p-value of 0.0818.

The treatment effect for male, by contrast, would be

beta <- 0.22031 - 0.13205 # 0.08826

But what is the SE and p-value for that estimate? I believe it's not the ones given on the final row of the output, since they speak to the difference in effect between male and female.

1. Calculating the SE
From this and this, I gather that the SE should instead be calculated as follows:

$\sqrt{var(\hat{\beta_1}) + var(\hat{\beta_5}) + 2*cov(\hat{\beta_1}\hat{\beta_5})}$

Applied to the case above, that would give us the following

beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2*vcov(m)[2,6]) # 0.1268722

2. Calculating the p-value
Once I have the SE, I gather from this that I would then simply calculate the p-value like this:

pnorm(-abs(beta)/beta_se) * 2 # 0.4866415

Have I understood all of this correctly?

EDIT: Full code in one block:

set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)

m <- glm(dv ~ condition +
           gender +
           education +
           condition:gender,
         family = "binomial")
summary(m)

beta <- 0.22031 - 0.13205 # 0.08826
beta_se <- sqrt(vcov(m)[2,2] + vcov(m)[6,6] + 2*vcov(m)[2,6]) # 0.1268722
pnorm(-abs(beta)/beta_se) * 2 # 0.4866415

library(multcomp)
summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))

Which gives:

     Simultaneous Tests for General Linear Hypotheses

Fit: glm(formula = dv ~ condition + gender + education + condition:gender, 
    family = "binomial")

Linear Hypotheses:
                                                Estimate Std. Error z value Pr(>|z|)
conditiontreat + conditiontreat:gendermale == 0  0.08825    0.12687   0.696    0.487
(Adjusted p values reported -- single-step method)

dimitriy · Accepted Answer · 2020-10-02T08:14:39.093

2

I cannot replicate your estimates with the R code provided, but you can verify your manual calculations by testing the null that linear combination of two coefficients is zero:

set.seed(101)
n <- 2000
dv <- sample(0:1,n,rep=TRUE)
condition <- sample(c("control","treat"),n,rep=TRUE)
gender <- sample(c("male","female"),n,rep=TRUE)
education <- sample(c("less_than_undergrad","undergrad","post_grad"),n,rep=TRUE)

m <- glm(dv ~ condition +
           gender +
           education +
           condition:gender,
           family = "binomial")

write.dta(data, "~/Desktop/data.dta")
summary(m)

names(coef(m)) 
library(multcomp)
summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))

The last part yields:

Simultaneous Tests for General Linear Hypotheses

Fit: glm(formula = dv ~ condition + gender + education + condition:gender, 
    family = "binomial")

Linear Hypotheses:
                                                Estimate Std. Error z value Pr(>|z|)
conditiontreat + conditiontreat:gendermale == 0  -0.1838     0.1290  -1.425    0.154
(Adjusted p values reported -- single-step method)

This shows that while the effect of treatment on the log odds is negative for males, it is not statistically distinguishable from zero.

Here's the same calculation from first principles:

> (te <- m$coefficients["conditiontreat"] + m$coefficients["conditiontreat:gendermale"])        
conditiontreat 
    -0.1838191 
> (se_te <- sqrt(vcov(m)["conditiontreat","conditiontreat"] 
+               + vcov(m)["conditiontreat:gendermale","conditiontreat:gendermale"]
+               + 2*vcov(m)["conditiontreat","conditiontreat:gendermale"]))
[1] 0.1290362
> (pval <- pnorm(te/se_te)*2)
conditiontreat 
      0.154286

edited Oct 02 '20 at 08:14

answered Oct 02 '20 at 01:49

dimitriy

31,081
5
63
138

Thank you, Dimitriy! Three things: (1) Apologies! There was a typo in the p-value calculation. The code should now replicate. (2) When I run `summary(glht(m, linfct = c("conditiontreat + conditiontreat:gendermale = 0")))` I don’t get an estimate of `-0.1838`. Instead I get the numbers I have in my manual calculations: an estimate of `0.08825` with an SE of `0.12687` and p-value of `0.487`. (3) Are you sure it should be -2*Cov()? From the Wikipedia page it looks like it should be +2 for Var(aX + bY). And the +2 calculation seems to yield the ‘correct’ SE, as given by (2) above. – kh_one Oct 02 '20 at 07:31
I don't see any edits to the data simulation part, so I still can't replicate your estimates/coefficients. The change to the manual p-value calculation does not alter that fact and our numbers won't line up without that fix. In your case, $b<1$, so the $Var(aX-bY)$ formula applies. Could certainly be that we are missing something. – dimitriy Oct 02 '20 at 07:47
Can't figure out why it's not replicating. Have added full code block in original post, which should replicate on copy+paste, and yield a positive estimate. – kh_one Oct 02 '20 at 07:53
1

I think you are right about the plus. I made a silly algebra mistake and have edited that out. I still can't get your numbers, but the calculations also work out for me. See the edit above. Not sure what's going wrong with R, but I do get the same estimates in Stata as in R on my end. – dimitriy Oct 02 '20 at 08:18
1

Thanks, Dimitriy. Very strange that we're not getting the same numbers. The only difference is that I'm not running `write.dta(data, "~/Desktop/data.dta")`. But at least we're both getting the `glht` and manual method to give the same results on our respective ends, which seems to confirm that the way I was calculating the SE and p-value in the original post seems to have been the correct way of doing it. – kh_one Oct 02 '20 at 08:54

Standard errors and p-values for interaction effects with GLM

1 Answers1

Linked