0

I am struggling mightily trying to figure out how to calculate the SE related to an odds ratio for a linear combination of two factors in a logistic regression model with an interaction term included. Consider the following....

set.seed(123)
height <- rbinom(1000,1,0.5)
weight <- rbinom(1000,1,0.5)
survived <- height*0.75 + weight*0.35 + height*weight*0.5 + rnorm(1000,0,1)

dat <- data.frame(Height=factor(ifelse(height==1,"Tall","Short"),levels=c("Short","Tall")),
                  Weight=factor(ifelse(weight==1,"Big","Small"),levels=c("Small","Big")),
                  Survived=ifelse(survived>=median(survived),1,0))

mod1 <- glm(Survived~Height*Weight,data=dat,family="binomial")                   
mod2 <- glm(Survived~interaction(Height,Weight),data=dat,family="binomial")                   

Here, the height (tall/short) and weight (big/small) of an animal determines its survival, and there is an interaction between those two factors. mod1 is a simple logistic model using the cross-product notation, whereas mod2 fits a derived variable that specifies each of the possible combinations of height/weight, with short/small as the reference.

MOD1

Call:
glm(formula = Survived ~ Height * Weight, family = "binomial", 
    data = dat)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.83369  -0.95247  -0.07909   1.06245   1.60944  

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)           -0.9751     0.1444  -6.751 1.47e-11 ***
HeightTall             1.2517     0.1909   6.558 5.45e-11 ***
WeightBig              0.4199     0.1926   2.180  0.02922 *  
HeightTall:WeightBig   0.7787     0.2850   2.732  0.00629 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1386.3  on 999  degrees of freedom
Residual deviance: 1212.3  on 996  degrees of freedom
AIC: 1220.3

Number of Fisher Scoring iterations: 4

MOD2

Call:
glm(formula = Survived ~ interaction(Height, Weight), family = "binomial", 
    data = dat)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.83369  -0.95247  -0.07909   1.06245   1.60944  

Coefficients:
                                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)                            -0.9751     0.1444  -6.751 1.47e-11 ***
interaction(Height, Weight)Tall.Small   1.2517     0.1909   6.558 5.45e-11 ***
interaction(Height, Weight)Short.Big    0.4199     0.1926   2.180   0.0292 *  
interaction(Height, Weight)Tall.Big     2.4504     0.2224  11.020  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1386.3  on 999  degrees of freedom
Residual deviance: 1212.3  on 996  degrees of freedom
AIC: 1220.3

Number of Fisher Scoring iterations: 4

I would like to know what the odds ratio of survival would be for a tall/big animal vs. small/short. I can easily follow the math in mod1 to see that it is exp(1.2517+0.4199+0.7787) or 11.59. mod2 tells me that up front, along with the unexponentiated SE (0.2224), but how do I get to the SE of 0.2224 from mod1? I've followed posts such as Standard errors and p-values for interaction effects with GLM, but I can't seem to get my calculated SE to match that from mod2.

Many thanks ahead of time!

AtMac
  • 67
  • 5

1 Answers1

0

You correctly identified that the way to compute the odds ratio of interest is to add up a specific combination of coefficients (and then exponentiate). That is, you computed $\theta = \beta_1 + \beta_2 + \beta_3$.

If we write this in matrix notation, with $\mathbf{b} = \begin{bmatrix}\beta_0 & \beta_1 &\beta_2 & \beta_3 \end{bmatrix}'$, we can represent $\theta$ as $\mathbf{ab}$, where $\mathbf{a} = \begin{bmatrix}0 & 1 & 1 & 1 \end{bmatrix}$.

To compute the variance of $\theta$, we compute $\mathbf{a}\mathbf{S}\mathbf{a}'$, where $\mathbf{S}$ is the estimated covariance matrix of the coefficients, which you would retrieve using vcov(mod1). You then take the square root of this value to get the standard error of $\theta$, which is your quantity of interest.

In R, you would run the following:

a <- matrix(c(0, 1, 1, 1), nrow = 1)
b <- matrix(coef(mod1), ncol = 1)
S <- vcov(mod1)

a %*% b #logOR
sqrt(a %*% S %*% t(a)) #SE of logOR

That should give you the same standard error as computed using mod2.

Noah
  • 20,638
  • 2
  • 20
  • 58
  • AMAZING! The math is a little over my head, but I will study it closely. Thank you so much for showing me the solution! – AtMac Nov 03 '20 at 13:08
  • Glad it was helpful. I misread your code; `mod1` and `mod2` should refer to the output of `glm()`, not `summary()`. – Noah Nov 03 '20 at 16:40