
We have two correlated variables and a large number of binary (0/1) factors (around 200), illustrated here with just $f1$ and $f2$:

# simulated example data: two continuous variables and two binary factors
x <- rnorm(100)
y <- rnorm(100)
f1 <- rbinom(100, 1, 0.5)
f2 <- rbinom(100, 1, 0.5)

This gives four possible groups: A $(f1=1, f2=1)$, B $(f1=0, f2=1)$, C $(f1=1, f2=0)$, and D $(f1=0, f2=0)$.
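
To make the groups explicit, each observation could be labelled like this (the variable name `group` is purely illustrative):

group <- factor(ifelse(f1 == 1 & f2 == 1, "A",
                ifelse(f1 == 0 & f2 == 1, "B",
                ifelse(f1 == 1 & f2 == 0, "C", "D"))))
table(group)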

We then run the model

> glm(y ~ x * f1 + x * f2)

Call:
glm(formula = y ~ x * f1 + x * f2)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.72028  -0.58501   0.03167   0.60097   1.86332  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.03188    0.17388  -0.183   0.8549  
x            0.08105    0.20540   0.395   0.6940  
f1           0.26823    0.19309   1.389   0.1681  
f2          -0.34568    0.19488  -1.774   0.0793 .
x:f1         0.10301    0.20183   0.510   0.6110  
x:f2        -0.25875    0.20828  -1.242   0.2172  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.8906953)

    Null deviance: 88.754  on 99  degrees of freedom
Residual deviance: 83.725  on 94  degrees of freedom
AIC: 280.02

Number of Fisher Scoring iterations: 2

We can simplify this output and write a separate regression ($y = a + b \times x$) for each group by adding up the relevant coefficients:

$a_A = (-0.03188) + (0.26823) + (-0.34568) = -0.10932806$
$b_A = (0.08105) + (0.10301) + (-0.25875) = -0.07468630$

$a_B = (-0.03188) + (-0.34568) = -0.37755949$
$b_B = (0.08105) + (-0.25875) = -0.17769345$
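
For reference, the same sums can be read directly off the fitted model object (a small sketch; `m` is just the name the fit is stored under):

m <- glm(y ~ x * f1 + x * f2)
b <- coef(m)
a_A <- b["(Intercept)"] + b["f1"] + b["f2"]   # intercept for group A
b_A <- b["x"] + b["x:f1"] + b["x:f2"]         # slope for group A
a_B <- b["(Intercept)"] + b["f2"]             # intercept for group B
b_B <- b["x"] + b["x:f2"]                     # slope for group B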

And the same for the C and D groups. My question is: how do I calculate a standard deviation or confidence interval for the individual group slopes? Is it additive like the estimates, is it a mean, or something else? Thank you for any help.

  • I believe `linearHypothesis` in the `car` package can do this for you. These types of tests on coefficients are commonly called "linear restrictions". – Affine Dec 11 '14 at 17:10
  • I can't see how that is the way to go. `linearHypothesis` from the car package seems to test a hypothesised slope or intercept against a fitted model. The slopes and intercepts I have are just taken from the model. What I would like to know is how to summarise the standard error across the groups. – Rasmus Ø. Pedersen Dec 11 '14 at 17:26
  • Yeah, you're correct that it doesn't present a SE, just a hypothesis test against zero (usually that's sufficient), e.g. `linear.hypothesis(lm(y ~ x*f1 + x*f2), "x + x:f1 + x:f2 = 0")` would be a test of your $b_A$. – Affine Dec 11 '14 at 17:46

1 Answer


These are commonly known as linear restrictions, and the `linear.hypothesis()` function in the `car` package runs hypothesis tests on them.

`linear.hypothesis(lm(y ~ x*f1 + x*f2), "x + x:f1 + x:f2 = 0")` would be a test of your $b_A$.

However, this does not report the standard error, and given that you are interested in obtaining the standard errors, you'll need to do the math manually.

Suppose we have the variance/covariance matrix of the coefficients $V$ ($b \times b$), the estimated coefficients $\beta$ ($b \times 1$), and our desired combination $h$ ($1 \times b$):

m = lm(y ~ x * f1 + x * f2)
V = vcov(m)                     # variance/covariance matrix of the coefficients
B = coef(m)                     # estimated coefficients
h = cbind(0, 1, 0, 0, 1, 1)     # selects x + x:f1 + x:f2, i.e. the group A slope

The general form is that $(h\beta)'(hVh')^{-1}(h\beta)$ is distributed as $F(1,d)$, with $d$ the residual degrees of freedom of the model. This is the F-test that linear.hypothesis is running. The standard error of $h\beta$ is $\sqrt{hVh'}$.
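
In R, that F-statistic could be computed along the following lines (a sketch reusing the m, V, B, and h objects above; the p-value should match the one reported by linear.hypothesis):

Fstat <- t(h %*% B) %*% solve(h %*% V %*% t(h)) %*% (h %*% B)   # F statistic for the restriction h
pf(Fstat, 1, df.residual(m), lower.tail = FALSE)                # p-value on (1, 94) degrees of freedom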

Note that since $F(1,d) = t^2(d)$, we could also run a t-test with $h\beta/\sqrt{hVh'}$:

se = sqrt( h %*% V %*% t(h) )                # standard error of the group A slope
estimate = h %*% B                           # the group A slope itself
2 * pt(-abs(estimate / se), df = 100 - 6)    # two-sided p-value on 94 residual df
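
A 95% confidence interval for the group A slope then follows in the usual way from the estimate and its standard error (a sketch; `df_r` is just a name for the residual degrees of freedom):

df_r <- df.residual(m)                               # 100 observations - 6 coefficients = 94
c(estimate) + c(-1, 1) * qt(0.975, df_r) * c(se)     # 95% CI for the group A slope
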
Affine