
I have a question about the interpretation of the coefficients of an interaction between a continuous and a categorical variable. Here is my model:

model_glm3 = glm(cog ~ lg_hag + race + pdg + sex + as.factor(educa) + lg_hag:as.factor(educa),
                 data = base_708)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)               21.4836     2.0698  10.380  < 2e-16 ***
lg_hag                     8.5691     3.7688   2.274  0.02334 *  
raceblack                 -8.4715     1.7482  -4.846 1.61e-06 ***
racemexican               -3.0483     1.7073  -1.785  0.07469 .  
racemulti/other           -4.6002     2.3098  -1.992  0.04687 *  
pdg                        2.8038     0.4268   6.570 1.10e-10 ***
sexfemale                  4.5691     1.1203   4.078 5.15e-05 ***
as.factor(educa)2         13.8266     2.6362   5.245 2.17e-07 ***
as.factor(educa)3         21.7913     2.4424   8.922  < 2e-16 ***
as.factor(educa)4         19.0179     2.5219   7.541 1.74e-13 ***
as.factor(educa)5         23.7470     2.7406   8.665  < 2e-16 ***
lg_hag:as.factor(educa)2 -21.2224     6.5904  -3.220  0.00135 ** 
lg_hag:as.factor(educa)3 -19.8083     6.1255  -3.234  0.00129 ** 
lg_hag:as.factor(educa)4  -8.5502     6.6018  -1.295  0.19577    
lg_hag:as.factor(educa)5 -17.2230     6.3711  -2.703  0.00706 ***

Let's say the equation of the model is:

E[cog] = a + b1(lg_hag) + b2(educa2*lg_hag) + b3(educa3*lg_hag) + b4(educa4*lg_hag) + b5(pdg, centered) + other covars, where

b1 = difference in cog with higher lg_hag among lowest education (coded as 1)
b1 + b2 = difference in cog with higher lg_hag among middle education (coded as 2)
b1 + b3 = difference in cog with higher lg_hag among high education (coded as 3)
b1 + b4 = difference in cog with higher lg_hag among very high education (coded as 4)
b5 = difference in cog with each unit increase in pdg

My question is: if my interpretation is right, how do I construct confidence intervals for each interaction effect estimate (e.g., b1 + b2) from the confidence intervals of b1 and b2?
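
A minimal sketch of one way to compute such an interval in R, using the fitted object model_glm3 above. Note that the interval for b1 + b2 cannot be assembled from the two separate confidence intervals alone, because the covariance between the two estimates is also needed:

# Wald-type 95% CI for b1 + b2, the slope of lg_hag within educa = 2
b <- coef(model_glm3)
V <- vcov(model_glm3)

est <- b["lg_hag"] + b["lg_hag:as.factor(educa)2"]
se  <- sqrt(V["lg_hag", "lg_hag"] +
            V["lg_hag:as.factor(educa)2", "lg_hag:as.factor(educa)2"] +
            2 * V["lg_hag", "lg_hag:as.factor(educa)2"])

est + c(-1, 1) * qt(0.975, df.residual(model_glm3)) * se

Packages such as multcomp (glht() with a suitable contrast matrix) automate this kind of linear-combination inference, much like the SAS ESTIMATE statement mentioned in the comments below.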

  • Not very familiar with how to do that in R. I suppose in SAS you can get the result with the ESTIMATE statement; refer to http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glm_sect013.htm – boomean Oct 25 '12 at 04:46

1 Answer


Your interpretation of the model’s coefficients is not completely accurate. Let me first summarize the terms of the model.

Categorical variables (factors): $race$, $sex$, and $educa$

The factor race has four levels: $race = \{white, black, mexican, multi/other\}$.

The factor sex has two levels: $sex = \{male, female\}$.

The factor educa has five levels: $educa = \{1, 2, 3, 4, 5\}$.

By default, R uses treatment contrasts for categorical variables. With these contrasts, the first level of the factor is used as the reference level, and the remaining levels are tested against it. The maximum number of contrasts for a categorical variable equals the number of levels minus one.
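
A quick way to inspect this in R (a sketch, assuming the variables live in base_708 as above and that race is already stored as a factor, which the output suggests since white is its reference level):

contrasts(as.factor(base_708$educa))   # 5 levels give 4 treatment-contrast columns
levels(base_708$race)                  # the first level listed is the reference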

The contrasts for race allow testing the following differences: $race = black\ vs. race = white$, $race = mexican\ vs. race = white$, and $race = multi/other\ vs. race = white$.

For the factor $educa$, the reference level is $1$, and the pattern of contrasts is analogous. These effects can be interpreted as differences in the dependent variable between a given level and the reference level. In your example, the mean value of cog is $13.8266$ units higher for $educa = 2$ than for $educa = 1$ (the coefficient labelled as.factor(educa)2).
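
Because of the interaction with lg_hag, this comparison refers to lg_hag = 0. A small check with predict() (a sketch with hypothetical covariate values taken from the reference categories) reproduces the coefficient:

# difference in predicted cog for educa = 2 vs. educa = 1; pdg, race, and sex cancel out
nd <- data.frame(lg_hag = 0, race = "white", pdg = 0, sex = "male", educa = c(1, 2))
diff(predict(model_glm3, newdata = nd))   # approx. 13.8266, the as.factor(educa)2 estimate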

One important note: if treatment contrasts for a categorical variable are present in a model, the estimation of further effects is based on the reference level of that categorical variable whenever interactions between those further effects and the categorical variable are included as well. If a variable is not part of an interaction, its coefficient corresponds to the average of the individual slopes of this variable across the levels of the remaining categorical variables. The effects of $race$ and $educa$ correspond to average effects with respect to the factor levels of the other variables. To test overall effects of $race$, you would need to leave $educa$ and $sex$ out of the model.
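
To illustrate this (a sketch; educa_ref3 and model_ref3 are hypothetical names), refitting with a different reference level for educa changes what the lg_hag coefficient refers to, but leaves the pdg coefficient untouched because pdg is not involved in any interaction:

base_708$educa_ref3 <- relevel(as.factor(base_708$educa), ref = "3")
model_ref3 <- glm(cog ~ lg_hag + race + pdg + sex + educa_ref3 + lg_hag:educa_ref3,
                  data = base_708)

coef(model_ref3)["lg_hag"]   # now the slope of lg_hag within educa = 3
coef(model_ref3)["pdg"]      # identical to coef(model_glm3)["pdg"]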

Numeric variables: $lg\_hag$ and $pdg$

Both lg_hag and pdg are numeric variables; hence, their coefficients represent the change in the dependent variable associated with a one-unit increase in the predictor.

In principle, the interpretation of these effects is straightforward. But note that if interactions are present, the estimation of the coefficients is based on the reference categories of the factors (if treatment contrasts are employed). Since $pdg$ is not part of an interaction, its coefficient is estimated for the whole sample and does not depend on the choice of reference levels. The variable $lg\_hag$, in contrast, is part of an interaction with $educa$. Therefore, its coefficient holds for $educa = 1$, the reference level; it is not a test of an overall influence of $lg\_hag$ irrespective of the levels of the factors.

Interactions between categorical and numeric variables: $lg\_hag \times educa$

The model includes not only main effects but also interactions between the numeric variable $lg\_hag$ and the four contrasts associated with $educa$. These effects can be interpreted as the difference in the slopes of $lg\_hag$ between a certain level of $educa$ and the reference level ($educa = 1$).

For example, the coefficient of lg_hag:as.factor(educa)2 (-21.2224) means that the slope of $lg\_hag$ is $21.2224$ units lower for $educa = 2$ than for $educa = 1$.
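
Putting these pieces together, the slope of lg_hag within each education level is the reference slope plus the corresponding interaction coefficient (a minimal sketch using the fitted object above):

b <- coef(model_glm3)
slopes <- c(educa1 = unname(b["lg_hag"]),
            educa2 = unname(b["lg_hag"] + b["lg_hag:as.factor(educa)2"]),
            educa3 = unname(b["lg_hag"] + b["lg_hag:as.factor(educa)3"]),
            educa4 = unname(b["lg_hag"] + b["lg_hag:as.factor(educa)4"]),
            educa5 = unname(b["lg_hag"] + b["lg_hag:as.factor(educa)5"]))
slopes   # e.g. educa2: 8.5691 - 21.2224 = -12.6533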

Sven Hohenstein
  • *"These interaction coefficients also hold for `race=white` and `sex=male` only."* Are you sure of this? I ask because neither `race` nor `sex` is in interaction with the `lg_hag×educa` term... I am looking at several texts I don't see this explicitly indicated. – landroni Apr 16 '15 at 10:22
  • @landroni The slopes are estimated for the point where all remaining predictors are equal to 0. – Sven Hohenstein Apr 16 '15 at 15:52
  • Yeah, that's my understanding of it too. All other predictors are held constant, meaning that factors are fixed to their baseline level. But therein lies my conundrum: I've looked at several books that seem to mostly gloss over this subtle but far-reaching nuance. Moreover, papers often "control by industry" yet draw conclusions as if the coefficients were unconditional over the full sample, instead of singling out that this is only for the baseline level.. See also: http://stats.stackexchange.com/questions/146665/how-does-the-presence-of-factors-affect-the-interpretation-of-the-other-coeffici – landroni Apr 16 '15 at 19:08
  • *"If treatment contrasts for a categorical variable are present in a model, the estimation of further effects is based on the reference level of the categorical variable."* After further consideration, I'm not convinced (or I don't follow your argument entirely). You seem to imply that the estimate of beta for e.g. `pdg` depends on the reference level, which is clearly not the case. If I change the reference level of any of the factors (e.g. `sex`), the estimate for `pdg` will NOT change... – landroni Apr 17 '15 at 19:57
  • @landroni Thanks for pointing out. You are right, this statement is misleading. Actually, it only holds for predictors that are also part of interaction terms with categorical variables. Hence, the estimate of `pdg` does indeed *not* depend on the specification of the contrasts. I will modify the answer accordingly. – Sven Hohenstein Apr 18 '15 at 02:57
  • @landroni However, the effect of a predictor might change if you do not include the categorical variables. It appears the estimate of the coefficient for the continuous variable is the mean of the slopes corresponding to the different levels of the categorical variable. – Sven Hohenstein Apr 18 '15 at 03:04
  • Indeed. If we do remove a given categorical variable from the regression, then the effect of the predictor will change as the beta estimate will no longer be controlled for that factor. Thanks for clarifying all this. BTW, I suspect that *"To test overall effects of race, you would need to leave educa and sex out of the model."* should also be edited away... – landroni Apr 18 '15 at 08:37
  • "Since pdg is not part of an interaction, its coefficient corresponds to the average slope of the variable with respect to the levels of the categorical variables." This is only accurate when there are the same number of cases in each group (while this is common for experimentally-controlled categorical variables, it's unlikely for natural categorical variables like race and education). The more general rule is that it is the effect of $pdg$ estimated for the whole dataset, without respect to the categorical variable(s). – Rose Hartman Mar 26 '17 at 17:15
  • @RoseHartman Agreed! Thanks for pointing out. I removed the corresponding part of the sentence. – Sven Hohenstein Mar 27 '17 at 07:24