Understanding coefficients in summary output of logistic regression in R

Question

This question is about understanding the logistic regression output using R. Here is my sample data frame:

              Drugpairs AdverseEvent    Y    N
1   Rebetol + Pegintron       Nausea   29 1006
2   Rebetol + Pegintron      Anaemia   21 1014
3   Rebetol + Pegintron     Vomiting   14 1021
4   Ribavirin + Pegasys       Nausea    5  238
5   Ribavirin + Pegasys      Anaemia   12  231
6   Ribavirin + Pegasys     Vomiting    1  242
7 Ribavirin + Pegintron       Nausea   15  479
8 Ribavirin + Pegintron      Anaemia    7  487
9 Ribavirin + Pegintron     Vomiting    9  485

This describes the number of times a particular drug pair has caused a given adverse event (Y = yes) or not (N = no). I ran a logistic regression on this dataset in R using the following commands:

mod.form    <- cbind(Y, N) ~ Drugpairs * AdverseEvent
glmhepa.out <- glm(mod.form, family = binomial(link = "logit"), data = hepatitis.df)
summary(glmhepa.out)
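For anyone who wants to reproduce the output, here is a minimal sketch that rebuilds `hepatitis.df` from the table printed above and fits the same model. The data frame construction is an assumption (the question does not show how the data were loaded); only the column names and counts come from the question.

```r
# Rebuild the example data from the printed table (counts copied from the question)
hepatitis.df <- data.frame(
  Drugpairs = rep(c("Rebetol + Pegintron", "Ribavirin + Pegasys",
                    "Ribavirin + Pegintron"), each = 3),
  AdverseEvent = rep(c("Nausea", "Anaemia", "Vomiting"), times = 3),
  Y = c(29, 21, 14, 5, 12, 1, 15, 7, 9),
  N = c(1006, 1014, 1021, 238, 231, 242, 479, 487, 485),
  stringsAsFactors = TRUE  # character columns become factors with sorted levels
)

# cbind(successes, failures) is the two-column response for a binomial GLM
glmhepa.out <- glm(cbind(Y, N) ~ Drugpairs * AdverseEvent,
                   family = binomial(link = "logit"), data = hepatitis.df)

# Estimate, Std. Error and z value columns of the coefficient table
round(coef(summary(glmhepa.out))[, 1:3], 4)
```

The 9 rows printed by the last line should match the coefficient table shown next.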

The summary output was as follows (only showing the coefficients table):

                                                      Estimate Std. Error z value
(Intercept)                                          -3.8771     0.2205 -17.586
DrugpairsRibavirin + Pegasys                          0.9196     0.3691   2.491
DrugpairsRibavirin + Pegintron                       -0.3652     0.4399  -0.830
AdverseEventNausea                                    0.3307     0.2900   1.140
AdverseEventVomiting                                 -0.4123     0.3479  -1.185
DrugpairsRibavirin + Pegasys:AdverseEventNausea      -1.2360     0.6131  -2.016
DrugpairsRibavirin + Pegintron:AdverseEventNausea     0.4480     0.5457   0.821
DrugpairsRibavirin + Pegasys:AdverseEventVomiting    -2.1191     1.1013  -1.924
DrugpairsRibavirin + Pegintron:AdverseEventVomiting   0.6678     0.6157   1.085
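With the logit link, these estimates are log odds (the intercept) and log odds ratios (the other terms), not odds or probabilities. A minimal sketch of converting them, using a few values copied by hand from the table above; `exp()` gives odds and odds ratios, and `plogis()` (the inverse logit) gives a probability. Because the model has an interaction, each main-effect odds ratio applies only when the other factor is at its reference level.

```r
# Estimates copied from the coefficient table (log-odds scale)
est <- c(
  "(Intercept)"                    = -3.8771,
  "DrugpairsRibavirin + Pegasys"   =  0.9196,
  "DrugpairsRibavirin + Pegintron" = -0.3652,
  "AdverseEventNausea"             =  0.3307,
  "AdverseEventVomiting"           = -0.4123
)

# exp(): the intercept becomes the odds of the event in the reference cell
# (Rebetol + Pegintron, Anaemia); each other term becomes an odds ratio
# relative to that cell
odds <- exp(est)
round(odds, 4)

# plogis() turns the intercept into a probability for the reference cell,
# close to 21 / (21 + 1014)
plogis(est[["(Intercept)"]])
```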

I understand that the coefficients are on the log-odds scale. I am curious, however, why there is no coefficient for AdverseEventAnaemia, and also why there is no coefficient for any combination of a drug pair with Anaemia. (The last 4 rows are the interaction terms between drug pairs and adverse events.)
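A quick way to see which level has "gone missing" is to inspect the factor itself. A minimal sketch, assuming `AdverseEvent` is an ordinary (unordered) factor and R's default `contr.treatment` contrasts are in effect:

```r
# The AdverseEvent levels from the question; factor() sorts them alphabetically
ae <- factor(c("Nausea", "Anaemia", "Vomiting"))
levels(ae)  # "Anaemia" "Nausea" "Vomiting": Anaemia comes first

# Treatment coding: one dummy column per non-reference level, so the
# first level (Anaemia) gets no column and therefore no coefficient
contrasts(ae)

# Making another level the reference brings an Anaemia column back
ae_nausea <- relevel(ae, ref = "Nausea")
contrasts(ae_nausea)
```

By the same logic, refitting with `relevel()` applied to the `AdverseEvent` column (assuming it is a factor) would produce `AdverseEventAnaemia` terms and drop the Nausea ones instead; the fitted model itself would be unchanged.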

score 2 · Answer 1 · edited Apr 13 '17 at 12:44

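This is a frequently asked question. You have an interaction of two factors. A full interaction means that every combination of the levels of the two factors is represented, so there are 3 × 3 = 9 combinations. If you count the rows in the summary output, you will see that there are 9 rows.

By default, R represents factors with dummy codes formed using reference-level (treatment) coding, and the first level of each factor is taken as its reference level. When R creates a factor from character data it sorts the levels alphabetically, so your reference levels are Rebetol + Pegintron and Anaemia. The reference combination (Rebetol + Pegintron with Anaemia) has no row of its own: it is represented by the (Intercept), which is the log odds of the adverse event in that cell. With the drug pair held at Rebetol + Pegintron, the two AdverseEvent rows give the difference in log odds for Nausea and for Vomiting relative to Anaemia; with the adverse event held at Anaemia, the two Drugpairs rows give the difference in log odds for each of the other two drug pairs relative to Rebetol + Pegintron. The remaining 4 combinations each get an interaction term in the last 4 lines of the output, which measures how far that cell's log odds departs from the intercept plus the two corresponding main effects. So Anaemia is not missing; it is absorbed into the intercept and the two Drugpairs main effects. To understand these ideas more fully, it may help you to read these: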

Interactions: Interaction in generalized linear model
Factors: Regression based for example on days of week
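Because this model has 9 coefficients for 9 cells, it is saturated: its fitted log odds reproduce the observed log(Y/N) in every cell exactly. That makes the reference-level and interaction reading above easy to check by hand. A minimal sketch using only the counts from the question (`log_odds` is a hypothetical helper defined here, not part of any package):

```r
# Observed log odds of the adverse event in one cell: log(Y / N)
log_odds <- function(y, n) log(y / n)

ref      <- log_odds(21, 1014)  # Rebetol + Pegintron, Anaemia (reference cell)
pegasys  <- log_odds(12, 231)   # Ribavirin + Pegasys, Anaemia
nausea   <- log_odds(29, 1006)  # Rebetol + Pegintron, Nausea
peg_naus <- log_odds(5, 238)    # Ribavirin + Pegasys, Nausea

b0     <- ref                   # (Intercept)
b_drug <- pegasys - ref         # DrugpairsRibavirin + Pegasys
b_ae   <- nausea - ref          # AdverseEventNausea

# Interaction: the cell's log odds minus the additive prediction
b_int  <- peg_naus - (b0 + b_drug + b_ae)  # DrugpairsRibavirin + Pegasys:AdverseEventNausea

round(c(b0, b_drug, b_ae, b_int), 4)
# should match -3.8771, 0.9196, 0.3307 and -1.2360 in the coefficient table
```

Equivalently, the interaction is a difference of differences: (peg_naus - nausea) - (pegasys - ref), i.e. how much the Pegasys effect changes when moving from Anaemia to Nausea.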

