I performed a negative binomial regression and here is my output (variable names changed from my original output):
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)         4.041e+00  8.978e-02  45.006  < 2e-16 ***
langB              -1.143e-01  1.181e-01  -0.968 0.333137
langC              -6.581e-02  1.080e-01  -0.609 0.542311
langD               5.237e-01  9.540e-02   5.489 4.03e-08 ***
langE              -1.603e-01  1.076e-01  -1.490 0.136289
langF               9.649e-02  1.042e-01   0.926 0.354362
langG               1.775e-01  1.043e-01   1.702 0.088696 .
num_users.m         5.675e-02  7.949e-03   7.139 9.39e-13 ***
num_attributes.m    3.030e-04  9.860e-05   3.073 0.002116 **
num_lines.m         7.902e-05  4.538e-05   1.741 0.081679 .
num_distractions.m  1.892e-02  3.182e-02   0.595 0.552041
type_freq.m         1.613e-06  4.183e-07   3.855 0.000116 ***
prop_attended.m     1.222e+00  4.645e-02  26.299  < 2e-16 ***
I am going through the example at http://www.ats.ucla.edu/stat/r/dae/nbreg.htm to check whether I'm interpreting my output correctly. As I understand it, we generally interpret the effect of one predictor while holding the other variables constant. In my case, taking num_users as an example, I would say that for a one-unit increase in the number of users, the expected log count of my response variable increases by 0.06, holding all else constant or at their means. However, I'm wondering what "constant" means for a categorical variable like lang. Would it be langA here, since it is chosen as the reference level? But when I did
records$lang <- relevel(records$lang, "C")
the coefficients of my other predictors did not change. So does this mean that, for the two instances I compare, I should hold the language constant, but it doesn't really matter which language that is?
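To make the comparison concrete, here is how I would write it out. This is only a sketch under my own assumptions: the usual log link, treatment (dummy) coding with langA as the reference, and notation ($\mu$, $\beta$, $x$, $k$) that is mine rather than taken from the output. Consider two observations that share the same language $k$ and the same values of every other predictor (the dots below), and differ by one unit in num_users:

$$
\begin{aligned}
\log \mu_1 &= \beta_0 + \beta_{\text{lang}=k} + \beta_{\text{users}}\, x + \dots \\
\log \mu_2 &= \beta_0 + \beta_{\text{lang}=k} + \beta_{\text{users}}\,(x + 1) + \dots \\
\log \mu_2 - \log \mu_1 &= \beta_{\text{users}} \approx 0.057, \qquad \frac{\mu_2}{\mu_1} = e^{0.05675} \approx 1.058
\end{aligned}
$$

If this is right, the language term cancels for any $k$ (with $\beta_{\text{lang}=A} = 0$ for the reference level), which would be consistent with the other coefficients staying the same after releveling.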
I've read the wonderful explanation given by @gung in an answer to the question What does "all else equal" mean in multiple regression?, but I found no mention of categorical variables there. Could somebody please clarify this?