In my formula I have several terms with the categorical variable color and color's interaction with other variables. Color can be either white or red. When looking at the summary of my model, color is replaced by colorwhite with no mention of colorred. Can someone explain the mechanics behind R only showing one level of my categorical variable in the summary?

example:

    > summary(lm(foo ~ bar + color + color:bar))
    coefficients
    bar
    colorwhite
    colorwhite:bar

Info5ek
  • 6
    I might be missing something, but colorred is the reference group. This is a fundamental concept in categorical variables and should be covered by Wikipedia (dummy variables) or http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm – charles Feb 25 '14 at 03:17
  • 3
    This question is answered (although not directly asked, I believe) in many threads which can be found by searching our site for [dummy coding](http://stats.stackexchange.com/search?q=dummy+coding+[r]+[categorical]). – whuber Feb 25 '14 at 17:27

1 Answer

Update: I just saw Charles's comment above. I think the documents he pointed to are very helpful. Thanks to Patrick Coulombe for the discussion.

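If your color variable has only two possible values, 'white' and 'red', you will only have coefficients for colorwhite and colorwhite:bar.

The point is that the model has an intercept, and the base (reference) case is color = red. The red group is absorbed into the intercept and the bar coefficient, so it gets no coefficient of its own.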
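
To see the reference level at work, here is a minimal sketch on made-up data. The variable names follow the question; the simulated values and the `color2` column are assumptions for illustration only. It prints the factor levels, the columns of the design matrix that `lm` would use, and the coefficient names after switching the reference level with `relevel`.

```r
# Hypothetical toy data; foo, bar and color are named as in the question.
set.seed(1)
d <- data.frame(
  bar   = rep(0:4, times = 2),
  color = factor(rep(c("red", "white"), each = 5))
)
d$foo <- 1 + 2 * d$bar + 3 * (d$color == "white") + rnorm(10, sd = 0.1)

# Factor levels are sorted alphabetically by default, so "red" comes first
# and becomes the reference level under the default treatment contrasts.
levels(d$color)

# The design matrix has a single 0/1 column for "white"; "red" has no
# column of its own because it is represented by the intercept.
X <- model.matrix(~ bar + color + color:bar, data = d)
colnames(X)

# Making "white" the reference level swaps which level shows up.
d$color2 <- relevel(d$color, ref = "white")
fit2 <- lm(foo ~ bar + color2 + color2:bar, data = d)
names(coef(fit2))
```

With `color2`, the summary would list a red coefficient instead of a white one; the fitted values are the same, only the parameterization changes.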

The coefficient for colorwhite reflects the effect size of white compared with red when bar is 0.

The coefficient for colorwhite:bar reflects the effect size of the interaction: how much the effect of one unit of bar changes for white compared with red, that is, the difference between the two colors in the slope of bar.
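
A quick way to check these two interpretations is to compare predictions for white and red at the same value of bar. The sketch below uses simulated data, and the helper `diff_at` is a hypothetical name introduced here, not something from the question. Under the model in the question, the white-minus-red difference should equal the colorwhite coefficient at bar = 0, and the colorwhite coefficient plus bar times the interaction coefficient otherwise.

```r
# Hypothetical toy data with a true interaction between color and bar.
set.seed(2)
d <- data.frame(
  bar   = rep(0:9, times = 2),
  color = factor(rep(c("red", "white"), each = 10))
)
is_white <- as.numeric(d$color == "white")
d$foo <- 1 + 2 * d$bar + 3 * is_white + 0.5 * is_white * d$bar +
  rnorm(20, sd = 0.1)

fit <- lm(foo ~ bar + color + color:bar, data = d)
b <- coef(fit)
b_white <- b[["colorwhite"]]
# Pick the interaction term by the ":" in its name rather than by exact label.
b_inter <- b[[grep(":", names(b))]]

# Predicted white-minus-red difference at a chosen value of bar
# (diff_at is an illustrative helper, not a built-in function).
diff_at <- function(bar_value) {
  newd <- data.frame(
    bar   = bar_value,
    color = factor(c("red", "white"), levels = levels(d$color))
  )
  p <- predict(fit, newdata = newd)
  unname(p[2] - p[1])
}

all.equal(diff_at(0), b_white)                # difference at bar = 0
all.equal(diff_at(3), b_white + 3 * b_inter)  # difference at bar = 3
```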

Please check this [Wikipedia page](http://en.wikipedia.org/wiki/Effect_size):

"Always present effect sizes for primary outcomes...If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d).

— L. Wilkinson and APA Task Force on Statistical Inference (1999, p. 599)

...

The term effect size can refer to a standardized measures of effect (such as r, Cohen's d, and odds ratio), or to an unstandardized measure (e.g., the raw difference between group means and unstandardized regression coefficients). "

And also this [page](http://dss.princeton.edu/online_help/analysis/interpreting_regression.htm): "Remember that regression analysis is used to produce an equation that will predict a dependent variable using one or more independent variables. This equation has the form

Y = b1X1 + b2X2 + ... + A where Y is the dependent variable you are trying to predict, X1, X2 and so on are the independent variables you are using to predict it, b1, b2 and so on are the coefficients or multipliers that describe the size of the effect the independent variables are having on your dependent variable Y, and A is the value Y is predicted to have when all the independent variables are equal to zero.

...

In simple or multiple linear regression, the size of the coefficient for each independent variable gives you the size of the effect that variable is having on your dependent variable, and the sign on the coefficient (positive or negative) gives you the direction of the effect. In regression with a single independent variable, the coefficient tells you how much the dependent variable is expected to increase (if the coefficient is positive) or decrease (if the coefficient is negative) when that independent variable increases by one. In regression with multiple independent variables, the coefficient tells you how much the dependent variable is expected to increase when that independent variable increases by one, holding all the other independent variables constant. Remember to keep in mind the units which your variables are measured in. "

Peter H
  • The coefficients are not effect sizes – Patrick Coulombe Feb 25 '14 at 03:39
  • Also, the coefficient for `colorwhite` is the predicted difference in the score on the DV between white and red *when bar = 0*. – Patrick Coulombe Feb 25 '14 at 03:48
  • Please check this [wikipedia page](http://en.wikipedia.org/wiki/Effect_size): "The term effect size can refer to a standardized measures of effect (such as r, Cohen's d, and odds ratio), or to an unstandardized measure (e.g., the raw difference between group means and unstandardized regression coefficients). " I think you are emphasizing the effect size as standardized measures of effect, namely, the "effect size" of regression. What I said is reasonable as an unstandardized measure. – Peter H Feb 25 '14 at 03:56
  • And please also check this [page](http://dss.princeton.edu/online_help/analysis/interpreting_regression.htm): Y = b1X1 + b2X2 + ... + A where Y is the dependent variable you are trying to predict, X1, X2 and so on are the independent variables you are using to predict it, b1, b2 and so on are the coefficients or multipliers that describe *the size of the effect* the independent variables are having on your dependent variable Y, and A is the value Y is predicted to have when all the independent variables are equal to zero. – Peter H Feb 25 '14 at 03:59
  • Yes, that seems right. When I hear effect size I certainly don't think of this "unstandardized" kind. Note that my comment on the interpretation of the `colorwhite` coefficient in the presence of the `colorwhite:bar` term still stands. – Patrick Coulombe Feb 25 '14 at 04:00
  • I do not mean what you said is wrong, or does not stand. I mean what I said stands, and we may not say something is not true simply because it is not the thing we have seen/used. – Peter H Feb 25 '14 at 04:02
  • Apologies, perhaps I should have been clearer. You should edit your answer to reflect the fact that the interpretation of the `colorwhite` coefficient applies only when bar=0 (when bar is different from 0, the predicted difference between red and white is NOT the value of the `colorwhite` coefficient) – Patrick Coulombe Feb 25 '14 at 04:07
  • You are right. However, not only bar=0. Actually for any same bar value in those two scenarios. – Peter H Feb 25 '14 at 04:10
  • 1
    This is not right @PeterH. The predicted difference between red and white is $\hat{\beta}_{colorwhite} + \hat{\beta}_{colorwhite*bar} \cdot bar$. Therefore, the predicted difference between red and white is equal to $\hat{\beta}_{colorwhite}$ only when $bar=0$. – Patrick Coulombe Feb 25 '14 at 04:16
  • 1
    @PatrickCoulombe, yes, I think you are right. Please see the update above. – Peter H Feb 25 '14 at 04:43