How to use dummy variables for categorical variables in a multiple regression

Question

If I have a categorical variable with three levels (CatXVar) that I recode into two dummy variables (NYXVar and BostonXVar) such that:

YVar ContXVar  CatXVar NYXVar BostonXVar
0.23 10        NY      1      0
0.1  22.3      Boston  0      1
0.52 11.9      London  0      0

and I want to see whether CatXVar affects the significance of any relationship between YVar and ContXVar, should I run two separate regressions of:

YVar ~ ContXVar + NYXVar

and

YVar ~ ContXVar + BostonXVar

or should I run the regression as:

YVar ~ ContXVar + NYXVar + BostonXVar

If you exclude the intercept from your model you can run it using CatXVar as the predictor. — Mike Hunter, Nov 01 '15 at 19:32
I've read that I should convert k-level categorical variables into k-1 binary dummy variables though? — Kaleb, Nov 02 '15 at 08:06
You can remove the intercept by including `-1` in the model specification. You might want to consider reading [this](http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-lm) before doing that. `YVar ~ ContXVar + NYXVar + BostonXVar` seems to be the safe bet here. — horseoftheyear, Nov 04 '15 at 14:35
Yeah why not. Added it as answer. Might trigger some useful feedback/insights. — horseoftheyear, Nov 04 '15 at 17:56

score 1 · Accepted Answer · edited Apr 13 '17 at 12:44

You can remove the intercept, as suggested in the comments, by including `-1` in the model specification. However, you might want to read [this](http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-lm) before doing that.
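To see why the intercept and the number of dummies are tied together, here is a minimal sketch in plain Python (the sample of cities is made up for illustration). If you keep the intercept and also include a dummy for every one of the three levels, the dummy columns add up to the intercept column, so the design matrix is perfectly collinear. That is why you either use k-1 dummies with an intercept, or all k dummies without one.

```python
# Hypothetical sample of cities; any data containing all three levels behaves the same.
cities = ["NY", "Boston", "London", "NY", "London"]
levels = ["NY", "Boston", "London"]

intercept = [1] * len(cities)
# One 0/1 indicator column per level (k = 3 columns, not k - 1).
dummies = {level: [int(city == level) for city in cities] for level in levels}

# Every row belongs to exactly one level, so the indicator columns sum to 1.
row_sums = [sum(dummies[level][i] for level in levels) for i in range(len(cities))]
collinear_with_intercept = row_sums == intercept
print(collinear_with_intercept)  # True
```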

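`YVar ~ ContXVar + NYXVar + BostonXVar` seems to be the safe bet here. With London as the reference level, each dummy coefficient estimates that city's difference from London at a given value of ContXVar.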
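As a rough illustration of what the combined model estimates, the sketch below fits it by ordinary least squares in plain Python. Everything in it is an assumption made for the example: the six observations are invented, the response is generated without noise from known coefficients, and `ols` is a small hypothetical helper rather than a library function. In practice you would use your statistics package's regression routine instead.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data: two observations per city.
cities = ["NY", "NY", "Boston", "Boston", "London", "London"]
cont_x = [10.0, 14.0, 22.3, 18.0, 11.9, 16.0]
# Noise-free response with known (made-up) effects; London is the reference level.
y = [0.5 + 0.02 * x + 0.3 * (c == "NY") - 0.2 * (c == "Boston")
     for c, x in zip(cities, cont_x)]

# Design matrix columns: intercept, ContXVar, NYXVar, BostonXVar.
X = [[1.0, x, float(c == "NY"), float(c == "Boston")]
     for c, x in zip(cities, cont_x)]

intercept, b_cont, b_ny, b_boston = ols(X, y)
print([round(b, 3) for b in (intercept, b_cont, b_ny, b_boston)])
```

Because the data are noise-free, the fit recovers the generating values up to floating-point error: the ContXVar slope is estimated with both city effects held fixed, and the two dummy coefficients are the NY and Boston offsets relative to London.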

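Running two separate regressions means estimating two different models, neither of which accounts for the other city's effect: the model with only NYXVar lumps Boston in with London as the reference group, and the model with only BostonXVar does the same with NY.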
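A small sketch of that point, in plain Python: the reference group of a model is whichever cities have all of their dummy columns equal to zero. The helper name `reference_group` is hypothetical and only serves the illustration.

```python
cities = ["NY", "Boston", "London"]

def reference_group(dummy_levels):
    """Cities whose dummy columns are all zero, i.e. absorbed into the intercept."""
    return [city for city in cities if city not in dummy_levels]

print(reference_group(["NY", "Boston"]))  # ['London']
print(reference_group(["NY"]))            # ['Boston', 'London']
print(reference_group(["Boston"]))        # ['NY', 'London']
```

Only the model with both dummies compares each city against London alone; each single-dummy model compares one city against a pooled baseline of the other two.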
