
I understand that the usual procedure for coding a categorical variable is to convert its n categories into n-1 dummy variables. For example, the categorical variable colour with levels red/green/blue could be coded as

         V1  V2 
red   -> 1   0
blue  -> 0   1
green -> 0   0

which in a regression setting means that green acts as the reference level: its effect on the response is absorbed into the intercept, and the coefficients of V1 and V2 measure differences from green.
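To make the role of the intercept concrete, here is a minimal sketch with made-up numbers (the responses in `y` are hypothetical, not from any real data set). It relies on the fact that, with a single categorical predictor, least squares reproduces each group's mean, so under the n-1 coding the intercept is the green mean and the V1 and V2 coefficients are differences from it.

```python
from statistics import mean

# Hypothetical responses grouped by colour (illustrative numbers only).
y = {"red": [5.0, 7.0], "blue": [2.0, 4.0], "green": [1.0, 3.0]}

# With one categorical predictor, the least-squares fit for each group
# is that group's mean. With green as the reference level:
intercept = mean(y["green"])        # mean response for green
b_v1 = mean(y["red"]) - intercept   # red minus green
b_v2 = mean(y["blue"]) - intercept  # blue minus green

def predict(v1, v2):
    """Fitted value for an observation coded (V1, V2)."""
    return intercept + b_v1 * v1 + b_v2 * v2

# red -> (1, 0), blue -> (0, 1), green -> (0, 0)
print(predict(1, 0), predict(0, 1), predict(0, 0))
```

Each prediction equals the corresponding group mean, which is why green needs no variable of its own here.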

I know that if we created an additional binary variable V3 so that green gets its own indicator,

         V1  V2  V3 
red   -> 1   0   0
blue  -> 0   1   0
green -> 0   0   1

then we should fit a regression model with no intercept.

What happens if I take the latter coding (i.e. 3 variables V1, V2, V3 for 3 levels of colour) and fit a regression model with an intercept? I can't figure out why we shouldn't do this.

ttnphns
Alex
  • Because the three dummies add to the column of 1's for the intercept, making those four effects perfectly multicollinear. It's like trying to balance a sheet of plywood on a picket fence - there's not enough "information" in the line of points to keep it steady - the part along the fence is well-determined, but either side it flips up and down. To avoid this indeterminacy, you either need to eliminate a dummy or the intercept term. [This will be a duplicate. Hold on and I'll have a look.] – Glen_b Oct 28 '15 at 03:52
  • thanks, I found lots of posts about how to code dummy variables, but none explaining what happens if you add in an extra one. – Alex Oct 28 '15 at 03:55
  • Does [this one](http://stats.stackexchange.com/questions/30525/how-to-handle-multicollinearity-in-a-linear-regression-with-all-dummy-variables) (the reference to R doesn't alter the explanation) get at what you want? Also see some discussion of multicollinearity [here](http://stats.stackexchange.com/questions/70699/qualitative-variable-coding-in-regression-leads-to-singularities/70700#70700). If you need something different from those, please clarify – Glen_b Oct 28 '15 at 04:02
  • Thanks, I think http://stats.stackexchange.com/questions/70699/qualitative-variable-coding-in-regression-leads-to-singularities/70700#70700 answers my question, I will just have to work out what it is saying. – Alex Oct 28 '15 at 04:09
  • I'll close this but if there's an outstanding issue that's not resolved at that post, modify your question here (with a link to that one if it helps) and flag to ask for it to be re-opened. – Glen_b Oct 28 '15 at 04:22
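The multicollinearity described in the first comment can be checked directly. The sketch below is only an illustration under stated assumptions: the six observations and the coefficient values are hypothetical, and `design_row` and `fitted` are helper names introduced here. It builds the design matrix with an intercept plus all three dummies, confirms that V1 + V2 + V3 equals the intercept column, and shows that two different coefficient vectors give identical fitted values, so the data cannot choose between them.

```python
# Hypothetical toy data: six observations, two per colour.
colours = ["red", "red", "blue", "blue", "green", "green"]
levels = ["red", "blue", "green"]

def design_row(colour):
    """Return [intercept, V1, V2, V3] for one observation."""
    return [1] + [1 if colour == level else 0 for level in levels]

X = [design_row(c) for c in colours]

# In every row the three dummies sum to the intercept column.
dummies_sum_to_intercept = all(row[1] + row[2] + row[3] == row[0] for row in X)

def fitted(beta):
    """Fitted values X @ beta for coefficients [b0, b1, b2, b3]."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

# Shift the intercept up by c and every dummy coefficient down by c.
beta_a = [0.0, 5.0, 3.0, 2.0]
c = 10.0
beta_b = [beta_a[0] + c] + [b - c for b in beta_a[1:]]

same_fit = fitted(beta_a) == fitted(beta_b)
print(dummies_sum_to_intercept, same_fit)
```

Any value of `c` works equally well, so there are infinitely many coefficient vectors with the same fit. Dropping one dummy or dropping the intercept removes that freedom.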
