What happens in a regression setting if you code n dummy variables for a categorical variable with n categories?

Question

I understand the usual procedure to code categorical variables is to convert n categories into n-1 coded variables. For example, the categorical variable colour with levels red/green/blue could be coded as

         V1  V2 
red   -> 1   0
blue  -> 0   1
green -> 0   0

which in a regression setting means that green is the reference level: the intercept is the expected response for green, and the coefficients on V1 and V2 are the differences of red and blue from green.
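
To make that concrete, here is a minimal sketch of the n-1 coding fitted by ordinary least squares. The data are made up for illustration, and the helper names `solve` and `ols` are hypothetical; it uses exact fractions so the comparison with the group means is not blurred by rounding.

```python
from fractions import Fraction

# Made-up data for illustration only: two observations per colour.
colours = ["red", "red", "blue", "blue", "green", "green"]
y = [10, 12, 20, 22, 4, 6]

# Reference coding from the table above: V1 = red, V2 = blue, green is the baseline.
X = [[1, int(c == "red"), int(c == "blue")] for c in colours]


def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination.

    Assumes A is non-singular; a singular A raises StopIteration at the pivot search.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def group_mean(level):
    """Mean response over the observations with the given colour."""
    values = [yi for c, yi in zip(colours, y) if c == level]
    return Fraction(sum(values), len(values))


intercept, b_red, b_blue = ols(X, y)
print(intercept == group_mean("green"))                    # intercept is the green mean
print(b_red == group_mean("red") - group_mean("green"))    # V1 is red minus green
print(b_blue == group_mean("blue") - group_mean("green"))  # V2 is blue minus green
```

With a single categorical predictor the model is saturated, which is why the coefficients reproduce the group means exactly.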

I know that if we created an additional binary variable V3 such that green is coded

         V1  V2  V3 
red   -> 1   0   0
blue  -> 0   1   0
green -> 0   0   1

then we should fit a regression model with no intercept.
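
As a rough sketch of what the no-intercept fit gives (again with made-up data), each coefficient in this coding should simply be the mean response of its own level. The shortcut used below is an assumption that holds only because the three dummy columns are mutually orthogonal.

```python
# Made-up data for illustration only: two observations per colour.
colours = ["red", "red", "blue", "blue", "green", "green"]
y = [10, 12, 20, 22, 4, 6]

# One dummy per level (V1 = red, V2 = blue, V3 = green) and no intercept column.
levels = ["red", "blue", "green"]
columns = {lvl: [int(c == lvl) for c in colours] for lvl in levels}

# No observation has two colours, so the dummy columns are mutually orthogonal,
# X'X is diagonal, and each least-squares coefficient reduces to (V'y) / (V'V).
coef = {
    lvl: sum(v * yi for v, yi in zip(col, y)) / sum(v * v for v in col)
    for lvl, col in columns.items()
}

# Plain per-level means of the response, for comparison.
group_means = {
    lvl: sum(yi for c, yi in zip(colours, y) if c == lvl) / colours.count(lvl)
    for lvl in levels
}

print(coef)
print(coef == group_means)
```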

What happens if I take the latter coding (i.e. 3 variables V1, V2, V3 for 3 levels of colour) and fit a regression model with an intercept? I can't figure out why we shouldn't do this.

Because the three dummies add up to the column of 1's for the intercept, making those four columns perfectly multicollinear. It's like trying to balance a sheet of plywood on a picket fence - there's not enough "information" in the line of points to keep it steady - the part along the fence is well-determined, but on either side it flips up and down. To avoid this indeterminacy, you need to eliminate either a dummy or the intercept term. [This will be a duplicate. Hold on and I'll have a look.] — Glen_b, Oct 28 '15 at 03:52
Thanks, I found lots of posts about how to code dummy variables, but none explaining what happens if you add in an extra one. — Alex, Oct 28 '15 at 03:55
Does [this one](http://stats.stackexchange.com/questions/30525/how-to-handle-multicollinearity-in-a-linear-regression-with-all-dummy-variables) (the reference to R doesn't alter the explanation) get at what you want? Also see some discussion of multicollinearity [here](http://stats.stackexchange.com/questions/70699/qualitative-variable-coding-in-regression-leads-to-singularities/70700#70700). If you need something different from those, please clarify — Glen_b, Oct 28 '15 at 04:02
Thanks, I think http://stats.stackexchange.com/questions/70699/qualitative-variable-coding-in-regression-leads-to-singularities/70700#70700 answers my question, I will just have to work out what it is saying. — Alex, Oct 28 '15 at 04:09
I'll close this but if there's an outstanding issue that's not resolved at that post, modify your question here (with a link to that one if it helps) and flag to ask for it to be re-opened. — Glen_b, Oct 28 '15 at 04:22
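
The indeterminacy described in the comments can be checked directly. The sketch below uses made-up data and hypothetical coefficient values: it confirms that the three dummies sum to the intercept column, and that three quite different coefficient vectors produce identical fitted values, so least squares has no unique solution to pick.

```python
# Made-up data for illustration only: two observations per colour.
colours = ["red", "red", "blue", "blue", "green", "green"]

# Design matrix with an intercept column AND all three dummies.
X = [[1, int(c == "red"), int(c == "blue"), int(c == "green")] for c in colours]

# In every row the three dummies sum to the intercept column.
dummies_sum_to_intercept = all(row[1] + row[2] + row[3] == row[0] for row in X)


def fitted(beta):
    """Fitted values X @ beta for coefficients (intercept, red, blue, green)."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Adding any constant to the intercept and subtracting it from all three
# dummy coefficients leaves every fitted value unchanged.
beta_a = [5, 6, 16, 0]           # green coefficient set to zero
beta_b = [0, 11, 21, 5]          # intercept set to zero
beta_c = [105, -94, -84, -100]   # beta_a shifted by 100

print(dummies_sum_to_intercept)
print(fitted(beta_a) == fitted(beta_b) == fitted(beta_c))
```

The first two vectors correspond to the two usual fixes: dropping a dummy (here green) or dropping the intercept. The third shows that without either restriction the individual coefficients can be arbitrarily large while the fit stays the same.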
