I am trying to run a regression analysis in which one of the predictors is a categorical variable with three categories, say A, B and C. These categories CANNOT be put in a specific order such as A > B > C. They are simply distinct groups, e.g. race = hispanic, race = asian or race = white.
I found two different ways (both from the same source) to code this kind of data: system #1, "Regression with Categorical Predictors", and system #2, "Additional Coding Systems ...".
System 1 suggests creating three variables, e.g. race1, race2 and race3, each coded 1 or 0 depending on which category the observation falls into. For example, if the observation has race "hispanic", race1 is coded 1 and race2 and race3 are coded 0. Likewise, for "asian", race2 is coded 1 and race1 and race3 are coded 0, and so on. In the actual model only two of the variables (e.g. race2 and race3) are used, and the omitted one (race1) serves as the reference.
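To make system 1 concrete, here is a minimal Python sketch of the coding as I understand it. The function name `dummy_code` and the ordering of the levels are my own choices for illustration; the first level in the list is treated as the reference, so only the race2 and race3 columns are returned.

```python
def dummy_code(value, levels):
    """Return k-1 indicator (0/1) variables; levels[0] is the reference level."""
    return [1 if value == level else 0 for level in levels[1:]]


# Hypothetical level order: "hispanic" is the reference category.
levels = ["hispanic", "asian", "white"]

for race in levels:
    # Each row is [race2, race3]; race1 is left out of the model.
    print(race, dummy_code(race, levels))
```

The reference category ends up with 0 in every column, and each other category has a single 1.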
System 2 does it a little differently. You create k-1 variables, where k is the number of levels, so in my case two variables, race1 and race2. Instead of 1 and 0, each variable is coded (k-1)/k if the observation DOES fall into the respective category and -1/k if it DOES NOT. In my example: if the observation has the race "hispanic", both race1 and race2 are coded -1/3, because hispanic observations serve as the reference. For asian observations, race1 is coded 2/3 and race2 is coded -1/3. Here, too, only two variables are actually used in the model.
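The same kind of sketch for system 2, again only as I understand the description. The function name `simple_code` and the level order are assumptions of mine; `Fraction` is used just to keep the values exact.

```python
from fractions import Fraction


def simple_code(value, levels):
    """Return k-1 variables coded (k-1)/k for a match and -1/k otherwise.

    levels[0] is the reference level and gets -1/k in every column.
    """
    k = len(levels)
    hit, miss = Fraction(k - 1, k), Fraction(-1, k)
    return [hit if value == level else miss for level in levels[1:]]


# Hypothetical level order: "hispanic" is the reference category.
levels = ["hispanic", "asian", "white"]

for race in levels:
    # Each row is [race1, race2] in the naming used for system 2.
    print(race, [str(v) for v in simple_code(race, levels)])
```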
I just ran two regression analyses using the two coding schemes and got exactly the same results, except for the constant, which is slightly lower for system #2. So what is the difference between the two schemes? Why should I prefer one over the other (apart from system 1 being much more straightforward)?
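For reference, the pattern I am describing can be reproduced with a small, self-contained sketch. The outcome values below are made up for illustration (they are not my real data), and the helper names `solve`, `ols` and `code` are my own; the fit is plain least squares via the normal equations, done with exact fractions.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def ols(rows, y):
    """Least-squares coefficients [constant, b1, b2, ...] via X'X b = X'y."""
    X = [[Fraction(1)] + [Fraction(v) for v in r] for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * Fraction(yi) for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def code(value, levels, hit, miss):
    """Code the k-1 non-reference levels; levels[0] is the reference."""
    return [hit if value == level else miss for level in levels[1:]]


levels = ["hispanic", "asian", "white"]
# Made-up (race, outcome) pairs, purely illustrative.
data = [("hispanic", 21), ("hispanic", 23), ("asian", 15), ("asian", 17),
        ("white", 10), ("white", 11), ("white", 12)]
y = [outcome for _, outcome in data]
k = len(levels)

# System 1: 1/0 coding. System 2: (k-1)/k and -1/k coding.
dummy_fit = ols([code(r, levels, 1, 0) for r, _ in data], y)
simple_fit = ols([code(r, levels, Fraction(k - 1, k), Fraction(-1, k))
                  for r, _ in data], y)

print("system 1:", [str(b) for b in dummy_fit])   # constant, then two slopes
print("system 2:", [str(b) for b in simple_fit])  # constant, then two slopes
```

With these made-up numbers the two slope coefficients are identical under both systems and only the constant changes, which is the same pattern I see in my own analysis.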
EDIT: actually the two links I provided do not work currently (at least for me). That's why I tried my best to explain the two systems :)