I've just learnt about dummy variables. Say this is my data:
Location | Nest |
---|---|
XXX | Yes |
XXX | No |
ZZZ | Yes |
YYY | Yes |
YYY | No |
I want to run multicollinearity tests and a logistic regression in RStudio, so I don't want the dependent variable (Nest) stored as text. What is the difference between recoding every "No" as 0 and every "Yes" as 1, versus using the output below (which, as I understand it, is the 'dummy'-encoded version of the data above)?
Location | Nest | No nest |
---|---|---|
XXX | 1 | 0 |
XXX | 0 | 1 |
YYY | 1 | 0 |
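
To make the comparison concrete, here is a minimal R sketch of the two encodings I am asking about, using the five rows from the first table. The column names `nest01`, `nest_yes` and `nest_no` are my own, invented purely for illustration.

```r
# Toy data copied from the first table in this question
dat <- data.frame(
  Location = c("XXX", "XXX", "ZZZ", "YYY", "YYY"),
  Nest     = c("Yes", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Option 1: a single 0/1 column ("No" -> 0, "Yes" -> 1)
dat$nest01 <- ifelse(dat$Nest == "Yes", 1, 0)

# Option 2: one indicator column per level (the two-column layout)
dat$nest_yes <- as.integer(dat$Nest == "Yes")
dat$nest_no  <- as.integer(dat$Nest == "No")

# Check whether the second indicator is fully determined by the first
all(dat$nest_no == 1 - dat$nest_yes)

print(dat)
```

In this toy example the "No nest" column is always one minus the "Nest" column, so I am unsure whether it carries any extra information.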
Moreover, if I have, say, 15 categories that I want to analyse, can I just label them 1 to 15, or do I need 14 columns (k-1, I suppose) to turn the categories into dummy variables?
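
Here is a small sketch of the two options for the 15-category case, as far as I understand them. The variable `site` and its level names are hypothetical, made up for illustration; `model.matrix()` is base R and, with the default treatment contrasts, appears to build the k-1 columns on its own when the variable is a factor.

```r
# Hypothetical predictor with 15 categories (names invented for illustration)
lv   <- paste0("site", 1:15)
site <- factor(rep(lv, each = 2), levels = lv)

# Option A: keep it as a factor and let R expand it into dummy columns
X_factor <- model.matrix(~ site)
ncol(X_factor)   # 1 intercept column + 14 dummy columns

# Option B: label the categories 1-15 and use the labels as a number
site_num <- as.integer(site)
X_numeric <- model.matrix(~ site_num)
ncol(X_numeric)  # 1 intercept column + 1 numeric column
```

With option B the model sees a single numeric predictor, so the labels 1 to 15 are treated as ordered, equally spaced values rather than as 15 separate categories.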