I am curious about a possible regression model I want to run. Let's say the model has a (Yes/No) response and several binary independent variables.
The binary independent variables are all derived from one categorical variable, Location, whose possible values are Northwest, Southwest, Midwest, Southeast, and Northeast.
I want to build a regression model that estimates the probability of a "Yes" by Location. Let's say I built my independent variables by creating one (1/0) indicator variable per location, five in total. I understand the interpretation would differ between a model that uses all five indicators and a model that uses one of the locations as a baseline to compare all other locations against.
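To make the comparison concrete, here is how I would write the two models on the log-odds scale. This is only my own sketch: the coefficient symbols are notation I am introducing, and Northwest is taken as the baseline purely for illustration.

$$\operatorname{logit} P(Y = 1) = \beta_0 + \beta_{SW}\,SW + \beta_{MW}\,MW + \beta_{SE}\,SE + \beta_{NE}\,NE$$

$$\operatorname{logit} P(Y = 1) = \alpha_0 + \alpha_{NW}\,NW + \alpha_{SW}\,SW + \alpha_{MW}\,MW + \alpha_{SE}\,SE + \alpha_{NE}\,NE$$

The first is the baseline version and the second uses all five indicators. Since every observation belongs to exactly one location, $NW + SW + MW + SE + NE = 1$ in every row.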
My question is: Is there any issue with fitting and interpreting the model that uses all five separate Location (1/0) indicators (mod2 in the code below)? Would this type of model raise multicollinearity concerns? There are no continuous variables; the response and all predictors are 0s and 1s.
Here's some example code in R that could run these models for reference:
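# mod1: Location as a single factor; R uses one level as the baseline
mod1 <- glm(Response ~ Location, data = data, family = binomial)
# mod2: all five 1/0 indicators entered together, plus the intercept
mod2 <- glm(Response ~ NW + SW + MW + SE + NE, data = data, family = binomial)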
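In case a reproducible version helps, below is a minimal sketch on simulated data. Everything about the data is an assumption made for illustration: the sample size, the 0.4 "Yes" rate, the short location codes, and the choice of NW as the first factor level are not from my real dataset.

```r
# Simulated data for illustration only; sample size and "Yes" rate are made up.
set.seed(1)
locations <- c("NW", "SW", "MW", "SE", "NE")
data <- data.frame(
  Location = factor(sample(locations, 500, replace = TRUE), levels = locations),
  Response = rbinom(500, 1, 0.4)
)

# One 1/0 indicator column per location; each row has exactly one 1.
for (loc in locations) {
  data[[loc]] <- as.integer(data$Location == loc)
}

# mod1: Location as a factor (its first level, NW here, is the baseline).
mod1 <- glm(Response ~ Location, data = data, family = binomial)

# mod2: all five indicators entered together, plus the intercept.
mod2 <- glm(Response ~ NW + SW + MW + SE + NE, data = data, family = binomial)

# Check how the five indicators add up within each row.
table(rowSums(data[, locations]))

# Compare the estimated coefficients from the two fits.
coef(mod1)
coef(mod2)
```

By construction the five indicator columns sum to 1 in every row, which matches the intercept column. That is what makes me suspect a collinearity problem in mod2.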