
I'm developing a linear regression model with several categorical explanatory variables (e.g., city, marital status), along with other binary and continuous variables. The output is a 0/1 variable. Later, I will run the model through backward elimination to pick the best model. My plan is to transform the categorical variables into dummy variables.

My question is whether I can include all but one of the dummy variables for each categorical variable (i.e., k-1 dummies for a variable with k levels, to avoid the dummy variable trap). Is this the right way to do it?
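To make the k-1 encoding concrete, here is a minimal sketch, assuming pandas and NumPy. The data frame, its column names, and its values are made up purely for illustration. It builds the design matrix twice, once with every dummy and once with one level dropped per categorical variable, and compares the matrix rank to the number of columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
df = pd.DataFrame({
    "city": ["New York", "Boston", "Chicago", "Boston", "New York", "Chicago"],
    "marital_status": ["Married", "Single", "Divorced", "Married", "Single", "Single"],
    "age": [34, 51, 28, 45, 39, 62],
    "outcome": [1, 0, 0, 1, 1, 0],
})
predictors = df[["city", "marital_status", "age"]]
categorical = ["city", "marital_status"]

# All dummies for every level, plus an intercept: the dummy variable trap.
X_all = pd.get_dummies(predictors, columns=categorical, dtype=float)
X_all.insert(0, "intercept", 1.0)

# k-1 dummies per categorical variable: the dropped level becomes the reference.
X = pd.get_dummies(predictors, columns=categorical, drop_first=True, dtype=float)
X.insert(0, "intercept", 1.0)

rank_all = np.linalg.matrix_rank(X_all.to_numpy(dtype=float))
rank = np.linalg.matrix_rank(X.to_numpy(dtype=float))

print("all dummies:", X_all.shape[1], "columns, rank", rank_all)
print("k-1 dummies:", X.shape[1], "columns, rank", rank)
print(list(X.columns))
```

In this sketch the full set of dummies is rank-deficient because the dummies of each categorical variable sum to the intercept column, while the k-1 version has full column rank. Each remaining dummy coefficient is then read as a difference from the dropped reference level of its own variable.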

Or can I include only one dummy variable from each categorical variable (e.g., New York and Married), namely the one that I know is much more strongly correlated with the output than the other cities and marital statuses?

  • Does this help https://stats.stackexchange.com/questions/388049/one-hot-encoding-gives-untractable-amount-of-classes/388051#388051 ? – Tim May 15 '21 at 10:04
  • Thank you, @Tim. Unfortunately, this does not seem to be the answer I'm looking for. I'm wondering, if I have multiple categorical variables in a model, how can I interpret them in a regression and also avoid the dummy trap? – visu_hello May 15 '21 at 13:28
