When using logistic regression with one-hot encoded categorical variables, do we always have to drop one dummy variable to avoid the dummy variable trap? If I recall correctly, I have seen somewhere that if you use regularization you don't need to drop a variable, but I can't seem to find the article again. I'm a newbie. Many thanks.
- What is "the dummy variable trap"? – alan ocallaghan Sep 17 '20 at 12:50
- Ah, I see: collinearity. Yes, regularisation (L2 in particular) permits collinearity of predictors. However, why would you not drop one level? Keeping all of them makes interpretation of the regression coefficients more difficult. – alan ocallaghan Sep 17 '20 at 12:52
- Say I have 12 categories in the variable month. Do I just one-hot encode and leave it? I'm building a logistic regression model for binary classification. – Kyle Sep 17 '20 at 14:09
- What issue do you have with standard dummy variable encoding, where you include an intercept and 12 − 1 = 11 dummy variables? I understand the question you're asking, but without knowing why you're asking, it's hard to know how to address it. – alan ocallaghan Sep 17 '20 at 15:18