I ran a LASSO-penalized logistic regression to obtain a list of "important" variables. For factor variables, I created one-hot encoded dummy variables using the step_dummy function from the tidymodels ecosystem.
After running LASSO, I inspected the list of variables that were kept and noticed that LASSO deemed some of the dummy variables "unimportant" and shrank their coefficients to exactly 0. Does it make sense to keep only some of the dummy variables (i.e., the ones with non-zero coefficients) when fitting a final logistic regression model? For example, one-hot encoding race produced 5 indicator variables: White, Black, Asian, Hispanic, and Other. LASSO kept only White and Hispanic and dropped the other 3. Is it OK to include just White and Hispanic in my logistic regression and use that model to make predictions?
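To make concrete what "keeping only White and Hispanic" does to the design matrix, here is a minimal sketch in plain Python (not tidymodels). It assumes the five race levels from the example and hypothetical column names such as race_White. It one-hot encodes each level and then keeps only the two indicator columns that survived LASSO.

```python
# Hypothetical race levels matching the example in the question.
LEVELS = ["White", "Black", "Asian", "Hispanic", "Other"]


def one_hot(value, levels=LEVELS):
    """Full one-hot encoding: one 0/1 indicator column per level."""
    return {f"race_{lvl}": int(value == lvl) for lvl in levels}


def keep_columns(row, kept):
    """Keep only the indicator columns listed in `kept` (hypothetical LASSO survivors)."""
    return {col: row[col] for col in kept}


kept = ["race_White", "race_Hispanic"]  # assumed non-zero LASSO coefficients

# Encode one observation of each level, then drop the zeroed-out dummies.
reduced = {
    level: tuple(keep_columns(one_hot(level), kept).values())
    for level in LEVELS
}

for level, code in reduced.items():
    print(f"{level:<9} -> {code}")
```

In the reduced encoding, Black, Asian, and Other all map to the same row, (0, 0). In a model with an intercept, dropping those three dummies is therefore equivalent to recoding race into three levels (White, Hispanic, and a pooled Black/Asian/Other baseline), rather than simply "removing" those categories.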