In feature selection, it is common to recode categorical variables with more than two categories into dummy variables. Selection methods such as the lasso or the elastic net then pick the best predictors, so it is possible that only some of the dummies belonging to a categorical variable are selected. I am wondering whether this procedure can cause problems. I found some comments on the topic on Quora and in a tutorial, stating that the procedure should be used with care but that there are no general problems. However, I was not able to find any detailed literature or well-founded guidelines to follow.
Question: Can problems arise if not all dummies of a categorical variable are selected for a model?
For example, I could imagine that the automatic selection depends on the order of the categories and therefore on the resulting reference category. Say there is a variable with categories A, B, and C. Recoding it into dummyB and dummyC (reference category A) would probably result in a different variable selection than recoding it into dummyA and dummyB (reference category C).
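To illustrate why I suspect this, here is a minimal Python sketch. The group means are made up, and `dummy_coefficients` and `l1_penalty` are hypothetical helper names, not functions from any library. It writes the same saturated fit under each possible reference category and compares the L1 norm of the dummy coefficients, which is the quantity the lasso penalizes (assuming the intercept is left unpenalized).

```python
def dummy_coefficients(group_means, reference):
    """Intercept and dummy coefficients that reproduce the given group means
    when `reference` is the omitted (reference) category."""
    intercept = group_means[reference]
    coefs = {
        "dummy" + level: mean - intercept
        for level, mean in group_means.items()
        if level != reference
    }
    return intercept, coefs


def l1_penalty(coefs):
    """L1 norm of the dummy coefficients (intercept assumed unpenalized)."""
    return sum(abs(b) for b in coefs.values())


# Made-up group means for the categories A, B, and C (illustration only).
group_means = {"A": 0.0, "B": 2.0, "C": 2.0}

penalties = {}
for reference in group_means:
    intercept, coefs = dummy_coefficients(group_means, reference)
    penalties[reference] = l1_penalty(coefs)
    print(reference, intercept, coefs, penalties[reference])
```

The fitted group means are identical in all three codings, but the L1 norm is 4 with A as the reference and 2 with B or C as the reference, so the lasso would presumably not treat the codings as equivalent.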
Any advice or literature is highly appreciated!
UPDATE:
Based on Ben's comment, I found some literature comparing the lasso and the group lasso, which addresses my question:
http://pages.stat.wisc.edu/~myuan/papers/glasso.final.pdf
http://people.ee.duke.edu/~lcarin/lukas-sara-peter.pdf
However, this literature raised two further questions:
1) It seems that the standard lasso is still used regularly, whereas the group lasso does not appear that often in the current literature. Is there a specific reason for that?
2) When I have categorical variables with many categories, isn't it a problem to select the whole categorical variable at once? In other words, is it sometimes advantageous to use the lasso instead of the group lasso?
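To make the difference behind these two questions concrete, here is a minimal Python sketch of how I understand the two penalties to act on the dummies of one categorical variable. It assumes an orthonormal design, in which case both estimators reduce to simple thresholding of the least-squares coefficients, and it ignores the group-size weight that the group lasso usually includes. The function names and the coefficient values are hypothetical and only serve as an illustration.

```python
import math


def lasso_threshold(z, t):
    """Element-wise soft-thresholding: each coefficient is shrunk on its own."""
    return [zi - t if zi > t else zi + t if zi < -t else 0.0 for zi in z]


def group_lasso_threshold(z, t):
    """Block soft-thresholding: the whole group is shrunk by a common factor
    and is set to zero only if its Euclidean norm is at most t."""
    norm = math.sqrt(sum(zi * zi for zi in z))
    if norm <= t:
        return [0.0 for _ in z]
    scale = 1.0 - t / norm
    return [scale * zi for zi in z]


# Made-up least-squares coefficients for the dummies of one categorical variable.
z = [3.0, 0.5, -0.2]

lasso_fit = lasso_threshold(z, 1.0)                # keeps only the first dummy
group_fit = group_lasso_threshold(z, 1.0)          # keeps all dummies, shrunk
group_fit_strong = group_lasso_threshold(z, 3.5)   # drops the whole variable

print(lasso_fit)
print(group_fit)
print(group_fit_strong)
```

In this sketch the lasso keeps a single dummy and discards the others, while the group lasso either keeps every dummy of the variable or removes the variable entirely, which is the trade-off I am asking about for variables with many categories.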