I have a question. I have a dataset with several categorical features, each of which has thousands of levels.
Applying get_dummies to these features would produce a dataset far too large for me to work with. I would like to keep only the most important levels of each feature and group the remaining, less important levels into a single category. Then I could apply get_dummies.
Do you have any idea how to do this?
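A minimal sketch of the grouping step, assuming that "importance" is approximated by how often a level occurs (the post does not define importance, so frequency is a stand-in). The helper `group_rare_levels`, the cutoff `top_n`, the label `"other"`, and the column `city` are hypothetical names used only for illustration.

```python
import pandas as pd


def group_rare_levels(series: pd.Series, top_n: int, other_label: str = "other") -> pd.Series:
    """Keep the top_n most frequent levels and replace every other level with other_label.

    Frequency is used as a stand-in for "importance"; this is an assumption.
    """
    # value_counts() returns levels sorted by descending frequency
    keep = series.value_counts().index[:top_n]
    # where() keeps values where the condition holds and replaces the rest
    return series.where(series.isin(keep), other_label)


# Hypothetical toy data standing in for a high-cardinality feature
df = pd.DataFrame({"city": ["a", "a", "a", "b", "b", "c", "d", "e"]})
df["city"] = group_rare_levels(df["city"], top_n=2)
dummies = pd.get_dummies(df["city"], prefix="city")
print(list(dummies.columns))
```

After grouping, get_dummies creates one column per kept level plus a single column for the lumped levels, so the width is bounded by `top_n + 1` per feature.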
Is it possible to apply the chi-squared test to the individual levels of the categorical features instead of to the features themselves?
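One way this could look in scikit-learn, as a sketch under the assumption that the target `y` is a class label (chi-squared scoring needs a classification target): one-hot encode a single feature so that each level becomes its own indicator column, score those columns with `sklearn.feature_selection.chi2`, and keep the highest-scoring levels. The helper `top_levels_by_chi2`, the column `color`, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2
from sklearn.preprocessing import OneHotEncoder


def top_levels_by_chi2(series: pd.Series, y, top_n: int) -> list:
    """Return the top_n levels of a categorical column ranked by chi2 score.

    Each level becomes one indicator column, and scikit-learn's chi2 scores
    every column against the class labels y (y must be a classification target).
    """
    encoder = OneHotEncoder(handle_unknown="ignore")
    # Sparse matrix with one 0/1 column per level, in the order of categories_[0]
    X = encoder.fit_transform(series.to_frame())
    scores, _p_values = chi2(X, y)
    levels = encoder.categories_[0]
    # Stable sort on descending score so ties keep a deterministic order
    order = np.argsort(-scores, kind="stable")
    return [levels[i] for i in order[:top_n]]


# Hypothetical toy data: "a" occurs only with class 1, "b" only with class 0,
# while "c" and "d" are split evenly between the classes
color = pd.Series(["a", "a", "b", "b", "c", "c", "d", "d"], name="color")
y = np.array([1, 1, 0, 0, 1, 0, 1, 0])

keep = top_levels_by_chi2(color, y, top_n=2)
# Lump every level that was not selected into a single "other" category
grouped = color.where(color.isin(keep), "other")
dummies = pd.get_dummies(grouped, prefix="color")
print(keep, list(dummies.columns))
```

The levels should be selected on the training split only; scoring them with labels from the test data would leak information into the model.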
I usually work with Python and scikit-learn.