I've read that mean encoding is useful for classification tasks with high-dimensional categorical data.
My question:
What kinds of encodings are effective for high-dimensional categorical data in linear regression tasks? For example, can mean encoding be adapted to a continuous target?
Some possible applications:
- Assigning a credit limit using job title (we need to encode job title)
- Predicting wait time at a nightclub from shoe type (we need to encode shoe type)
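
To make the question concrete, here is a minimal sketch of what I imagine "adapting" mean encoding to regression might look like: each category is replaced by the mean of the continuous target for that category, shrunk toward the global mean so that rare categories are not over-trusted. The function names (`fit_mean_encoding`, `transform`), the `smoothing` parameter, and the toy data are all my own assumptions for illustration, not taken from any library.

```python
from collections import defaultdict


def fit_mean_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of a continuous target.

    smoothing is a hypothetical pseudo-count: larger values pull
    rare categories more strongly toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def transform(categories, encoding, global_mean):
    """Replace each category by its encoded value; unseen categories
    fall back to the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


# Toy data, made up purely for illustration.
job_titles = ["nurse", "nurse", "engineer", "engineer", "engineer", "pilot"]
credit_limits = [5000.0, 7000.0, 10000.0, 12000.0, 11000.0, 20000.0]

encoding, global_mean = fit_mean_encoding(job_titles, credit_limits, smoothing=2.0)
encoded = transform(job_titles + ["chef"], encoding, global_mean)
```

The resulting numeric column could then be fed to a linear regression in place of the raw job title. I assume the encoding would need to be fitted on training folds only (out-of-fold), since computing it from the same rows it is applied to leaks the target into the feature.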