When to apply target encoding (before or after target transform)?

Asked Mar 24 '21 at 17:09

Active Mar 25 '21 at 01:45

Viewed 67 times

I am trying to build a linear regression model.

I have some high-cardinality categorical features to which I want to apply target encoding. However, my real-valued target variable has a highly right-skewed distribution, so I plan to transform it to reduce the skew.

Which of the following approaches is sensible?

1. Transform the target variable first, then apply target encoding to the categorical features based on the transformed target.
2. Apply target encoding to the categorical features based on the original target, then apply the skew-removing transform to the target variable.
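
To make the two options concrete, here is a minimal sketch in plain Python. The toy data, the helper `target_encode`, and the choice of `log1p` as the skew-reducing transform are all assumptions made for illustration; the question does not specify a transform or a library.

```python
import math
from collections import defaultdict


def target_encode(categories, targets):
    """Map each category to the mean of the targets observed for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


# Toy data (made up): one categorical feature and a right-skewed target.
city = ["a", "a", "a", "b", "b", "c", "c", "c"]
y = [1.0, 2.0, 100.0, 3.0, 5.0, 10.0, 20.0, 400.0]

# Assumed skew-reducing transform; any monotone transform could stand in here.
y_log = [math.log1p(v) for v in y]

# Option 1: transform the target first, then encode on the transformed target.
enc_after_transform = target_encode(city, y_log)

# Option 2: encode on the original target; the target is transformed afterwards.
enc_on_raw = target_encode(city, y)

for c in sorted(enc_on_raw):
    print(
        c,
        "mean of log1p(y):", round(enc_after_transform[c], 3),
        "| mean of y:", round(enc_on_raw[c], 3),
        "| log1p(mean of y):", round(math.log1p(enc_on_raw[c]), 3),
    )
```

Under option 1 the encoded feature is on the same scale as the target the model is actually fit to. Under option 2 it is on the raw scale, where a few large values can dominate a category's mean. The two are not interchangeable: because `log1p` is concave, the mean of the transformed values never exceeds the transform of the mean.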

Thanks in advance.

edited Mar 25 '21 at 01:45

asked Mar 24 '21 at 17:09

Sandeep Maurya

What do you mean by transforming a skewed categorical distribution? – Dave Mar 24 '21 at 17:14
My target variable is not categorical. It is real-valued. – Sandeep Maurya Mar 24 '21 at 17:17
Depending on what you're doing, the transformation might not be so important; we like normal residuals, not a normal pooled distribution of the response variable. However, how does the category to which an observation belongs depend on the transformation? – Dave Mar 24 '21 at 17:20
I want to train a linear regression model on this data. As I understand it, linear models tend to perform better when the input features and the target variable have Gaussian-like distributions. – Sandeep Maurya Mar 24 '21 at 17:27
@SandeepMaurya You're likely looking at the histogram of the outcome, which is the marginal distribution of the outcome. The assumption of normality is about the *conditional* distribution. See my answer [here](https://stats.stackexchange.com/questions/476424/what-are-the-worst-commonly-adopted-ideas-principles-in-statistics/476435#476435) and the referenced answer therein. – Demetri Pananos Mar 25 '21 at 02:32
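
The marginal-versus-conditional distinction raised in the comments can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the group means, group sizes, noise level, and the `skewness` helper are all hypothetical. Each observation is Gaussian around its group's mean, yet pooling the groups yields a strongly right-skewed histogram.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: mean of the cubed standardized values."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


random.seed(0)

# Hypothetical groups: most observations sit near small means, a few near 50.
group_means = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 50.0}
group_sizes = {"a": 400, "b": 300, "c": 200, "d": 100}

groups, y = [], []
for g, mu in group_means.items():
    for _ in range(group_sizes[g]):
        groups.append(g)
        y.append(random.gauss(mu, 1.0))  # Gaussian noise within each group

# A linear model with one indicator per group fits each group's sample mean.
by_group = {}
for g, v in zip(groups, y):
    by_group.setdefault(g, []).append(v)
fitted = {g: statistics.fmean(v) for g, v in by_group.items()}
residuals = [v - fitted[g] for g, v in zip(groups, y)]

marginal_skew = skewness(y)          # skew of the pooled outcome histogram
residual_skew = skewness(residuals)  # skew of the conditional (residual) part

print("skewness of pooled y:   ", round(marginal_skew, 2))
print("skewness of residuals:  ", round(residual_skew, 2))
```

In this setup the pooled outcome is heavily skewed while the residuals are close to symmetric, so a skewed histogram of the target does not by itself show that the normality assumption is violated.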

0 Answers