25

Machine learning (ML) uses linear and logistic regression techniques heavily. It also relies on feature engineering techniques (feature transforms, kernels, etc.).

Why is variable transformation (e.g. power transformations) hardly ever mentioned in ML? (For example, I never hear about taking the root or log of features; people usually just use polynomials or RBFs.) Likewise, why don't ML practitioners seem to care about transformations of the dependent variable? (For example, I never hear about taking the log of y; they just don't transform y.)

Edit: Maybe the question is not well posed; my real question is: "are power transformations of variables not important in ML?"
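For concreteness, here is a minimal sketch (synthetic data, plain scikit-learn/NumPy) of the kind of transformation I have in mind: log-transforming a skewed feature and y before fitting an ordinary linear model. The data and model are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Skewed, strictly positive feature and a response that grows multiplicatively
x = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))
y = np.exp(1.0 + 2.0 * np.log(x[:, 0]) + rng.normal(scale=0.1, size=500))

# "Classical statistics" style: transform the feature and the response,
# then fit an ordinary linear model on the transformed (log-log) scale.
model = LinearRegression().fit(np.log(x), np.log(y))
print(model.coef_, model.intercept_)  # roughly [2.0] and 1.0
```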

  • 4
    I'd like to know why this was downvoted; it's actually an interesting question. – shadowtalker Jan 10 '15 at 15:37
  • 1
    I think most people would have taken a linear regression course prior to their first ML course. Surely, the stock LR course would contain a chapter on these things (transformations). Btw, I didn't downvote the question. – user603 Jan 10 '15 at 18:02

3 Answers

13

The book Applied Predictive Modeling by Kuhn and Johnson is a highly-regarded practical machine learning book with a large section on variable transformation including Box-Cox. The authors claim that many machine learning algorithms work better if the features have symmetric and unimodal distributions. Transforming the features like this is an important part of "feature engineering".
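As a small, hypothetical illustration of what such a transformation looks like in practice (not taken from the book), here is a sketch using scikit-learn's PowerTransformer on synthetic right-skewed data; the data and parameters are made up, but the estimator implements the standard Box-Cox / Yeo-Johnson transforms.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)

# A strongly right-skewed, strictly positive feature (e.g. income-like data)
X = rng.lognormal(mean=2.0, sigma=0.8, size=(1000, 1))

# Box-Cox requires strictly positive inputs; Yeo-Johnson also handles
# zeros and negatives. Both are fitted like any other preprocessor.
pt = PowerTransformer(method="box-cox", standardize=True)
X_t = pt.fit_transform(X)

def skew(a):
    # Sample skewness: zero for a symmetric distribution
    return float(np.mean(((a - a.mean()) / a.std()) ** 3))

print("skew before:", skew(X))
print("skew after: ", skew(X_t))
print("estimated lambda:", pt.lambdas_)
```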

Flounderer
8

Well, from my own perspective, quite often I am interested in the predictive distribution of the response variable, rather than just the conditional mean, and in that case it is better to use a likelihood that more correctly represents the target distribution. For instance, I like to use kernelised linear models rather than (say) support vector regression, because I can use a Poisson likelihood if I want to. As a lot of machine learning people are Bayesians, I suspect that using a different likelihood will seem more elegant to them than transformations (choosing an appropriate likelihood is generally the first step).
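As a rough sketch of that idea (not the exact models referred to above), one can pair an explicit kernel feature map with a Poisson likelihood in scikit-learn instead of log-transforming a count-valued y; the data and hyperparameters below are purely illustrative.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import PoissonRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Count-valued response: rather than fitting log(y + 1) by least squares,
# model the counts directly with a Poisson likelihood on kernel features.
X = rng.uniform(-2, 2, size=(500, 2))
rate = np.exp(0.5 + np.sin(X[:, 0]) + 0.3 * X[:, 1])
y = rng.poisson(rate)

model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.5, n_components=100, random_state=0),
    PoissonRegressor(alpha=1e-3, max_iter=1000),
)
model.fit(X, y)
print("mean predicted rate:", model.predict(X).mean(), "vs observed mean:", y.mean())
```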

Dikran Marsupial
0

Here are my afterthoughts.

I think it's because ML largely deals with classification, and classification has no need to transform y (y is categorical). ML also usually deals with very many independent variables (e.g. thousands of features in NLP), and logistic regression doesn't require normality; I think that's why they don't use the Box-Cox power transformation, and speed may be another consideration. (Note: I'm not very familiar with power transformations.)

  • I thought that this answer to your own question seemed odd, until I reread your post and saw that it was about y, not x. So, yes, for classification, I would agree that transforming y does not make much sense. Transforming x can still help a lot of classification methods, from logistic regression to deep learning. – Jared Becksfort Jan 20 '22 at 17:29