
When transforming variables in a linear regression, be it log, square, or some form of normalization, should one then do this to every variable in the regression, or is it fine to transform just one variable? Why or why not?

Lamma
  • There is no fixed rule for that. It will depend on the context of application, which is not given. Different specifications are different assumptions about functional form. Choosing the wrong functional form is a type of misspecification and can bias estimates. There exist some tests for functional form to guide the choice. – Jesper for President Feb 19 '20 at 14:59
  • Please see https://stats.stackexchange.com/search?tab=votes&q=transform%20independent%20variable%20regression%20score%3a10%20is%3aanswer for other discussions of this issue. – whuber Feb 19 '20 at 16:31

2 Answers


There can't be a simple rule on this; it depends on circumstances.

At one extreme, suppose one variable is a $(0, 1)$ indicator (some people say dummy). Then transforming this is either impossible (e.g. the log of such a variable is undefined at $0$, and the logit is undefined at both $0$ and $1$) or futile (any other pair of values would not usually serve better).

So, a little more generally, it can be entirely in order to leave some predictors as they arrive and to transform others, while transforming the response or not can be a separate or a related decision.

At another extreme, a functional form such as $$Y = a\, X_1^{b_1}\ X_2^{b_2}\ X_3^{b_3}\ \cdots$$ hints so strongly at logging all variables to linearise the relation $$\log Y = b_0 + b_1 \log X_1 + b_2 \log X_2 + b_3 \log X_3 + \cdots, \quad \text{where } b_0 = \log a,$$ that other approaches would seem to need some special justification.
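As a minimal sketch of that linearisation (simulated data with illustrative coefficients, not anything taken from the question), ordinary least squares on the logged variables recovers the exponents directly:

```python
import numpy as np

# Simulate Y = a * X1^b1 * X2^b2 with multiplicative error, then
# recover the exponents by OLS on the logged variables.
rng = np.random.default_rng(0)
n = 500
a, b1, b2 = 2.0, 0.5, -1.3
X1 = rng.uniform(1, 10, n)
X2 = rng.uniform(1, 10, n)
Y = a * X1**b1 * X2**b2 * np.exp(rng.normal(0, 0.1, n))

# Design matrix for log Y = b0 + b1 log X1 + b2 log X2, with b0 = log a
Z = np.column_stack([np.ones(n), np.log(X1), np.log(X2)])
coef, *_ = np.linalg.lstsq(Z, np.log(Y), rcond=None)
print("b0 (= log a):", coef[0], "b1:", coef[1], "b2:", coef[2])
```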

A fuller account would necessarily consider also what assumptions or implied ideal conditions for error or disturbance terms go best with any functional form.
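For example (a standard illustration rather than part of the model above), the power-law form pairs naturally with a multiplicative disturbance, since logging turns it into the additive error that least squares assumes: $$Y = a\, X_1^{b_1} X_2^{b_2} \cdots\, e^{\varepsilon} \quad\Longrightarrow\quad \log Y = b_0 + b_1 \log X_1 + b_2 \log X_2 + \cdots + \varepsilon.$$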

Nick Cox

I would say that it depends:

Many transformations are most likely to be useful on a single variable, and will depend on the distribution of that feature. For example, you might want to trim some values of a feature that has a lot of outliers, impute missing values for a feature that has them, log-transform one that is extremely skewed, or combine two features that have a very clear connection. These transformations will depend on the feature itself: on its meaning, its marginal distribution, and its joint distribution with the target.
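A rough sketch of such per-feature transformations (the column names and data here are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "income": rng.lognormal(10, 1, 200),  # strongly right-skewed
    "age": rng.normal(40, 12, 200),       # roughly symmetric
})

# Trim (winsorize) an outlier-heavy feature at its 1st/99th percentiles
lo, hi = df["age"].quantile([0.01, 0.99])
df["age_trimmed"] = df["age"].clip(lo, hi)

# Log-transform only the skewed feature, leaving the others as they are
df["log_income"] = np.log(df["income"])
```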

Other transformations are instead often performed dataset-wide. For example, standardization or normalization of the features is usually performed on every variable, often because it is required by subsequent methods (variance- or distance-based ones, such as PCA).
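For instance, a minimal sketch of dataset-wide standardization before PCA (random placeholder data, using scikit-learn):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns on wildly different scales: without standardization, PCA's
# variance-based components would be dominated by the large-scale ones.
X = np.random.default_rng(3).normal(size=(100, 5)) * [1, 10, 100, 1, 1]

X_std = StandardScaler().fit_transform(X)  # each column: zero mean, unit variance
pcs = PCA(n_components=2).fit_transform(X_std)
```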

Davide ND
    This answer resembles my own and adds something new, all fine, so (+1). However, you're emphasising mostly transformations undertaken because of the marginal distribution or some details of each variable. In regression, I'd say it's whether a transformation gets closer to linearity of relationship that is, or should be, the bigger deal. The aims can be consistent, as when dampening an outlier helps modelling a relationship or as when logging for skewness reduces curvature too. – Nick Cox Feb 19 '20 at 15:26
  • Thanks. I was not very clear about it indeed, but when I wrote "about its distributionS" I meant its marginal and its joint with the target :) Will amend – Davide ND Feb 19 '20 at 15:30