
Should predictor variables that are not normally distributed be transformed (for example with a Box-Cox transformation) before fitting a model? It is a pretty general question, so a counterexample would also help.

  • Possible duplicate of [When to transform predictor variables when doing multiple regression?](https://stats.stackexchange.com/questions/18320/when-to-transform-predictor-variables-when-doing-multiple-regression) – kjetil b halvorsen Dec 21 '18 at 07:17
  • Whether predictors are normally distributed or not is generally irrelevant. Regression models make no assumptions about the distributions of the predictors. – gung - Reinstate Monica Dec 21 '18 at 12:00
  • Box-Cox analyses for predictors might indicate transformations that are a good idea on other grounds. It's worth underlining that in their original paper Box and Cox [no relation] showed how calculations should be used to **_suggest_** a transformation. So, if you have positive values and the transformation power comes back estimated at 0.123, you should probably use logarithms. There is no magic in arbitrary powers. – Nick Cox Dec 21 '18 at 12:11
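
To make that last comment concrete, here is a minimal sketch of the workflow it describes, using SciPy's `boxcox` on simulated positive data. The data, variable names, and cutoffs are illustrative assumptions, not something from the thread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # hypothetical positive predictor

# boxcox estimates the transformation power (lambda) by maximum likelihood
x_bc, lam = stats.boxcox(x)
print(f"estimated Box-Cox power: {lam:.3f}")

# Use the estimate only to *suggest* a simple, interpretable transformation:
# a power near 0 suggests logs, near 0.5 a square root, near 1 no transform.
if abs(lam) < 0.25:            # illustrative cutoff, not a rule
    x_t = np.log(x)
elif abs(lam - 0.5) < 0.25:
    x_t = np.sqrt(x)
else:
    x_t = x
```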

2 Answers


No, especially when the goal is inference. Box-Cox transforms will usually change the data in a way which makes interpretation of the coefficients of your model very difficult.

Aside from inference, I suppose it depends on the method. I know Linear Discriminant Analysis assumes covariates are multivariate normal, so it might help in those instances.
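
As a hedged sketch of the LDA point: one way to apply a power transformation to the covariates before fitting is scikit-learn's `PowerTransformer` in a pipeline with `LinearDiscriminantAnalysis`. The simulated data and pipeline below are purely illustrative assumptions, not part of the answer itself.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, size=n)                       # two hypothetical classes
# skewed, strictly positive covariates whose location differs by class
X = rng.lognormal(mean=0.4 * y[:, None], sigma=1.0, size=(n, 2))

# Box-Cox needs strictly positive inputs; Yeo-Johnson would relax that
model = make_pipeline(PowerTransformer(method="box-cox"),
                      LinearDiscriminantAnalysis())
print(cross_val_score(model, X, y, cv=5).mean())
```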

– Demetri Pananos
  • When a Box-Cox transformation is developed according to a principled exploratory data analysis, it's often the case that it *improves* the interpretation of coefficients. In regression, for instance, the objective is to make the relationship with the response *linear,* which is about the simplest--and therefore the most easily interpretable--possible relationship. See https://stats.stackexchange.com/questions/4831, https://stats.stackexchange.com/questions/298, and https://stats.stackexchange.com/questions/35711 for more detailed discussions. – whuber Dec 21 '18 at 16:22
  • @whuber Don't you lose the "a unit increase in the predictor leads to a beta increase in the outcome" interpretation of coefficients then? If I transform weight through some Box-Cox transform (which is not the logarithm), how am I to interpret the resulting coefficient? – Demetri Pananos Dec 21 '18 at 16:31
  • My point is that you *gain* that interpretation, which was earlier incorrect. If indeed the relationship between response and regressor *in the original units of measurement* is nonlinear, then it is wrong to assert "a unit increase ... leads to a beta increase," because (by the very definition of non-linear) that is false no matter what value beta might have. Physics and chemistry offer standard, nice examples as illustrated in my answer in the third linked thread I provided. – whuber Dec 21 '18 at 16:47
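
A small illustrative sketch of this linearization argument (simulated data, assuming statsmodels is available; none of it comes from the thread): a power-law relationship that is nonlinear in the original units becomes linear, with a directly interpretable slope, after taking logs.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.lognormal(size=300)
y = 2.0 * x ** 0.5 * np.exp(rng.normal(scale=0.1, size=300))   # y proportional to x^0.5

# In the original units, "a unit increase in x adds beta to y" cannot hold
# for any beta.  After logs the model really is linear:
#     log y = log 2 + 0.5 * log x + error
# and the slope is an elasticity: the % change in y per 1% change in x.
fit = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
print(fit.params)   # roughly [log 2, 0.5]
```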

The simple answer is "No".

My own longer answer:

1) As @gung pointed out in a comment, predictor variables don't have to be normal. In fact, they don't have to be continuous. Linear regression makes assumptions about the errors, not about the distributions of the variables (see the sketch after this list).

2) Even if the assumptions are violated, I would say that it is rarely a good idea to transform variables based solely on statistical grounds. Instead, you should use a method that does not make the assumptions that were violated.
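
A minimal simulation sketch of point 1), using assumed toy data: the predictor below is heavily skewed, but because the errors are well behaved, ordinary least squares recovers the coefficient without any transformation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=1000)                 # heavily skewed predictor
y = 1.0 + 3.0 * x + rng.normal(scale=1.0, size=1000)      # well-behaved (normal) errors

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.params)       # close to the true values [1, 3]
print(fit.conf_int())   # intervals that cover them
```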

– Peter Flom