What should be log-transforming advice?

Question

I'm due to teach a workshop on statistics in a week or two, and one topic I will cover bothers me. It bothers me because the advice I often see, both in textbooks and here on StackExchange, jars with my own intuition. So, possibly, my own intuition needs fixing before I give this workshop.

My question concerns the transformation of non-normal data. It seems common to recommend that non-normal data be transformed prior to using a parametric test (e.g. a t-test), in order to meet the normality requirements. First, of course, if the sample is large enough (relative to the skew), the CLT applies, so the transformation seems superfluous from the perspective of "satisfying normality assumptions". But the thing that jars with me is that it is possible, for example, to log-transform two different groups and find that their respective means actually swap places (so the higher mean becomes the lower mean, etc.), which would lead us to exactly the opposite conclusion from the one we would have reached before the transformation.
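To make the swap concrete, here is a minimal sketch with made-up numbers (the two groups are hypothetical, chosen only so that one of them has a single large value):

```python
import math
from statistics import mean

# Hypothetical data: group A is strongly right-skewed, group B is constant.
group_a = [1, 1, 1, 1, 100]
group_b = [10, 10, 10, 10, 10]

# Arithmetic means on the raw scale: A (20.8) is higher than B (10).
raw_a, raw_b = mean(group_a), mean(group_b)

# Means of the log-transformed values: now B is higher than A.
log_a = mean([math.log(v) for v in group_a])
log_b = mean([math.log(v) for v in group_b])

print("raw means:", raw_a, raw_b)
print("log means:", round(log_a, 3), round(log_b, 3))
# Back-transforming a log mean gives the geometric mean, not the arithmetic mean.
print("geometric means:", round(math.exp(log_a), 3), round(math.exp(log_b), 3))
```

The two scales are answering different questions here: the raw comparison is about arithmetic means, while the log comparison is about geometric means.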

What is the general advice for managing this trade-off? My own intuition is that you should log-transform where the behaviour of the DV seems like it should be logarithmic, etc, but not if you cannot justify it. Though I do appreciate that a more symmetrical distribution will often lead to a more meaningful mean.

I suppose I'm looking for three things: (i) is my intuition right that log-transforming needs to be done with extreme caution (and with greater regard to the ultimate meaning of the log-transformed data/means), (ii) do others feel that the advice often given is too conservative in that regard, and (iii) do people out there have clear, clean guidelines that can be given to a bunch of NON-statisticians as to when it is a reasonable thing to do? Any ideas on how to communicate the idea, or build an intuition for them, would be highly appreciated.

Thanks!

score 3 · Accepted Answer · edited Apr 13 '17 at 12:44

In addition to the superb treatment provided by @IrishStat and others on this Cross Validated page and referenced in his answer on the present page, I want to add a few thoughts based on the challenge of presenting this to non-statisticians (as I at least used to be).

As I'm sure you know, the important issue is often not the distribution of the dependent variable or the distributions of the independent variables. What usually matters is the distribution of the residual errors after fitting a model (even the simple model implicit in a t-test).

I've seen this distinction between distributions of variables and distributions of errors get confused frequently, because many presentations of the basis of linear regression seem to start with an assumption of normal distributions for the independent and dependent variables themselves. There might be some pedagogical value to starting from this assumption, but it leaves the mistaken impression that variables ought to have normal distributions before analysis. But in principle if a dependent and an independent variable have similarly skewed distributions, there's not necessarily a problem with a regression in their original scales.

So one recommendation is to be a bit more precise about what you mean when you say "log-transform where the behaviour of the DV seems like it should be logarithmic." The behavior you care about is the residual error in the scale of the variable. If you expect the residual error to be proportional to the magnitude of the value of the dependent variable, then you probably should log-transform. This is often the case with many types of laboratory analyses that I have performed.
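As a rough illustration of proportional error, here is a small simulation. Everything in it is an assumption made for the sketch: the "true" signal 5*x, the 20% multiplicative noise, and the use of the known true model (rather than a fitted one) to compute residuals.

```python
import math
import random
from statistics import stdev

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical assay: true signal is 5*x, with multiplicative error of about 20%.
xs = [random.uniform(1, 100) for _ in range(2000)]
ys = [5 * x * math.exp(random.gauss(0, 0.2)) for x in xs]

# Residuals from the (known) true model, on the raw scale and on the log scale.
raw_resid = [(x, y - 5 * x) for x, y in zip(xs, ys)]
log_resid = [(x, math.log(y) - math.log(5 * x)) for x, y in zip(xs, ys)]

def spread_ratio(pairs, cut=50):
    """SD of residuals where x >= cut, divided by SD of residuals where x < cut."""
    low = [r for x, r in pairs if x < cut]
    high = [r for x, r in pairs if x >= cut]
    return stdev(high) / stdev(low)

raw_ratio = spread_ratio(raw_resid)  # well above 1: spread grows with magnitude
log_ratio = spread_ratio(log_resid)  # close to 1: spread is roughly constant

print("raw-scale spread ratio:", round(raw_ratio, 2))
print("log-scale spread ratio:", round(log_ratio, 2))
```

On the raw scale the residual spread grows with the size of the measurement; after taking logs it is roughly the same everywhere, which is the situation a t-test or ordinary regression assumes.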

A second recommendation is to consider thinking a bit differently about independent and dependent variables. If you are trying to build a linear model, you want to be working in scales where there is as linear as possible a relation of changes in independent variables to changes in the dependent variable. So if you are expecting residual errors proportional to magnitudes of the dependent variable and thus are log-transforming it, you may need to transform independent variables in some way to provide such linear relations.
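A sketch of that second point, again with invented data: a hypothetical power-law relation y = 2 * x^1.5 with multiplicative error. Once y is logged, the relation is linear in log(x), not in x. The `pearson` helper is written out here only to keep the example self-contained.

```python
import math
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

def pearson(u, v):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical power-law data with proportional (multiplicative) error.
xs = [random.uniform(1, 100) for _ in range(1000)]
ys = [2 * x ** 1.5 * math.exp(random.gauss(0, 0.2)) for x in xs]

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

r_semilog = pearson(xs, log_y)    # log(y) against raw x: curved relation
r_loglog = pearson(log_x, log_y)  # log(y) against log(x): close to a straight line

print("corr(x, log y):    ", round(r_semilog, 3))
print("corr(log x, log y):", round(r_loglog, 3))
```

Correlation is only a crude proxy for linearity; in practice a scatter plot or a residual plot on each scale would be more informative.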

A third recommendation is not to consider this an all-or-none choice during early stages of exploring data. If you "log-transform two different groups and find that their respective means actually swap places" then you have learned something very important about the nature of your data that requires extra study. If you had followed a strict transform-or-don't rule, you wouldn't have discovered this.

I was just looking over my old StackExchange questions and, somehow, it seems I missed this (though I'm normally anxiously watching my inbox after posting a question). Just wanted to say this is a really excellent answer and thank you! Hits the nail on the head. Giving it a (rather late) best answer tick... — justme, Apr 05 '21 at 12:22

score 1 · Answer 2 · edited Apr 13 '17 at 12:44

Transformations are like drugs: some are good for you and some are not. See my response to a similar question here: "When (and why) should you take the log of a distribution (of numbers)?"

answered Aug 20 '15 at 13:10

IrishStat


Thanks for the fast response! I did indeed already read that one, and I found the advice extremely helpful. However, I think in that case it is somewhat clearer to me because it is in the case of regression, so that one can examine the response of the variance to the underlying. I'm a bit more perplexed in the case of, say, comparing two group means. – justme Aug 20 '15 at 13:13
Well, comparing two group means is in effect a regression problem. Simply create a dummy variable X with n1 0's and n2 1's, where n1 is the number of values in group 1 and n2 is the number of values in group 2, and estimate an OLS model. The t value for the predictor variable X is the square root of the F value that you would get if you ran an F test for the equality of the two means, so the tests are identical. – IrishStat Aug 20 '15 at 13:55
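A minimal sketch of that equivalence, using two small made-up groups and only the standard library. It compares the classical pooled-variance (equal-variance) t statistic with the t statistic of the slope from an OLS fit on a 0/1 dummy.

```python
import math
from statistics import mean

# Hypothetical measurements for two groups.
group1 = [4.1, 5.0, 5.6, 6.2, 7.3]
group2 = [6.0, 7.1, 7.9, 8.4, 9.0, 9.8]
n1, n2 = len(group1), len(group2)

# Classical two-sample t statistic with pooled variance.
m1, m2 = mean(group1), mean(group2)
ss1 = sum((v - m1) ** 2 for v in group1)
ss2 = sum((v - m2) ** 2 for v in group2)
pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
t_classic = (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# OLS of y on a dummy x (0 for group 1, 1 for group 2).
x = [0.0] * n1 + [1.0] * n2
y = group1 + group2
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx                  # equals the difference in group means
intercept = ybar - slope * xbar    # equals the mean of group 1
rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(rss / (len(y) - 2) / sxx)
t_ols = slope / se_slope

print("pooled t:", t_classic)
print("OLS t:   ", t_ols)
print("F = t^2: ", t_ols ** 2)
```

Note that this identity holds for the equal-variance t-test; Welch's unequal-variance version does not correspond to plain OLS in the same way.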
Absolutely. And I suppose that, if we can assume homoskedasticity being appropriate in the transformed values, then we can apply the same principle as you described (i.e. see how the variance of the two groups relates to the group means). But it seems harder to establish that that is the case when we have only two group means/spreads to base this on. How can we justify with any certainty that the difference in spreads between the two groups is because they need to be transformed, rather than because there is underlying heteroskedasticity in the first place? Is there another justification? – justme Aug 20 '15 at 14:04
(-1) This answer is not constructive. – whuber Aug 20 '15 at 14:16
@whuber In what way was it not constructive? It appears that the OP thought it was. I am assuming you are referring to my answer and not his comment. – IrishStat Aug 20 '15 at 14:32
I am indeed referring to this answer. It says nothing new--in fact it says nothing at all that is of any use to any reader. If you would like to refer readers to other posts you have made, then please make such remarks into *comments.* – whuber Aug 20 '15 at 14:34
I understand your point. You would simply prefer a comment rather than an answer as the answer added nothing substantial. – IrishStat Aug 20 '15 at 14:36
