A log transformation of the dependent variable is sometimes recommended as a remedy when the residuals of a fitted linear regression model are not normally distributed. What is the proper way to evaluate whether the variable should stay log-transformed in further modeling of the same data?
4
- What are you trying to model? It may be better to use a different distribution to model your data, e.g. via a generalized linear model instead of a linear model/ANOVA. – Stefan Feb 15 '19 at 22:08
- Would you please post a link to the data? – James Phillips Feb 15 '19 at 23:37
2 Answers
3
Probably the easiest approach is to plot the distribution of your response (or residuals) and check whether it looks closer to Gaussian on the original scale or on the log scale.
For example, consider time-to-event data where $\log Y \sim \mathcal{N}(\mu, \sigma^2)$: a histogram of $Y$ is right-skewed, while a histogram of $\log Y$ looks Gaussian.
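
A numeric companion to the visual check can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated so that $\log Y$ is linear in a single predictor with Gaussian noise, and the helper names `ols_residuals` and `qq_correlation` are hypothetical. It fits a straight line on each scale and reports the correlation between the sorted residuals and standard normal quantiles, i.e. how straight the normal Q-Q plot is.

```python
import math
import random
from statistics import NormalDist, fmean


def ols_residuals(x, y):
    """Residuals from a simple least-squares fit y = a + b*x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def qq_correlation(values):
    """Correlation between sorted values and standard normal quantiles.

    Values close to 1 mean the points lie near a straight line
    on a normal Q-Q plot.
    """
    n = len(values)
    ordered = sorted(values)
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mo, mq = fmean(ordered), fmean(quantiles)
    num = sum((o - mo) * (q - mq) for o, q in zip(ordered, quantiles))
    den = math.sqrt(
        sum((o - mo) ** 2 for o in ordered)
        * sum((q - mq) ** 2 for q in quantiles)
    )
    return num / den


# Simulated data (an assumption for illustration only):
# log Y is linear in x with Gaussian noise, so Y itself is log-normal.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [math.exp(0.5 + 0.2 * xi + rng.gauss(0, 0.8)) for xi in x]

r_raw = qq_correlation(ols_residuals(x, y))
r_log = qq_correlation(ols_residuals(x, [math.log(yi) for yi in y]))
print(f"Q-Q correlation, original scale: {r_raw:.3f}")
print(f"Q-Q correlation, log scale:      {r_log:.3f}")
```

A higher Q-Q correlation on one scale only says the residuals look more Gaussian there. It does not settle whether the transformation is appropriate for the question being asked.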

Tony Duan
- Any advice on how to determine the relative residual normality of the original data vs the log data? – ReneBt Feb 16 '19 at 06:34
- Hoping that any transformation makes data look normal is usually asking for too much. Indeed, it's an unusual statistical procedure that absolutely requires the data to be very close to normally distributed. For a thread that discusses what transformation of data is trying to achieve, please see https://stats.stackexchange.com/questions/298/in-linear-regression-when-is-it-appropriate-to-use-the-log-of-an-independent-va/3530#3530. – whuber Feb 16 '19 at 20:18
- @whuber So let's say a log transformation seemed reasonable according to the guidelines in the post you linked to. What is the proper way to evaluate the transformation in order to decide whether it should replace the original untransformed variable in the model? – user31527 Feb 17 '19 at 11:48
- @user31527 That would require a lengthy book to answer properly. Some good introductory resources from an exploratory perspective are Tukey's *EDA* and Hoaglin *et al.*, *Understanding Robust and Exploratory Data Analysis.* Other approaches include cross-validation and goodness-of-fit tests, depending on the situation, the purposes, and the assumptions. – whuber Feb 17 '19 at 16:03
0
My preference is to not transform data for statistical reasons, only substantive ones.
If the assumptions of one model are violated, use a different model. It used to be that you more or less had to use linear regression because other methods either had not been developed or were intractable without powerful computers. That is no longer true.
Consider quantile regression and robust regression, for starters.
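
As a rough sketch of what one such alternative looks like, the following fits a line with a Huber M-estimator using iteratively reweighted least squares and compares it with ordinary least squares on contaminated data. The data are simulated, and the helper names `weighted_line` and `huber_line` are hypothetical, as is the conventional tuning constant 1.345. This illustrates only one form of robust regression; quantile regression is not shown. In practice a library implementation (for example `RLM` or `QuantReg` in statsmodels) would normally be used instead of hand-rolled code.

```python
import random
from statistics import median


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(x, y, c=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)  # the first pass is ordinary least squares
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute deviation, rescaled so that it
        # estimates the standard deviation under Gaussian errors.
        med = median(resid)
        scale = median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Huber weights: full weight for small residuals, down-weighted beyond c*scale.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return a, b


# Simulated data (an assumption for illustration only): true line y = 1 + 2x,
# with roughly half of the points at x > 8 shifted upward by 40.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = []
for xi in x:
    yi = 1.0 + 2.0 * xi + rng.gauss(0, 1)
    if xi > 8 and rng.random() < 0.5:
        yi += 40.0
    y.append(yi)

a_ols, b_ols = weighted_line(x, y, [1.0] * len(x))
a_huber, b_huber = huber_line(x, y)
print(f"OLS slope:   {b_ols:.2f}")
print(f"Huber slope: {b_huber:.2f}  (true slope is 2)")
```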

Peter Flom