Is it mathematically justifiable to log transform variables before running an ANOVA?

Question

I have a model with variables (financial ratios) and some of them are in percentages, some in days and some just ratios (negative and positive). I ran an ANOVA and the results were not so good. When I applied the Shapiro test, 24 out of the 26 variables were not normally distributed. So I log-transformed every variable and the results went exactly as I wanted. But firstly I do not know if this is mathematically correct and secondly how I could justify this transformation.
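
A side note on the transformation itself: the question mentions ratios that can be negative as well as positive, and the logarithm is only defined for strictly positive values. So "log-transform every variable" cannot work as stated for those ratios. Below is a minimal sketch of a check that could be run before transforming. The function names and the sample numbers are hypothetical, made up for illustration; they are not taken from the asker's data.

```python
import math


def log_transformable(values):
    """Return True only if every value is strictly positive (log is defined)."""
    return all(v > 0 for v in values)


def safe_log(values):
    """Natural log of each value; raise ValueError if any value is <= 0."""
    if not log_transformable(values):
        raise ValueError("log is undefined for zero or negative values")
    return [math.log(v) for v in values]


# Hypothetical example data (illustration only, not from the question).
ratios = {
    "days_receivable": [30.0, 45.0, 60.0],   # all positive: log is defined
    "profit_margin": [0.12, -0.05, 0.30],    # contains a negative: log is not
}

for name, values in ratios.items():
    print(name, "can be log-transformed:", log_transformable(values))
```

Variables that fail this check would need a different treatment (or no transformation at all); which one is appropriate is a subject-matter decision, not something the check can decide.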

ANOVA is essentially a generalisation of the t-test, so [this answer](http://stats.stackexchange.com/questions/25738/is-a-log-transformation-a-valid-technique-for-t-testing-non-normal-data/25753#25753) may be of interest to you. I don't think that the Shapiro-Wilk test is a good choice for a test for normality here either; see [this answer](http://stats.stackexchange.com/questions/2492/normality-testing-essentially-useless/30053#30053). — MånsT, Jul 04 '12 at 06:57
What kind of variable is the response variable? Did you look at the distribution of it first? Do you actually have continuous predictors in an ANOVA? Are you saying you transformed the predictor variables or response variable or both? Or, are you saying you really have a MANOVA? It sounds more like you have multiple regression than ANOVA — John, Jul 04 '12 at 12:56
Mathematics can't be used to justify a transformation and statistics doesn't justify it either. Statistics may show that after transformation you have a good fit, but the grounds for justification should be subject-matter based. Shapiro-Wilk is a good test for normality, but whether to take the non-normality seriously depends on the sample size. It can detect small departures if the sample size is several hundred to 1000. Look at QQ plots to see if kurtosis or skewness are high. — Michael R. Chernick, Jul 04 '12 at 14:51
Jim, Because ANOVA and linear regression have a lot of overlap (they usually use the same estimators) and this question has been asked for linear regression, you may find an adequate answer in that thread: see http://stats.stackexchange.com/questions/298/. My answer there disagrees with part of @Michael Chernick's preceding comment: I cite many possible ways to justify a transformation purely on statistical grounds. Nor would I use procedures based on Normality testing or higher moments (like skewness or kurtosis), but graphical diagnostics like QQ plots can indeed be revealing. — whuber, Jul 04 '12 at 18:12
@whuber I did not advocate normality tests if the sample size is large and agreed with you about the QQ plots. But I have to strongly disagree that scientific decisions like applying a specific transformation to data can be justified purely on statistical grounds. Statistics can only tell you that the data fits well to a model that uses the transformation, it cannot tell you that this expresses a true relationship between the covariates and the response variable. — Michael R. Chernick, Jul 04 '12 at 18:19
@Michael This may be a fundamental disagreement or perhaps merely reflects different assumptions about the purpose of data analysis. "**We now regard re-expression** [*e.g.*, taking logs] **as a tool, something to let us do a better job of grasping data.** ... We have begun to realize that taking a firm hand with the data, before we display it or make detailed calculations, is often either the best or the only thing to do."--John Tukey (emphasis in the original), writing about transforming data (EDA, section 3H). — whuber, Jul 04 '12 at 18:34
@whuber My comment has nothing to do with the purpose of data analysis and is probably not a fundamental disagreement either. It really only has to do with the word "justified". If you want to say that a good fit is valid justification for the purpose of data analysis, I won't disagree. If you mean justified in the sense that goodness of fit implies that it is a valid description of an underlying process, I would vehemently disagree. That is the sense in which I took the OP's term "mathematically" justified. So when I say that it cannot be justified statistically either — Michael R. Chernick, Jul 04 '12 at 19:25
I also mean justifying the relationship more strictly than as a good way to create a nice-fitting model. — Michael R. Chernick, Jul 04 '12 at 19:25
@Michael, OK, no disagreement here: thanks for helping me appreciate the nuances of your position. — whuber, Jul 04 '12 at 19:27
