
I'm trying to understand how to analyse transformed data using ANOVA.

If my data does not meet the assumption of normality, and I have transformed it using a log transformation to fix that, would I then run my analysis on the transformed scores and report the transformed values?

Edit: More specifically, when you have transformed data so it meets the assumption of normality and then run a one-way ANOVA on the transformed scores, should you graph the transformed scores as well, or the original means? Since a graph illustrates the relationship between the variables, I would guess you should use the transformed values.

Jeromy Anglim
Calvin

3 Answers


This depends on a number of things. The analysis was done in the transformed space, so presenting the data back-transformed can distort things. Presenting the untransformed (raw) means is simply wrong, but summarizing on the transformed scale (means, variances, etc.) and then converting those summaries back might be OK in certain situations. The first thing I'd do is see how it looks when you back-transform. Does back-transforming tell exactly the same story as the transformed data? If so, you're probably fine to present it that way. If not, you need to present the transformed summary.
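To illustrate why the order matters (summarize first, then back-transform), here is a minimal Python sketch with made-up numbers. Exponentiating the mean of the logged scores gives the geometric mean, which is generally not the arithmetic mean of the raw scores. The function name `back_transformed_mean` is hypothetical.

```python
import numpy as np

def back_transformed_mean(values):
    """Summarize on the log scale, then back-transform (this is the geometric mean)."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(logs.mean()))

scores = [1.0, 10.0, 100.0]                 # illustrative, strongly right-skewed data
raw_mean = float(np.mean(scores))           # arithmetic mean of the raw scores
bt_mean = back_transformed_mean(scores)     # exp(mean(log(scores)))
print(raw_mean, bt_mean)
```

Here the raw mean is 37 while the back-transformed mean is about 10; the raw mean is pulled up by the single largest score, which is exactly the kind of discrepancy worth checking before deciding how to present the results.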

Even if you do back-transform, you need to be clear in your results section that the analysis applies to the transformed data. For example, you might write, "we found significant effects on the log of the data", etc.
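One way to keep the write-up honest about the log scale: a difference in mean logs back-transforms to a ratio of geometric means, not to a difference of raw means. A minimal sketch with hypothetical data; `geometric_mean_ratio` is an illustrative name, not a library function.

```python
import numpy as np

def geometric_mean_ratio(group_a, group_b):
    """exp(mean(log b) - mean(log a)): a log-scale difference is a raw-scale ratio."""
    log_a = np.log(np.asarray(group_a, dtype=float))
    log_b = np.log(np.asarray(group_b, dtype=float))
    log_diff = log_b.mean() - log_a.mean()   # effect estimated on the log scale
    return float(np.exp(log_diff))           # ratio of geometric means on the raw scale

control = [1.0, 10.0, 100.0]      # hypothetical data
treatment = [2.0, 20.0, 200.0]
ratio = geometric_mean_ratio(control, treatment)
print(f"Treatment geometric mean is {ratio:.2f} times the control geometric mean")
```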

Some transformations are just variations on an already arbitrary measurement. For example, you might measure reaction time in seconds and get a mean of 0.5 s. That kind of data is typically right-skewed and can sometimes be normalized simply by taking the inverse, so a 0.5 s response time becomes a rate of 2 responses per second. It's hard to argue that either one represents what happened more meaningfully, and both are straightforward and easy to interpret.
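A small caveat when switching to the inverse scale: the reciprocal is taken per observation, so the mean rate is not simply one over the mean reaction time. A sketch with three hypothetical reaction times:

```python
import numpy as np

rt_seconds = np.array([0.25, 0.5, 1.0])   # hypothetical reaction times (s)
rates = 1.0 / rt_seconds                  # responses per second, per trial: 4, 2, 1

mean_rt = rt_seconds.mean()               # about 0.583 s
mean_rate = rates.mean()                  # about 2.333 responses/s
inverse_of_mean_rt = 1.0 / mean_rt        # about 1.714 responses/s, differs from mean_rate
print(mean_rt, mean_rate, inverse_of_mean_rt)
```

Either summary can be reported, but it should be clear which scale the mean was computed on.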

Another thing to consider is that sometimes the transformed data are actually more meaningful. Sometimes the data need to be transformed partly because the transformed scale is the more natural expression of the response variable.

There are probably lots of other things to consider that I haven't mentioned. If you're having a hard time deciding for your particular problem, ask a specific question about the exact kind of data you have.

John

@John has a really good answer here. I just want to add an orthogonal point: having normally distributed data isn't as important as many people believe. The Gauss-Markov theorem tells us that normality is not necessary for model estimation: ordinary least squares estimates are the best linear unbiased estimators as long as the errors have mean zero, constant variance, and are uncorrelated. Normality is required for $p$-values to be accurate when $N$ is low (i.e., $p$-values will be approximately correct, even with non-normal data, if $N$ is sufficiently large). If $N$ is low, then you would want to bootstrap your standard errors / $p$-values.
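For a one-way ANOVA with small $N$, one way to follow this advice is a residual bootstrap of the $F$ statistic. Below is a minimal NumPy sketch, not a canonical procedure: the function names are hypothetical, and pooling residuals across groups assumes the groups share a single error distribution, so this particular scheme does not address heterogeneity of variance.

```python
import numpy as np

def f_statistic(groups):
    """Classical one-way ANOVA F statistic for a list of 1-D arrays."""
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    k, n = len(groups), all_y.size
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def bootstrap_anova_pvalue(groups, n_boot=2000, seed=0):
    """Residual-bootstrap p-value for H0: all group means are equal.

    Assumes homogeneous variance: residuals from all groups are pooled
    before resampling.
    """
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g, dtype=float) for g in groups]
    observed = f_statistic(groups)
    # Centering each group on its own mean makes H0 true in the resampled data
    residuals = np.concatenate([g - g.mean() for g in groups])
    split_points = np.cumsum([len(g) for g in groups])[:-1]
    exceed = 0
    for _ in range(n_boot):
        sample = rng.choice(residuals, size=residuals.size, replace=True)
        if f_statistic(np.split(sample, split_points)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (exceed + 1) / (n_boot + 1)
```

If the scores were log-transformed, pass the logged values, since the test then applies to the log scale.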

Transformations are often performed because the data are most meaningful / interpretable on that scale, or to correct for heterogeneity of variance (a more important problem than non-normality). For instance, John used reaction times as an example. It is well known that the standard deviation of reaction times increases as the mean increases. Taking the log stabilizes the variance.
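A quick way to see why the log helps: if the spread grows in proportion to the level (multiplicative noise), then $\log(m \cdot x) = \log m + \log x$, so the log only shifts each condition and leaves its standard deviation unchanged. The sketch below uses simulated, hypothetical data with an assumed purely multiplicative structure; real reaction times will only approximate this.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical positive noise with the same shape in every condition
base = rng.lognormal(mean=0.0, sigma=0.3, size=200)
condition_scales = {"easy": 0.4, "medium": 0.6, "hard": 0.9}  # illustrative scale factors (s)

raw_sd, log_sd = {}, {}
for name, scale in condition_scales.items():
    rt = scale * base                         # multiplicative: SD grows in proportion to the scale
    raw_sd[name] = rt.std(ddof=1)             # differs across conditions
    log_sd[name] = np.log(rt).std(ddof=1)     # identical across conditions
print(raw_sd)
print(log_sd)
```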

gung - Reinstate Monica

It will depend on your application, but in the biological sciences it's advised to present the untransformed means, as they are usually more interpretable than the transformed means.

Chris
How do you support such a broad generalization, Chris? For example, many quantities in "biological sciences" have chemical meanings where the logarithm is the most natural (and, arguably, the most readily interpretable) way to express values. – whuber Nov 28 '11 at 18:07