In reviewing a paper, the authors state, "Continuous outcome variables exhibiting a skewed distribution were transformed, using the natural logarithms, before t tests were conducted to satisfy the prerequisite assumptions of normality."

Is this an acceptable way to analyze non-normal data, particularly if the underlying distribution is not necessarily lognormal?

This may be an uncommon question, but I have not seen this done before.

Steffen Moritz
CLS
  • Well, if the initial distribution is not log-normal, then the transformed data does not satisfy the prerequisite assumptions of normality, so what is being gained by the transformation? – Macro Apr 02 '12 at 23:43
  • @Macro - true enough! (+1) - they probably just wanted to get the distributions closer to symmetric, which is not a bad thing to want to do for t-testing, but, unless they checked and wrote it up, we don't know if the log transform induced a negative skew that might have made matters worse... – jbowman Apr 03 '12 at 00:23
  • We might infer that because it was done to satisfy normality, and normality was checked in the first place, that normality was checked afterwards. It's strongly implicit in the language here. – John Apr 03 '12 at 01:25
  • A t-test for the logarithms is neither the same as a t-test for the untransformed data nor a nonparametric test. The t-test on the logs compares *geometric* means, not the (usual) arithmetic means. This is one of several important considerations in deciding whether using the logarithms is acceptable (which it can be, depending on the application). – whuber Apr 03 '12 at 06:04
  • Generally speaking, if the assumptions required to carry out a t-test are not met, then it would be more appropriate to use a non-parametric test. – user7045 Apr 03 '12 at 02:06

1 Answer

It is common to try to apply some kind of transformation to normality (using e.g. logarithms, square roots, ...) when faced with data that isn't normal. While the logarithm yields good results for skewed data reasonably often, there is no guarantee that it will work in this particular case. One should also bear @whuber's comment above in mind when analysing transformed data: "A t-test for the logarithms is neither the same as a t-test for the untransformed data nor a nonparametric test. The t-test on the logs compares geometric means, not the (usual) arithmetic means."
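To make the point about geometric means concrete, here is a small sketch using only the Python standard library. The two groups are made-up numbers chosen for illustration, not data from the paper under review. It shows that a difference in the means of the logs back-transforms to a ratio of geometric means, which is a different quantity from the difference in arithmetic means.

```python
import math
import statistics

def geometric_mean(xs):
    """Geometric mean: exponentiate the arithmetic mean of the logs."""
    return math.exp(statistics.fmean([math.log(x) for x in xs]))

# Hypothetical right-skewed samples (illustration only).
group_a = [1, 2, 4, 8, 16]
group_b = [2, 4, 8, 16, 32]

# What a t-test on the raw data is about: a difference of arithmetic means.
arithmetic_difference = statistics.fmean(group_b) - statistics.fmean(group_a)

# What a t-test on the logs is about: a difference of mean logs,
# which back-transforms to a ratio of geometric means.
mean_log_difference = (
    statistics.fmean([math.log(x) for x in group_b])
    - statistics.fmean([math.log(x) for x in group_a])
)
ratio_of_geometric_means = math.exp(mean_log_difference)

print("difference of arithmetic means:", arithmetic_difference)
print("ratio of geometric means:", ratio_of_geometric_means)
print("check:", geometric_mean(group_b) / geometric_mean(group_a))
```

So a conclusion drawn from the log-scale test is a statement about a multiplicative effect on the geometric mean, and should be reported as such.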

Transformations to normality should always be followed by an investigation of the normality assumption, to assess whether the transformed data looks "normal enough". This can be done using, for instance, histograms, QQ-plots and tests for normality. The t-test is particularly sensitive to deviations from normality in the form of skewness, and therefore a test for normality that is directed towards skew alternatives would be preferable. Pearson's sample skewness $\frac{n^{-1}\sum_{i=1}^n(x_i-\bar{x})^3}{(n^{-1}\sum_{i=1}^n(x_i-\bar{x})^2)^{3/2}}$ is a suitable test statistic in this case.
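As a sketch of the check described above, the following computes the sample skewness exactly as written in the formula, before and after a log transform. The sample is hypothetical and the function name `sample_skewness` is my own. The code only computes the statistic; turning it into a formal test requires a reference distribution or critical values, which are not shown here.

```python
import math

def sample_skewness(xs):
    """Pearson's sample skewness: m3 / m2**1.5, using 1/n central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample (illustration only).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]

skew_raw = sample_skewness(raw)
skew_logged = sample_skewness(logged)

print("skewness before log transform:", skew_raw)
print("skewness after log transform:", skew_logged)
```

A value near zero after transformation suggests the skew has been removed, while a clearly negative value would indicate that the log overcorrected, as one of the comments on the question warns.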

Rather than choosing a transformation (such as logarithms) because it works most of the time, I prefer to use the Box-Cox procedure for choosing a transformation using the given data. There are however some philosophical issues with this; in particular whether this should affect the number of degrees of freedom in the t-test, since we've used some information from the sample when choosing which transform to use.
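For readers unfamiliar with the procedure, here is a minimal sketch of Box-Cox selection by grid search over the profile log-likelihood, using only the standard library. The function names, the grid from -2 to 2, and the synthetic data are my assumptions for illustration. Statistical packages typically use a numerical optimiser and also report a confidence interval for $\lambda$.

```python
import math
from statistics import NormalDist

def boxcox(x, lam):
    """Box-Cox transform of a positive value: log(x) at lam = 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def profile_loglik(xs, lam):
    """Profile log-likelihood of lam under a normal model, up to a constant."""
    n = len(xs)
    ys = [boxcox(x, lam) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n  # maximum-likelihood variance
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)

def choose_lambda(xs, grid=None):
    """Return the grid value of lam with the highest profile log-likelihood."""
    if grid is None:
        grid = [k / 100 for k in range(-200, 201)]  # assumed search range
    return max(grid, key=lambda lam: profile_loglik(xs, lam))

# Synthetic data: exponentiated normal quantiles, so the log should be chosen.
n = 200
z = [0.8 * NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
xs = [math.exp(v) for v in z]

best_lambda = choose_lambda(xs)
print("selected lambda:", best_lambda)
```

Because $\lambda$ is estimated from the same sample that is then tested, the question raised above about degrees of freedom applies to this sketch as well.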

Finally, a good alternative to using either the t-test after a transformation or a classical nonparametric test is to use the bootstrap analogue of the t-test. It does not require the assumption of normality and is a test about the untransformed means (and not about anything else).
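One common way to set up such a test is sketched below, using only the standard library. It is not necessarily the exact variant I have in mind in every detail: both samples are shifted to a common mean to impose the null hypothesis, each is resampled with replacement, and the Welch t statistic is recomputed each time. The function names, the number of resamples, the seed, and the data are all assumptions for illustration.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch t statistic for the difference in arithmetic means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    diff = statistics.fmean(a) - statistics.fmean(b)
    if se == 0.0:  # degenerate resample in which every value is identical
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se

def bootstrap_t_test(a, b, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for equality of the untransformed means."""
    rng = random.Random(seed)
    t_obs = welch_t(a, b)
    pooled_mean = statistics.fmean(list(a) + list(b))
    # Shift each sample so both have the pooled mean (the null hypothesis).
    a0 = [x - statistics.fmean(a) + pooled_mean for x in a]
    b0 = [x - statistics.fmean(b) + pooled_mean for x in b]
    extreme = 0
    for _ in range(n_boot):
        ra = rng.choices(a0, k=len(a0))
        rb = rng.choices(b0, k=len(b0))
        if abs(welch_t(ra, rb)) >= abs(t_obs):
            extreme += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0.
    return (extreme + 1) / (n_boot + 1)

# Hypothetical samples (illustration only).
sample_a = [1.2, 0.8, 1.9, 2.4, 0.5, 3.1, 1.1, 0.9, 1.6, 2.2]
sample_b = [x + 5.0 for x in sample_a]

p_shifted = bootstrap_t_test(sample_a, sample_b)
p_same = bootstrap_t_test(sample_a, sample_a)
print("p-value, shifted groups:", p_shifted)
print("p-value, identical groups:", p_same)
```

The null distribution of the statistic is estimated from the data rather than taken from the t distribution, which is why normality is not needed, and the hypothesis concerns the arithmetic means on the original scale.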

MånsT
  • +1 Good, thoughtful discussion with a good recommendation at the end. For more about the bootstrap/resampling/permutation version of the t-test, please see a recent thread at http://stats.stackexchange.com/q/24911. – whuber Apr 03 '12 at 07:19
  • I'd be very careful about applying Box-Cox in real data analysis that requires interpretation and reporting of range estimates (confidence intervals): https://www.quora.com/Why-is-the-Box-Cox-transformation-criticized-and-advised-against-by-so-many-statisticians-What-is-so-wrong-with-it/answer/Adrian-Olszewski-1?ch=10&share=b727f842&srid=MByz – Bastian Aug 25 '21 at 00:15