
I am trying to understand how the Box-Cox power transformation works. So I took one of my datasets and ran the powerTransform function from the "car" package (via R Commander) after having computed a one-way ANOVA on the data. The first lambda hat was 1.49 (95% CI: 1.16-1.82). [variable V1 in the output below]

Then, I took the same data, but anchored the minimum value of the dependent variable at 1 (instead of 12 in the first analysis), following Osborne's suggestion (2010; https://www.researchgate.net/profile/Jason_Osborne2/publication/286178015_Improving_your_data_transformations_Applying_the_Box-Cox_transformation/links/57cd832408ae83b37460d754/Improving-your-data-transformations-Applying-the-Box-Cox-transformation.pdf). Lambda hat was 0.97 this time (95% CI: 0.81-1.14), suggesting that no transformation was required. [variable V1.MIN1]

How is this possible? The data were essentially the same, only shifted to the left by a constant of 11, and they still displayed exactly the same skewness and kurtosis as in the first analysis. How can a transformation be considered relevant in the first analysis, but not in the second?
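For reference, the family fitted by powerTransform with family="bcPower" is the scaled power transformation

$$y^{(\lambda)} = \begin{cases} (y^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \log y, & \lambda = 0 \end{cases}$$

which, unlike skewness and kurtosis, is defined relative to the origin of $y$.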

In a last step, I reflected the original data (multiplied them by -1), because the skewness was negative, and then anchored the minimum value at 1 (as suggested by Osborne, 2010). Lambda hat was 0.35 this time (95% CI: 0.21-0.49). [variable V1.REV.MIN1]
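For concreteness, the two derived variables were presumably built from V1 along these lines (a sketch; addsm is the data frame summarized below):

# anchor the minimum of V1 at 1 (Osborne, 2010): 12 becomes 1
addsm$V1.MIN1 <- addsm$V1 + (1 - min(addsm$V1))
# reflect (multiply by -1), then anchor the minimum at 1
addsm$V1.REV.MIN1 <- -addsm$V1 + (1 - min(-addsm$V1))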

> load("C:/R/boxcox.RData")

> summary(addsm)
       V1           V1.MIN1       V1.REV.MIN1    GRP    
 Min.   :12.00   Min.   : 1.00   Min.   : 1.00   C: 69  
 1st Qu.:29.75   1st Qu.:18.75   1st Qu.: 8.00   T:159  
 Median :44.00   Median :33.00   Median :16.00          
 Mean   :39.53   Mean   :28.53   Mean   :20.47          
 3rd Qu.:52.00   3rd Qu.:41.00   3rd Qu.:30.25          
 Max.   :59.00   Max.   :48.00   Max.   :48.00          

> library(abind, pos=16)

> library(e1071, pos=17)

> numSummary(addsm[,c("V1", "V1.MIN1", "V1.REV.MIN1"), drop=FALSE], groups=addsm$GRP, statistics=c("mean", "sd", "skewness", "kurtosis"), quantiles=c(0,.25,.5,.75,1), type="2")

Variable: V1 
      mean       sd   skewness   kurtosis   n
C 44.69565 10.94143 -1.1036232  0.9661789  69
T 37.28931 16.08220 -0.3860327 -1.3085015 159

Variable: V1.MIN1 
      mean       sd   skewness   kurtosis   n
C 33.69565 10.94143 -1.1036232  0.9661789  69
T 26.28931 16.08220 -0.3860327 -1.3085015 159

Variable: V1.REV.MIN1 
      mean       sd  skewness   kurtosis   n
C 15.30435 10.94143 1.1036232  0.9661789  69
T 22.71069 16.08220 0.3860327 -1.3085015 159

> library(mvtnorm, pos=18)

> library(survival, pos=18)

> library(MASS, pos=18)

> library(TH.data, pos=18)

> library(multcomp, pos=18)

> AnovaModel.1 <- aov(V1 ~ GRP, data=addsm)

> summary(AnovaModel.1)
             Df Sum Sq Mean Sq F value   Pr(>F)    
GRP           1   2639  2639.5   12.17 0.000583 ***
Residuals   226  49005   216.8                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> with(addsm, numSummary(V1, groups=GRP, statistics=c("mean", "sd")))
      mean       sd data:n
C 44.69565 10.94143     69
T 37.28931 16.08220    159

> summary(powerTransform(AnovaModel.1, family="bcPower"))
bcPower Transformation to Normality 
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1    1.4895        1.49       1.1568       1.8223

Likelihood ratio test that transformation parameter is equal to 0
 (log transformation)
                           LRT df       pval
LR test, lambda = (0) 88.91039  1 < 2.22e-16

Likelihood ratio test that no transformation is needed
                           LRT df     pval
LR test, lambda = (1) 8.780921  1 0.003044
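
As an aside, the estimated power can be applied with car::bcPower before refitting; a minimal sketch using the rounded estimate above (the new column name V1.BC is mine):

# apply the estimated Box-Cox power to V1, then refit the one-way ANOVA
addsm$V1.BC <- bcPower(addsm$V1, lambda=1.49)
summary(aov(V1.BC ~ GRP, data=addsm))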

> AnovaModel.2 <- aov(V1.MIN1 ~ GRP, data=addsm)

> summary(AnovaModel.2)
             Df Sum Sq Mean Sq F value   Pr(>F)    
GRP           1   2639  2639.5   12.17 0.000583 ***
Residuals   226  49005   216.8                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> with(addsm, numSummary(V1.MIN1, groups=GRP, statistics=c("mean", "sd")))
      mean       sd data:n
C 33.69565 10.94143     69
T 26.28931 16.08220    159

> summary(powerTransform(AnovaModel.2, family="bcPower"))
bcPower Transformation to Normality 
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1    0.9723           1       0.8051       1.1395

Likelihood ratio test that transformation parameter is equal to 0
 (log transformation)
                           LRT df       pval
LR test, lambda = (0) 185.0929  1 < 2.22e-16

Likelihood ratio test that no transformation is needed
                            LRT df    pval
LR test, lambda = (1) 0.1045433  1 0.74644

> AnovaModel.3 <- aov(V1.REV.MIN1 ~ GRP, data=addsm)

> summary(AnovaModel.3)
             Df Sum Sq Mean Sq F value   Pr(>F)    
GRP           1   2639  2639.5   12.17 0.000583 ***
Residuals   226  49005   216.8                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> with(addsm, numSummary(V1.REV.MIN1, groups=GRP, statistics=c("mean", "sd")))
      mean       sd data:n
C 15.30435 10.94143     69
T 22.71069 16.08220    159

> summary(powerTransform(AnovaModel.3, family="bcPower"))
bcPower Transformation to Normality 
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1    0.3514        0.33       0.2095       0.4932

Likelihood ratio test that transformation parameter is equal to 0
 (log transformation)
                           LRT df          pval
LR test, lambda = (0) 25.89799  1 0.00000035994

Likelihood ratio test that no transformation is needed
                           LRT df       pval
LR test, lambda = (1) 67.04348  1 2.2204e-16
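
Since 0.35 falls between the log ($\lambda = 0$) and the square root ($\lambda = 0.5$), a conventional value can also be tested directly with car::testTransform; a sketch:

# LR test of the square-root transformation for the reflected, anchored variable
pt3 <- powerTransform(AnovaModel.3, family="bcPower")
testTransform(pt3, lambda=0.5)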
  • $y + c$ is just a linear transformation of $y$ but $\ln(y + c)$ or $(y + c)^k$ isn't a linear transformation of $\ln y$ or $y^k$, so the translation isn't trivial. Otherwise put, powers and logarithms do depend on where the origin is. – Nick Cox May 13 '20 at 15:30 (see the simulation sketch after these comments)
  • Thank you for your explanation, which makes clear why the results differ if a constant is added to the raw data before the transformation is applied. In their article, Box and Cox (1964) develop their technique so that "a normal, homoscedastic, linear model [is] appropriate after some suitable transformation has been applied to the y's" (p. 211). If a researcher wants to apply the Box-Cox power transformation, is there a need to anchor the distribution at a specific minimal value? – EduardodlVega May 13 '20 at 15:50
  • Osborne (2002) indicates: "it is my opinion that researchers seeking to utilize any of the above-mentioned data transformations should first move the distribution so its leftmost point (minimum value) is anchored at 1.0. This is due to the differential effects of the transformations across the number line. All three transformations will have the greatest effect if the distribution is anchored at 1.0, and as the minimum value of the distribution moves away from 1.0 the effectiveness of the transformation diminishes dramatically." (https://pareonline.net/htm/v8n6.htm) – EduardodlVega May 13 '20 at 15:55
  • In the examples presented in my first post, this recommendation (i.e., anchoring at 1) has the effect of suggesting that no transformation is needed. On the contrary, if the original data are used, a transformation is required. In light of such contrasting results, what is one supposed to do? – EduardodlVega May 13 '20 at 15:55
  • I haven't read the Osborne paper, beyond page 1, where he puts most emphasis on getting closer to a normal distribution, which is the least important goal! So, I gave up at that point and can't easily comment further helpfully, except to underline that adding or subtracting constants is here much more dangerous than it may seem. That is perhaps most obvious for logarithms where there is a simple but also strict interpretation in terms of multiplicative changes, which is destroyed by arbitrary translations. – Nick Cox May 13 '20 at 15:59
  • Two fundamental problems are that anchoring at 1 is more arbitrary than is implied and compromises, indeed prevents, comparability between studies. Again, I am relying on your summary here. – Nick Cox May 13 '20 at 16:06
  • The problem is that in psychological research, most scales have arbitrarily anchored values (e.g., summation of 1 to 5 Likert items; why not 0-4?). So, adding a constant makes it no more arbitrary than using the original scale. That would have been different if the values related to something like concentration in chemistry (e.g., https://stats.stackexchange.com/questions/18844/when-and-why-should-you-take-the-log-of-a-distribution-of-numbers). If the original scale has no intrinsic meaning related to the numbers (apart from their variability and order), is your comment still relevant? – EduardodlVega May 13 '20 at 16:36
  • I don't know why you focus on psychological research: the topic seems general. Perhaps that is your field. But you are making my point for me. Logarithms and powers depend on origins making sense and so on ratio scale measurement. I can't see any rationale for taking logarithms or powers of Likert scales, whether as they arrive or translated. Shifting 1-5 to 0-4 would rule out logarithms. Sometimes there is point to transforming e.g. ordered scales (see John Tukey on folded root transformations) but I don't know any good procedure that is capricious about moving the origin. – Nick Cox May 13 '20 at 17:02
  • If Osborne really wrote that opinion about "anchoring ... at 1.0," he was completely wrong. Some evidence is available in the analysis at https://stats.stackexchange.com/a/30749/919. A nice counterexample is presented at https://stats.stackexchange.com/a/35717/919. – whuber May 13 '20 at 17:14
  • Nick Cox and whuber, first of all, thank you for taking the time to provide useful information. I am a little bit confused, because Keene (1995) wrote "For continuous positive data measured on an interval scale, a log transformed analysis should frequently be preferred to an untransformed analysis. No special justification beyond that sufficient to support an untransformed analysis should be required" (p. 818). https://www.nki.nl/media/837444/log.pdf Keene, O. N. (1995). The log transformation is special. Statistics in Medicine, 14(8), 811-819. doi:10.1002/sim.4780140810 – EduardodlVega May 13 '20 at 19:24
  • On the other hand, Nevill & Lane (2007) published an editorial titled "Why self-report ‘‘Likert’’ scale data should not be log-transformed". Source: Nevill, A., & Lane, A. (2007). Why self-report "Likert" scale data should not be log-transformed. Journal of Sports Sciences, 25(1), 1-2. https://www.researchgate.net/publication/32117377_Why_self-report_Likert_scale_data_should_not_be_log-transformed – EduardodlVega May 13 '20 at 19:25
  • I edited the title of the topic to reflect the content of the discussion. – EduardodlVega May 13 '20 at 19:29
  • No contradiction: Keene’s excellent paper, which I do know, underlines that logarithms can be a good thing. He doesn’t say that they are compulsory! The other paper’s title echoes a point made earlier. No support is implied for Osborne’s ideas. – Nick Cox May 13 '20 at 19:42
  • Sorry, Nick Cox, I am really confused here. From your understanding, do the points made by Keene (1995) apply or not to summated Likert scales? Although Keene (1995) refers to interval scales, the various examples he presents in his article relate to ratio scales (gastric emptying time, plasma cortisol (nmol/l)). If so, there is a real 0, and these scales are really ratio scales. – EduardodlVega May 13 '20 at 21:01
  • My confusion was related to the idea that summated scales are mostly considered interval, although individual items are clearly ordinal. I know there is some debate on this matter, but most authors seem to agree to treat summated Likert scales with adequate properties as interval. So, I perceived as a contradiction the need for a real 0 and the idea that logs are relevant for interval scales. As a consequence, the solution suggested by Osborne seemed acceptable. But after rereading the various information, power transformations are clearly only relevant for ratio scales. – EduardodlVega May 13 '20 at 21:05
  • Again, even if you stretch as far as regarding some Likert scales as interval, that does not make them ratio scales. – Nick Cox May 13 '20 at 23:01
  • I agree, thank you. – EduardodlVega May 14 '20 at 04:13
  • An answer (rather than comments) that gave a detailed discussion and dissection of Osborne's paper might be a public service, but I am not volunteering. @whuber would be an excellent person to do it, but my wild guess is that he shares my disinclination: he must speak for himself. – Nick Cox May 14 '20 at 07:57
  • I add that I have elsewhere been fairly positive about transformations such as $\text{sign}(y) \log (1 + |y|)$ for $y$ that can be negative, zero or positive. Here the rationale is _ad hoc_: the transformation is an approximation and may work well in practice for measured variables. That has no bearing on e.g. Likert scales. $\log (y + c)$ where $c$ is a fudge constant is a very dangerous transformation because the choice of $c$ is crucial. A common fallacy is that $\log (y + c)$ will be very close to $\log y$ if $c$ is very small which is true for large $y$ but utterly wrong near $0$. – Nick Cox May 14 '20 at 08:24
  • Just to let you know that there are several other articles advocating that "The first step in the transformation procedure is to anchor the minimum value of X to 1 via the following: X + (1 - Xmin)". Referenced on p. 261 in Clark, J. E., Osborne, J. W., Gallagher, P., and Watson, S. (2016). A simple method for optimising transformation of non-parametric data: an illustration by reference to cortisol assays. Hum. Psychopharmacol Clin Exp, 31, 259-267. doi:10.1002/hup.2528. https://onlinelibrary.wiley.com/doi/abs/10.1002/hup.2528 – EduardodlVega May 17 '20 at 02:27
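
To see Nick Cox's first point numerically, here is a small simulation sketch (the data and names are mine, made up for illustration): the $\hat\lambda$ returned by powerTransform is not invariant to adding a constant, even though skewness is.

library(car)
set.seed(1)
y <- rlnorm(200)               # right-skewed; the log (lambda = 0) normalizes it
coef(powerTransform(y))        # typically close to 0
coef(powerTransform(y + 100))  # same skewness, shifted away from the origin: a very different lambda

The skewness of y and y + 100 is identical, yet the estimated powers differ markedly, which is exactly what happened with V1 and V1.MIN1 above.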
