I am trying to understand how the Box-Cox power transformation works. I took one of my datasets and, after fitting a one-way ANOVA, ran the powerTransform function from the "car" package (via R Commander). The first lambda hat was 1.49 (95% CI: 1.16-1.82) [variable V1 in the output below].
Then I took the same data but anchored the minimum of the dependent variable at 1 (instead of 12 in the first analysis), following Osborne's suggestion (2010; https://www.researchgate.net/profile/Jason_Osborne2/publication/286178015_Improving_your_data_transformations_Applying_the_Box-Cox_transformation/links/57cd832408ae83b37460d754/Improving-your-data-transformations-Applying-the-Box-Cox-transformation.pdf_). This time lambda hat was 0.97 (95% CI: 0.81-1.14), suggesting that no transformation was required [variable V1.MIN1].
How is this possible, given that the data were essentially the same, apart from being shifted to the left by a constant of 11? The data still had exactly the same skewness and kurtosis as in the first analysis. How can a transformation be considered relevant in the first analysis but not in the second?
As a last step, because the skewness was negative, I reflected the original data (multiplied them by -1) and then anchored the minimum at 1 (as suggested by Osborne, 2010). This time lambda hat was 0.35 (95% CI: 0.21-0.49) [variable V1.REV.MIN1].
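For readers who want to reproduce the setup, here is a minimal sketch of how the shifted and reflected variables described above could be derived. The vector `v1` is a hypothetical stand-in for the V1 column (the real data are in boxcox.RData and are not reproduced here), and the names `v1_min1` and `v1_rev_min1` are illustrative, not the commands actually run in R Commander.

```r
# Hypothetical stand-in for V1; only its minimum (12) and maximum (59)
# match the summary output, the other values are made up.
v1 <- c(12, 25, 30, 44, 52, 59)

# Anchor the minimum at 1: subtract the minimum, then add 1
# (for a minimum of 12 this is a shift of -11).
v1_min1 <- v1 - min(v1) + 1

# Reflect (multiply by -1), then anchor the minimum at 1 in the same way.
v1_rev <- -v1
v1_rev_min1 <- v1_rev - min(v1_rev) + 1

# A pure shift leaves deviations from the mean unchanged, so the sd,
# skewness and kurtosis of v1 and v1_min1 are identical.
isTRUE(all.equal(v1 - mean(v1), v1_min1 - mean(v1_min1)))

range(v1_min1)      # minimum is now 1
range(v1_rev_min1)  # minimum is now 1, order of the values is reversed
```

With a minimum of 12 and a maximum of 59, both derived variables run from 1 to 48, which agrees with the summary output below.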
> load("C:/R/boxcox.RData")
> summary(addsm)
       V1           V1.MIN1       V1.REV.MIN1    GRP
 Min.   :12.00   Min.   : 1.00   Min.   : 1.00   C: 69
 1st Qu.:29.75   1st Qu.:18.75   1st Qu.: 8.00   T:159
 Median :44.00   Median :33.00   Median :16.00
 Mean   :39.53   Mean   :28.53   Mean   :20.47
 3rd Qu.:52.00   3rd Qu.:41.00   3rd Qu.:30.25
 Max.   :59.00   Max.   :48.00   Max.   :48.00
> library(abind, pos=16)
> library(e1071, pos=17)
> numSummary(addsm[,c("V1", "V1.MIN1", "V1.REV.MIN1"), drop=FALSE], groups=addsm$GRP, statistics=c("mean", "sd", "skewness", "kurtosis"), quantiles=c(0,.25,.5,.75,1), type="2")
Variable: V1
      mean       sd   skewness   kurtosis   n
C 44.69565 10.94143 -1.1036232  0.9661789  69
T 37.28931 16.08220 -0.3860327 -1.3085015 159
Variable: V1.MIN1
      mean       sd   skewness   kurtosis   n
C 33.69565 10.94143 -1.1036232  0.9661789  69
T 26.28931 16.08220 -0.3860327 -1.3085015 159
Variable: V1.REV.MIN1
      mean       sd   skewness   kurtosis   n
C 15.30435 10.94143  1.1036232  0.9661789  69
T 22.71069 16.08220  0.3860327 -1.3085015 159
> library(mvtnorm, pos=18)
> library(survival, pos=18)
> library(MASS, pos=18)
> library(TH.data, pos=18)
> library(multcomp, pos=18)
> AnovaModel.1 <- aov(V1 ~ GRP, data=addsm)
> summary(AnovaModel.1)
             Df Sum Sq Mean Sq F value   Pr(>F)
GRP           1   2639  2639.5   12.17 0.000583 ***
Residuals   226  49005   216.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> with(addsm, numSummary(V1, groups=GRP, statistics=c("mean", "sd")))
      mean       sd data:n
C 44.69565 10.94143     69
T 37.28931 16.08220    159
> summary(powerTransform(AnovaModel.1, family="bcPower"))
bcPower Transformation to Normality
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1    1.4895        1.49       1.1568       1.8223
Likelihood ratio test that transformation parameter is equal to 0
(log transformation)
                           LRT df       pval
LR test, lambda = (0) 88.91039  1 < 2.22e-16
Likelihood ratio test that no transformation is needed
                           LRT df     pval
LR test, lambda = (1) 8.780921  1 0.003044
> AnovaModel.2 <- aov(V1.MIN1 ~ GRP, data=addsm)
> summary(AnovaModel.2)
             Df Sum Sq Mean Sq F value   Pr(>F)
GRP           1   2639  2639.5   12.17 0.000583 ***
Residuals   226  49005   216.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> with(addsm, numSummary(V1.MIN1, groups=GRP, statistics=c("mean", "sd")))
      mean       sd data:n
C 33.69565 10.94143     69
T 26.28931 16.08220    159
> summary(powerTransform(AnovaModel.2, family="bcPower"))
bcPower Transformation to Normality
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1    0.9723           1       0.8051       1.1395
Likelihood ratio test that transformation parameter is equal to 0
(log transformation)
                           LRT df       pval
LR test, lambda = (0) 185.0929  1 < 2.22e-16
Likelihood ratio test that no transformation is needed
                            LRT df    pval
LR test, lambda = (1) 0.1045433  1 0.74644
> AnovaModel.3 <- aov(V1.REV.MIN1 ~ GRP, data=addsm)
> summary(AnovaModel.3)
             Df Sum Sq Mean Sq F value   Pr(>F)
GRP           1   2639  2639.5   12.17 0.000583 ***
Residuals   226  49005   216.8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> with(addsm, numSummary(V1.REV.MIN1, groups=GRP, statistics=c("mean", "sd")))
      mean       sd data:n
C 15.30435 10.94143     69
T 22.71069 16.08220    159
> summary(powerTransform(AnovaModel.3, family="bcPower"))
bcPower Transformation to Normality
   Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
Y1    0.3514        0.33       0.2095       0.4932
Likelihood ratio test that transformation parameter is equal to 0
(log transformation)
                           LRT df          pval
LR test, lambda = (0) 25.89799  1 0.00000035994
Likelihood ratio test that no transformation is needed
                           LRT df       pval
LR test, lambda = (1) 67.04348  1 2.2204e-16