Estimating Lambda for Box Cox transformation for ANOVA

Question

Assumptions:

In an ANOVA where the normality assumption is violated, the Box-Cox transformation can be applied to the response variable. Lambda can be estimated by maximum likelihood, choosing the value that makes the model residuals as close to normal as possible.
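
For reference, the transformation and the profile log-likelihood that is maximised are usually written as follows. The notation is an assumption for illustration and is not taken from the post: $y_i > 0$ are the responses, $n$ is the sample size, and $\mathrm{RSS}(\lambda)$ is the residual sum of squares from fitting the chosen model (null or full) to the transformed response.

$$
y_i^{(\lambda)} =
\begin{cases}
\dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[1ex]
\log y_i, & \lambda = 0,
\end{cases}
\qquad
\ell(\lambda) = -\frac{n}{2}\log\frac{\mathrm{RSS}(\lambda)}{n} + (\lambda - 1)\sum_{i=1}^{n}\log y_i + \text{const}.
$$

Because $\mathrm{RSS}(\lambda)$ depends on which model is fitted, the maximising $\lambda$ can differ between the null and the full model.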

Question:

When the estimates for lambda in the null model and the full model differ, how should lambda be estimated?

My Data:

In my data the lambda estimate for the null model is -2.3 and the lambda estimate for the full model is -2.8. Transforming the response using these different parameters and performing the ANOVA leads to different F-statistics.

I have produced below a simplified version of the analysis using beta distributions with different parameters to simulate non-normality. Unfortunately, in this example the results of the ANOVA are insensitive to the different estimates of lambda. So, it doesn't fully illustrate the problem.

library(ggplot2)
library(MASS)
library(car)


# Generate random beta-distributed data to simulate non-normality
n <- 200
df <- rbind(
  data.frame(x=factor(rep("a1",n)), y=rbeta(n,2,5)), # right-skewed
  data.frame(x=factor(rep("a2",n)), y=rbeta(n,2,2))) # symmetric

print(qplot(data=df, color=x, x=y, geom="density"))

print("Untransformed Analysis of Variance:")
m.null <- lm(y ~ 1, df)
m.full <- lm(y ~ x, df)
print(anova(m.null, m.full))

# Estimate maximum-likelihood Box-Cox parameters for both models.
# boxcox() returns a grid of lambda values ($x) and their profile log-likelihoods ($y).
bc.null <- boxcox(m.null); bc.null.opt <- bc.null$x[which.max(bc.null$y)]
bc.full <- boxcox(m.full); bc.full.opt <- bc.full$x[which.max(bc.full$y)]

print(paste("ML Box-Cox estimate for null model:",bc.null.opt))
print(paste("ML Box-Cox estimate for full model:",bc.full.opt))

df$y.bc.null <- bcPower(df$y, bc.null.opt)
df$y.bc.full <- bcPower(df$y, bc.full.opt)

print(qplot(data=df, x=x, y=y.bc.null, geom="boxplot"))
print(qplot(data=df, x=x, y=y.bc.full, geom="boxplot"))


print("Analysis of Variance with optimal Box-Cox transform for null model")
m.bc_null.null <- lm(y.bc.null ~ 1, data=df)
m.bc_null.full <- lm(y.bc.null ~ x, data=df)
print(anova(m.bc_null.null, m.bc_null.full))

print("Analysis of Variance with optimal Box-Cox transform for full model")
# Use the response transformed with the full-model lambda (y.bc.full, not y.bc.null)
m.bc_full.null <- lm(y.bc.full ~ 1, data=df)
m.bc_full.full <- lm(y.bc.full ~ x, data=df)
print(anova(m.bc_full.null, m.bc_full.full))

Something is seriously wrong when you have to use Box-Cox transformations this strong. Moreover, there's not much practical difference between -2.8 and -2.3: you could safely use -2.5 in both situations. I suspect you may have one or more outliers to deal with. Box-Cox transformations really shouldn't be used in such an automated way: they are more suited for exploration. Draw a ladder of probability plots of transformed residuals for each model, varying $\lambda$ from $-1$ to $2$ in units of $1/2$, to see what might be happening. — whuber, Apr 28 '11 at 14:26
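
The ladder of probability plots suggested in this comment could be drawn along these lines. This is a minimal sketch in base R, assuming simulated data like that in the question; `box_cox` and `dat` are names made up here for illustration.

```r
# Box-Cox power transform (assumes y > 0); written out to avoid extra packages
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

set.seed(1)  # illustrative data only, mirroring the question's simulation
n <- 200
dat <- data.frame(
  x = factor(rep(c("a1", "a2"), each = n)),
  y = c(rbeta(n, 2, 5), rbeta(n, 2, 2))
)

lambdas <- seq(-1, 2, by = 0.5)  # the ladder: -1, -0.5, ..., 2
op <- par(mfrow = c(2, 4), mar = c(4, 4, 2, 1))
for (lambda in lambdas) {
  # Residuals of the full model fitted to the transformed response
  res <- residuals(lm(box_cox(y, lambda) ~ x, data = dat))
  qqnorm(res, main = paste("lambda =", lambda))
  qqline(res)
}
par(op)
```

Replacing `~ x` with `~ 1` gives the same ladder for the null model, so the two sets of panels can be compared side by side.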

score 5 · Answer 1 · answered Apr 28 '11 at 16:04

The Box-Cox transformation tries to improve the normality of the residuals. Since normal residuals are also what ANOVA assumes, you should estimate lambda on the model that you are actually going to use, i.e. the full model. For example, if you have two well-separated groups, the distribution of the response variable will be strongly bimodal and nowhere near normal, even if the distribution within each group is normal.
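
The two-group example can be checked with a short simulation. This is a sketch under made-up assumptions (group means 0 and 10, unit standard deviation, Shapiro-Wilk as the normality check), not part of the original analysis.

```r
set.seed(42)
n <- 200
g <- factor(rep(c("a", "b"), each = n))
# Each group is normal, but the means are far apart
y <- c(rnorm(n, mean = 0, sd = 1), rnorm(n, mean = 10, sd = 1))

m.null <- lm(y ~ 1)  # residuals are just y minus the grand mean: bimodal
m.full <- lm(y ~ g)  # residuals are deviations from each group mean

# Shapiro-Wilk p-values for the residuals of each model
p.null <- shapiro.test(residuals(m.null))$p.value
p.full <- shapiro.test(residuals(m.full))$p.value
print(c(null = p.null, full = p.full))
```

The null-model residuals are rejected as normal by a wide margin, while the full-model residuals are draws from a single normal distribution, so a lambda tuned to the null model would be chasing a problem the full model does not have.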

Additionally, you certainly want to take whuber's comment to heart and check for outliers, missing predictors, etc., to make sure that some artifact is not driving your transformation. Also consider the confidence interval around the optimal lambda, and whether a particular transformation within that interval makes applied sense. For example, if you have linear measurements, but the outcome would reasonably be related to a volume, then lambda=3 or lambda=-3 might be meaningful. If, on the other hand, areas are involved, then 2 or -2 might be better choices.
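
One way to get such an interval is from the profile log-likelihood that `boxcox()` returns. The sketch below assumes simulated data like that in the question and uses the usual chi-square cutoff for an approximate 95% interval; `dat`, `cutoff` and `ci` are illustrative names.

```r
library(MASS)  # boxcox(), as used in the question

set.seed(1)  # illustrative data only, mirroring the question's simulation
n <- 200
dat <- data.frame(
  x = factor(rep(c("a1", "a2"), each = n)),
  y = c(rbeta(n, 2, 5), rbeta(n, 2, 2))
)
# y = TRUE keeps the response in the fit so boxcox() does not need to refit
m.full <- lm(y ~ x, data = dat, y = TRUE)

# Profile log-likelihood on a fine grid, without drawing the plot
bc <- boxcox(m.full, lambda = seq(-2, 3, by = 0.01), plotit = FALSE)
lambda.hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: all lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
cutoff <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci <- range(bc$x[bc$y >= cutoff])
print(c(estimate = lambda.hat, lower = ci[1], upper = ci[2]))
```

If a round, interpretable value (for example 0, 1/2 or 1) falls inside the interval, it is usually a more defensible choice than the raw maximiser.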

However if a later step assumes that lambda is estimated without error, then all confidence intervals will be too narrow and P-values too small (i.e., confidence coverages will be far from the claimed value such as 0.95). A unified "estimate lambda while estimating the betas" approach is needed. — Frank Harrell, May 03 '11 at 15:40

score 4 · Accepted Answer · answered Apr 28 '11 at 16:16

It is not appropriate to do ordinary ANOVA after using the same dataset to fit lambda. The analysis should be unified, penalizing for uncertainty in lambda (a parameter to be estimated, and included in the covariance matrix).

Frank Harrell
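
One possible reading of a "unified" analysis, offered only as a sketch and not as the method the answer has in mind, is a likelihood-ratio test in which lambda is treated as a nuisance parameter and maximised separately under the null and the full model. The function names `profile_loglik` and `max_profile`, the search interval, and the simulated data are all assumptions for illustration.

```r
# Box-Cox profile log-likelihood (up to a constant) for design matrix X.
# The Jacobian term (lambda - 1) * sum(log(y)) makes values comparable across lambda.
profile_loglik <- function(lambda, y, X) {
  n <- length(y)
  yt <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(qr.resid(qr(X), yt)^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Maximise the profile log-likelihood over lambda for one model
max_profile <- function(y, X, interval = c(-3, 3)) {
  opt <- optimize(profile_loglik, interval = interval, y = y, X = X,
                  maximum = TRUE)
  c(lambda = opt$maximum, loglik = opt$objective)
}

set.seed(1)  # illustrative data only, mirroring the question's simulation
n <- 200
dat <- data.frame(
  x = factor(rep(c("a1", "a2"), each = n)),
  y = c(rbeta(n, 2, 5), rbeta(n, 2, 2))
)

fit.null <- max_profile(dat$y, model.matrix(~ 1, dat))
fit.full <- max_profile(dat$y, model.matrix(~ x, dat))

# Each model gets its own lambda; the extra parameter in the full model
# is the single group contrast, hence 1 degree of freedom
lrt <- unname(2 * (fit.full["loglik"] - fit.null["loglik"]))
p.value <- pchisq(lrt, df = 1, lower.tail = FALSE)
print(c(lambda.null = unname(fit.null["lambda"]),
        lambda.full = unname(fit.full["lambda"]),
        LRT = lrt, p = p.value))
```

Because both likelihoods are evaluated on the original scale of y, the comparison does not require committing to a single lambda before testing. It tests the group effect only and does not give intervals for the coefficients on any particular transformed scale.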

How would you do that? When $\lambda$ changes, the *meaning* (units of measurement included) of the other parameters changes, so confidence intervals for, say, the expectation $\mu$ or regression coefficients $\beta$ **taking into account uncertainty in $\lambda$** do not seem to make sense. (Box, for instance, thought so, if I understood correctly.) – kjetil b halvorsen Feb 25 '17 at 14:14