Power Transformations of a Single Variable for Multivariate Data Sets

Question

Let's say I have a dataset with 5 variables and 100 observations. After checking for normality, all but one are normal; let's say the exception is the 2nd variable. Using the Box-Cox method, I find a suitable lambda that will transform the 2nd variable's data to make it normal.

Would transforming only the data of the second variable mess with any multivariate inferences about the data? Should I transform all the variables' data by the same lambda to equalize them? Should I try to "more" normalize the rest of the variables' data, each with its own lambda, so that all the variables have been transformed? Or should I not even try to normalize the 2nd variable and leave it as is?

This raises more questions than it answers. At the same time, the essence is simple: No; there is no rule in statistics that all variables must be transformed in the same way and often that is not even possible in principle. Suppose you have two variables: one variable is always positive but very highly skewed; and another is binary with values 0 and 1. Then log transformation could be sensible for the first but neither necessary nor possible for the second. — Nick Cox, Apr 13 '20 at 15:51
The question is not even consistent, however. If all but one variable are normal, then it is not even possible to transform that majority to make them more normal. No further transformation is possible. In practice, I never believe any statement that some variables are normal and some are not. At most, variables display varying approximations to normality and the question is whether the approximation is good enough for the intended purpose, which is usually unstated. Often, marginal normality is not even an ideal, contrary to the implication in many, many questions like this one. — Nick Cox, Apr 13 '20 at 15:56
See https://stats.stackexchange.com/a/35717/919 for a detailed example illustrating some of the principles of finding transformations. This example winds up selecting two different transformations for two variables--and the result turns out to be both meaningful and insightful. Notice that the example makes no reference to Normal distributions, nor does it even hint at them--because Normality (as @Nick indicates) is often irrelevant. — whuber, Apr 13 '20 at 15:59
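
As a rough illustration of the point made in the comments, that one variable can be transformed on its own while the others are left alone, here is a minimal Python sketch. It assumes NumPy and SciPy are available; the simulated data, the seed, and names such as `X`, `X_new`, and `lam` are made up for the example and do not come from the question. It applies `scipy.stats.boxcox` to the 2nd column only and compares sample skewness before and after.

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for the question's setup:
# 100 observations, 5 variables, four roughly normal and one right-skewed.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(loc=10.0, scale=2.0, size=(n, 5))
X[:, 1] = rng.lognormal(mean=0.0, sigma=1.0, size=n)  # 2nd variable, strictly positive

# Estimate lambda by maximum likelihood and transform the 2nd variable only.
# Box-Cox is defined for strictly positive data.
transformed, lam = stats.boxcox(X[:, 1])

# Build the transformed data set; every other column is copied unchanged.
X_new = X.copy()
X_new[:, 1] = transformed

skew_before = stats.skew(X[:, 1])
skew_after = stats.skew(X_new[:, 1])
print(f"estimated lambda: {lam:.3f}")
print(f"skewness of 2nd variable before: {skew_before:.3f}, after: {skew_after:.3f}")
```

Nothing in this sketch requires the other four columns to share the same lambda, or to be transformed at all. Because Box-Cox needs strictly positive input, a 0/1 variable like the one in the first comment could not be passed through it anyway. Whether the reduced skewness matters still depends on the intended analysis, as the comments stress.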
