Suppose I have some heavy-tailed data that I want to transform so it's roughly normal in order to perform a t-test.

Original QQ Plot

    Shapiro-Wilk normality test

data:  NAPChange
W = 0.72716, p-value < 2.2e-16

Then, I use the LambertW R package to transform the data so that it is closer to a normal distribution (as per this post):

mod.Lh <- MLE_LambertW(NAPChange, distname = "normal", type = "h")

summary(mod.Lh)
Call: MLE_LambertW(y = NAPChange, distname = "normal", type = "h")
Estimation method: MLE
Input distribution: normal

 Parameter estimates:
       Estimate  Std. Error  t value  Pr(>|t|)    
mu    0.0057600   0.0028504   2.0208    0.0433 *  
sigma 0.0383100   0.0031284  12.2459 < 2.2e-16 ***
delta 0.3429300   0.0677444   5.0621 4.146e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
-------------------------------------------------------------- 

Given these input parameter estimates, the moments of the output random
variable (assuming Gaussian input) are:
 mu_y = 0.01; sigma_y = 0.09; skewness = NA; kurtosis = Inf.

y <- get_input(mod.Lh)
test_norm(y)

$shapiro.wilk

Shapiro-Wilk normality test

data:  data.test
W = 0.99463, p-value = 0.5504

Transformed QQ Plot

Now that the data is roughly normal, I can finally perform a t-test.
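For concreteness, this final step is just a standard t-test on the Gaussianized values, reusing the fitted model `mod.Lh` from above (the null value `mu = 0` here is a placeholder; substitute whatever change is actually being tested against):

```r
# One-sample t-test on the Gaussianized values recovered by get_input().
y <- get_input(mod.Lh)
t.test(y, mu = 0)   # mu = 0 is a placeholder null value
```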

HOWEVER, how can I interpret or quantify the results after such a transformation? Is there a way to map the results back to the original scale (i.e., to untransform them)?

    It would be interesting to have access to your data. My own bias is that the Lambert W, although fascinatingly exotic, needs to be compared with simpler solutions such as cube root, more generally signed roots, neglog and asinh. A different refrain (common in this forum) is that normality or bust is not quite the game: transformations that pull in the tails a bit even if normality is not achieved are often adequate for most statistical purposes. Either way, why you think attaining normality is a good idea is always worth comment. For example, comparing means doesn't require a t test. – Nick Cox Aug 24 '17 at 14:27
  • @NickCox Unfortunately I cannot share my data, but it looks exactly like that. The cubic root definitely improved the distribution of my data but not to the same degree that Lambert W did. I'm sorry if this sounds ignorant, but how else can I compare two means? I want to be able to obtain a confidence interval and if I don't have normality, how can I do that? I can get the error but not a CI (since a 95% CI with 1.96 being the critical value relies on normality) – Grint Aug 24 '17 at 14:41
    First off, the t test works pretty well with many non-normal distributions. In your case you can explore that by comparing P-values for original and transformed data. Second, you can treat this as a problem for a generalized linear model with a binary predictor. Often the results are robust to error family and link so long as the sample size is not too small. In your case, the data look nearly symmetric already so you are half-way there already. – Nick Cox Aug 24 '17 at 14:47
    Lambert W must be used with caution, because it is not one-to-one. When applied to non-negative numbers it is, *and therefore has an inverse.* (The inverse is simple to express and compute: $w \to w e^w$.) You can apply the inverse to your results in just the same way you might exponentiate the results of a logarithm--and all the same cautions apply. If you wish to interpret the results, then perhaps the best course of action is to develop the same understanding of $W$ as you have of the logarithm. – whuber Aug 24 '17 at 15:02
    @whuber: While it is true that Lambert W function is not one-to-one, the "Lambert W transformation" for data with heavy tails is one-to-one (i.e., bijective). I usually use "Lambert W transformation" to refer to the transformation involving -- but not identical to -- the Lambert W function. Agreed that we should develop a better understanding of W similar to logarithm -- unfortunately it's still considered "exotic" or "non-standard" or I also heard "not a real function". But I digress ;) – Georg M. Goerg Mar 24 '18 at 21:35
  • @Grint: you can interpret results by just viewing Lambert W x Gaussian distribution as a distribution (forget the actual data transformation -- view it just as a sanity check that a Lambert W x Gaussian for original data is a good description of the data). In that case, the MLE estimates and std errors tell you that the location = mean of the data is mu = 0.0057 (+/- 0.0028). For a t-test I assume you have another dataset that you want to compare to; then estimate the location/mean there and compare them. – Georg M. Goerg Mar 24 '18 at 21:41

0 Answers