
Is it possible to carry out ordinary multiple linear regression when the dependent variable and one predictor variable have been square-root transformed (as they did not follow a normal distribution)?

Is any back-transformation necessary for the $R^2$ value, the coefficients, or the confidence intervals?

kjetil b halvorsen
Joanna
  • Linear regression does NOT require that the predictors be normal, nor that the dependent variable be normal, only that the residuals be normal. – Peter Flom Mar 12 '13 at 11:11
  • See http://stats.stackexchange.com/questions/34920/what-kinds-of-variables-should-we-use-the-normality-test-for and http://stats.stackexchange.com/questions/45671/normality-of-residuals-vs-sample-data-what-about-t-tests – Glen_b Mar 12 '13 at 11:34
  • I agree w/ @PeterFlom. In that vein, you may find this thread informative: [what-if-residuals-are-normally-distributed-but-y-is-not](http://stats.stackexchange.com/questions/12262/). – gung - Reinstate Monica Mar 14 '13 at 04:51

1 Answer


Only the residuals need to be normally distributed, as @PeterFlom & @Glen_b note in the comments. The linked threads will help you to understand this issue.
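To see this concretely, here is a quick simulated check (a sketch, not from the thread; it assumes numpy, scipy, and statsmodels are available):

```python
# A quick simulated check: the predictor and the response are both
# markedly non-normal, yet the residuals are normal, so OLS is fine.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)        # skewed, non-normal predictor
y = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, 500)   # but the errors are normal

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Shapiro-Wilk: small p-values for x and y (non-normal), but not for
# the residuals, which is the assumption that actually matters.
print("x:        ", stats.shapiro(x).pvalue)
print("y:        ", stats.shapiro(y).pvalue)
print("residuals:", stats.shapiro(fit.resid).pvalue)
```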

If you have transformed your X variable (e.g., taken its square root), nothing much really happens. You can use and interpret your model as is; the coefficients simply describe the relationship in terms of the transformed predictor.

On the other hand, if you have transformed your Y variable, people often want to know what a predicted value will be on the 'regular' Y scale. To do this properly, you calculate the predicted y value on the transformed scale and back-transform it. You can also calculate upper and lower confidence bounds and back-transform them. However, you do not back-transform your betas / coefficients (cf. my answer here). Also, you may interpret $R^2$ as is; there is no transforming or back-transforming $R^2$.
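As an illustration, here is a minimal Python sketch of that workflow (not part of the original answer; the data are simulated and the names are made up), using statsmodels and a square-root transform to match the question:

```python
# Fit on the square-root scale, back-transform the predictions and
# confidence bounds by squaring, and leave the coefficients and R^2 alone.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=200)
# Simulate data whose square root is linear in sqrt(x).
y = (2.0 + 0.5 * np.sqrt(x) + rng.normal(0.0, 0.3, size=200)) ** 2

# Fit OLS on the transformed scale: sqrt(y) ~ sqrt(x).
X = sm.add_constant(np.sqrt(x))
fit = sm.OLS(np.sqrt(y), X).fit()

# Predictions and 95% confidence bounds at new x values, still on the
# transformed (sqrt) scale.
x_new = np.array([25.0, 64.0])
pred = fit.get_prediction(sm.add_constant(np.sqrt(x_new)))
sqrt_hat = pred.predicted_mean
lo, hi = pred.conf_int().T  # columns are (lower, upper)

# Back-transform the predictions and the bounds by squaring them.
print("predicted y: ", sqrt_hat ** 2)
print("lower bounds:", lo ** 2)
print("upper bounds:", hi ** 2)

# Do NOT square the coefficients; interpret them on the sqrt scale.
print("coefficients (sqrt scale):", fit.params)
# R^2 is reported as is, on the scale the model was fit.
print("R^2:", fit.rsquared)
```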

gung - Reinstate Monica
  • Isn't the re-transformation problem more involved than simply back-transforming the prediction? For example, with the natural log transformation, $E[y_i \vert x_i]=\exp (x_i'\beta) \cdot E[\exp (u_i)]$? – dimitriy Mar 16 '13 at 17:53
  • I'm not sure what you're getting at, @DimitriyV.Masterov. If you are worried about whether your assumptions are met (e.g., the variance scales w/ the mean &/or the residuals are skewed), & you transform $Y$ such that, e.g., $$\ln(Y)=\beta_0+\beta_1X+\varepsilon,\quad\text{where }\varepsilon\sim\mathcal N(0,\sigma^2),$$ then you can get the predicted value at $x_i$ on the original $Y$ scale by $\exp(\widehat{\ln(y_i)})$, but you certainly wouldn't use $\exp(\beta_0)+\exp(\beta_1)x_i$. – gung - Reinstate Monica Mar 16 '13 at 20:18
  • I am worried that $\exp \{E[\ln y]\}\ne E[y].$ Under your normality assumptions, to get a prediction on the un-logged scale you would need to multiply $\exp \{\ln(\hat y_i)\}$ by $E[\exp \{u_i\}] \approx \exp \{\frac{\hat \sigma^2}{2}\},$ where $\hat \sigma^2$ is the unbiased estimator of the error variance in the log-linear regression. – dimitriy Mar 16 '13 at 23:41
  • I still don't get the upshot here, @DimitriyV.Masterov. If the regression assumptions aren't met on the original $Y$ scale, we don't want $E[y_i|x_i]$, so it doesn't matter that $E[\ln(y_i)|x_i]\ne E[y_i|x_i]$. Just to double check, I just looked this up in Neter (1996), where it says, "If it is desired to express the estimated regression function in the original units of $Y$, we simply take the antilog of $\hat Y'$... " (p. 132). – gung - Reinstate Monica Mar 17 '13 at 00:15
  • Please forgive me if I am being dense here. If I exponentiate your first equation, I get $Y=\exp \{\beta_0+\beta_1 X\}\cdot \exp\{\varepsilon\}.$ Taking the expectation in the homoskedastic case, I get $E[Y \vert X]=\exp \{\beta_0+\beta_1 X\} \cdot E[\exp\{\varepsilon\}].$ Doesn't the antilog of $\hat Y$ ignore the second term? – dimitriy Mar 17 '13 at 00:47
  • I'm sure it's me who's being dense here, @DimitriyV.Masterov (& it wouldn't be the first time... ). Setting aside whether $y_i$ has been transformed, we calculate $\hat y_i$ as $\beta_0+\beta_1x_i$; that is, we set $\varepsilon=0$, b/c $E[\varepsilon]=0$. Now $\exp(0)=1$, so we can multiply the first part, $\exp\{\beta_0+\beta_1X\}$ by 1, if you'd like. Of course, it won't change anything. So (w/ apologies), I still don't understand what you're getting at. – gung - Reinstate Monica Mar 17 '13 at 01:07
  • @DimitriyV.Masterov is correct; see a fairly straightforward discussion on the Stata blog, [Use poisson rather than regress; tell a friend](http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/). The expectation of the exponentiated error term *is not one*. – Andy W Mar 17 '13 at 14:06
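For reference, the lognormal algebra behind these last comments (a standard result, not from the thread itself): exponentiating the model $$\ln Y=\beta_0+\beta_1X+\varepsilon,\qquad\varepsilon\sim\mathcal N(0,\sigma^2),$$ gives $Y=\exp(\beta_0+\beta_1X)\,\exp(\varepsilon)$, and since $E[\exp(\varepsilon)]=\exp(\sigma^2/2)\ne 1$, $$E[Y\mid X]=\exp(\beta_0+\beta_1X)\,\exp(\sigma^2/2).$$ The simple antilog $\exp(\widehat{\ln y_i})$ therefore estimates the conditional median of $Y$, not its mean; recovering the mean requires the extra factor $\exp(\hat\sigma^2/2)$ (or a smearing-type estimator).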