I'm doing a linear regression with a transformed dependent variable. The transformation was chosen so that the assumption of normality of residuals would hold: the untransformed dependent variable was negatively skewed, and the following transform made it close to normal:

$$Y=\sqrt{50-Y_{orig}}$$

where $Y_{orig}$ is the dependent variable on the original scale.

I think it makes sense to use some transformation on the $\beta$ coefficients to work our way back to the original scale. Using the following regression equation,

$$Y=\sqrt{50-Y_{orig}}=\alpha+\beta \cdot X$$

and by fixing $X=0$, we have

$$\alpha=\sqrt{50-Y_{orig}}=\sqrt{50-\alpha_{orig}}$$

And finally,

$$\alpha_{orig}=50-\alpha^2$$

Using the same logic, I found

$$\beta_{orig}=\alpha(\alpha-2\beta)+\beta^2+\alpha_{orig}-50$$

Now things work very well for a model with 1 or 2 predictors: the back-transformed coefficients resemble the ones obtained by modelling the untransformed variable directly, except that now I can trust the standard errors. The problem comes when including an interaction term, such as

$$Y=\alpha+X_1\beta_{X_1}+X_2\beta_{X_2}+X_1X_2\beta_{X_1X_2}$$

Then the back-transformed $\beta$s are no longer so close to the ones from the original scale, and I'm not sure why that happens. I'm also unsure whether the formula I found for back-transforming a beta coefficient is usable as-is for the third $\beta$ (the one for the interaction term). Before going into crazy algebra, I thought I'd ask for advice...
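
To illustrate what I'm doing, here is a minimal sketch in Python (the data-generating process is made up purely for illustration) that fits the model on both scales and applies the back-transformation formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Toy data: Y_orig is negatively skewed and bounded above by 50,
# so that sqrt(50 - Y_orig) is roughly normal around a linear predictor.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y_orig = 50 - (3 + 0.8 * x1 + 0.5 * x2 + rng.normal(scale=0.3, size=n)) ** 2

# Transformed response, as above.
y = np.sqrt(50 - y_orig)

def ols(predictors, response):
    """OLS with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(response))] + predictors)
    return np.linalg.lstsq(design, response, rcond=None)[0]

# Fit on the transformed scale and directly on the original scale.
a, b1, b2 = ols([x1, x2], y)
a_direct, b1_direct, b2_direct = ols([x1, x2], y_orig)

# Back-transform the transformed-scale coefficients with the formulas above.
alpha_back = 50 - a**2
beta1_back = a * (a - 2 * b1) + b1**2 + alpha_back - 50
beta2_back = a * (a - 2 * b2) + b2**2 + alpha_back - 50

print("fit on the original scale:", a_direct, b1_direct, b2_direct)
print("back-transformed fit:     ", alpha_back, beta1_back, beta2_back)
```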

Macro
Dominic Comtois

2 Answers

One problem is that you've written

$$Y=\alpha+\beta \cdot X$$

That is a simple deterministic (i.e. non-random) model. In that case, you could back transform the coefficients to the original scale, since it's just a matter of simple algebra. But in the usual regression setting you only have $E(Y|X)=\alpha+\beta \cdot X$; you've left the error term out of your model. If the transformation from $Y$ back to $Y_{orig}$ is non-linear, you may have a problem, since $E\big(f(X)\big) \neq f\big(E(X)\big)$ in general. I think that may have something to do with the discrepancy you're seeing.

Edit: Note that if the transformation is linear, you can back transform to get estimates of the coefficients on the original scale, since expectation is linear.
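
To see the point numerically, here is a small simulation sketch in Python (the mean and error SD on the transformed scale are made up) comparing $f\big(E(Y)\big)$ with $E\big(f(Y)\big)$ for the back-transform $f(y)=50-y^2$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Transformed-scale response at a fixed X: Gaussian around its conditional mean.
mu, sigma = 4.0, 0.5
y = rng.normal(mu, sigma, size=1_000_000)

back = 50 - y**2                # back-transform each draw
print(50 - np.mean(y)**2)       # f(E[Y]): about 50 - 16   = 34.00
print(np.mean(back))            # E[f(Y)]: about 34 - 0.25 = 33.75
```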

Macro

I salute your efforts here, but you're barking up the wrong tree. You don't back transform betas. Your model holds in the transformed data world. If you want to make a prediction, for example, you back transform $\hat{y}_i$, but that's it. Of course, you can also get a prediction interval by computing the high and low limit values, and then back transform them as well, but in no case do you back transform the betas.
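
For concreteness, a minimal sketch of that recipe in Python with statsmodels (the data here are hypothetical; only the back-transformation step is the point):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200

# Hypothetical data already on the transformed scale, y = sqrt(50 - y_orig).
x = rng.normal(size=n)
y = 3 + 0.8 * x + rng.normal(scale=0.3, size=n)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Fitted values and 95% prediction intervals on the transformed scale ...
pred = fit.get_prediction(X).summary_frame(alpha=0.05)

# ... back-transformed to the scale of y_orig. Because y_orig = 50 - y^2 is
# decreasing in y (for y > 0), the interval limits swap roles.
yhat_orig = 50 - pred["mean"] ** 2
lower_orig = 50 - pred["obs_ci_upper"] ** 2
upper_orig = 50 - pred["obs_ci_lower"] ** 2
```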

gung - Reinstate Monica
  • What to make of the fact that the back-transformed coefficients get very close to the ones obtained when modelling the untransformed variable? Doesn't that allow for some inference on the original scale? – Dominic Comtois Apr 25 '12 at 00:53
  • I don't know, exactly. It could depend on any number of things. My first guess is that you're getting lucky w/ your 1st couple of betas, but then your luck runs out. I have to agree w/ @mark999 that "the estimates that we'd get were the original data suited to linear regression" doesn't actually make any sense; I wish it did & it sort of seems to at first blush, but unfortunately it doesn't. And it doesn't license any inferences on the original scale. – gung - Reinstate Monica Apr 25 '12 at 01:36
  • @gung for non-linear transformations (say Box–Cox): I can back transform fitted values as well as prediction intervals, but I can't transform betas nor confidence intervals for the betas. Is there any additional limitation I should be aware of? btw, this is a very interesting topic, where can I get a better understanding? – mugen Oct 10 '14 at 02:29
  • @mugen, it's hard to say what else you should be aware of. One thing to keep in mind is that the back transformation of y-hat gives you the conditional *median*, whereas the un-back-transformed (bleck) y-hat is the conditional mean (see the sketch after these comments). Other than that, this material should be covered in a good regression textbook. – gung - Reinstate Monica Oct 10 '14 at 02:32
  • @gung thank you very much for your comment, especially for pointing out that the back-transformed fitted value is actually the conditional *median*. I have a copy of Kutner here with me, but the coverage of power transforms is way too short. – mugen Oct 10 '14 at 02:37
  • @mugen, you're welcome. Feel free to ask more questions via the normal mechanisms (clicking `ASK QUESTION`); there will be more resources for answering, you will get the attention of more CVers, & the information will be better accessible for posterity. – gung - Reinstate Monica Oct 10 '14 at 13:02