I am trying to understand how to back-transform in R. Let's say I have X and Y values:
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
Y <- c(14, 18, 17, 22, 8, 10, 5, 6, 10)
Now let's say that I want to transform Y with a square root (i.e. sqrt(Y)) and X with a log, and then fit a simple linear regression model. I understand that with such a small sample these transformations are not ideal for achieving linearity, but I chose them only to illustrate my question. The way I understood back-transforming was that it returns Y to its original units, but I did not understand how to account for the residual variance (i.e. the mean squared error). Here is the code I used to try to avoid biased confidence intervals:
logX <- log(X)
sqrtY <- sqrt(Y)
hist(sqrtY)                        # check the transformed response (was hist(sqrtX), but sqrtX is never defined)
Summary <- lm(sqrtY ~ logX)
Sigma <- summary(Summary)$sigma^2  # estimated residual variance (MSE)
Sigma
NewX <- (logX)^2                   # square the log-transformed predictor
NewX
lm(Y ~ NewX)
Back.X <- (logX)^2
NewX <- Sigma + NewX               # add the residual variance to the squared predictor
Back.lm <- lm(Y ~ NewX)
I know that diagnostic plots such as residuals vs. fitted are the best way to assess whether a model has improved after back-transforming Y, but is the code above a valid way to account for the mean squared error?
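For reference, here is a sketch of how I understand the more common correction: it is applied to the fitted values rather than to the predictor. If the residuals on the square-root scale are roughly normal (an assumption), then E[Y] = E[sqrt(Y)]^2 + Var(sqrt(Y)), so the residual variance is added once to the squared fitted values (the variable names here are my own):

```r
# Sketch: bias-corrected back-transformation of the *fitted values*
# for a square-root-transformed response (assumes the model errors
# live on the sqrt scale).
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
Y <- c(14, 18, 17, 22, 8, 10, 5, 6, 10)

fit <- lm(sqrt(Y) ~ log(X))
sigma2 <- summary(fit)$sigma^2      # estimated residual variance (MSE)

# Naive back-transform: square the fitted values (biased low for E[Y])
Y.hat.naive <- fitted(fit)^2
# Bias-corrected back-transform: add the residual variance once
Y.hat.corrected <- fitted(fit)^2 + sigma2
```

Note that the correction is a constant shift on the original scale here only because the transformation is a square; for other transformations (e.g. log) the correction is multiplicative instead.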
Reply to Moose (2/13/2014): What I am trying to do is avoid retransformation bias, so my question is:
"When I back-transform in R in the given example, do I only square the explanatory variable?", or
"Do I not only square the explanatory variable, but also add the estimated residual variance from the object 'Summary' (i.e. 'Sigma', which I folded into 'NewX') to the new model 'Back.lm'?"
There may be another approach that I have not considered, but I could not think of any other solutions.
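One such alternative I have seen mentioned is Duan's smearing estimator, which avoids assuming normal residuals on the transformed scale: each fitted value is back-transformed after adding every observed residual in turn, and the results are averaged. A sketch for the same model (variable names are my own):

```r
# Sketch: Duan's smearing estimator for a sqrt-transformed response.
# Distribution-free retransformation: average the back-transformed
# (fitted value + residual) over all residuals.
X <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
Y <- c(14, 18, 17, 22, 8, 10, 5, 6, 10)

fit <- lm(sqrt(Y) ~ log(X))
e <- residuals(fit)   # residuals on the sqrt scale

# For each fitted value m, average (m + e_i)^2 over all residuals e_i
Y.hat.smear <- sapply(fitted(fit), function(m) mean((m + e)^2))
```

With an intercept in the model the residuals sum to zero, so this reduces to fitted(fit)^2 + mean(e^2), i.e. the same shape as the normal-theory correction but using the raw mean squared residual.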