
Suppose that I run a linear regression

y = x*b + error,

and obtain predictions y_p. Furthermore, assume that I can compute the R-squared by calling a function R(y, y_p) which has two arguments: a vector of observations and a vector of predictions.

Now, I run the linear regression for a transformed response

ln(y) = x*b + error,

and obtain predictions y_ln_p. How do I compute the R-squared for the sake of fair comparison with the previous model? Which one of the two would be better:

a) R(ln(y), y_ln_p)

b) R(y, exp(y_ln_p))

Thank you!

NB: ln is the natural log.

user77571

1 Answer


It is not appropriate to compare linear regression models by their fit statistics (RMSE and $R^2$) when the dependent variable of some models has been transformed so that its units change: the statistics are then computed on different scales and are not comparable. Consider the following nice explanation by Maddala (1988, p. 177):

When comparing the linear with the log-linear forms, we cannot compare the R-squared's because R-squared is the ratio of explained variance to the total variance and the variances of y and log y are different. Comparing R-squared's in this case is like comparing two individuals, A and B, where A eats 65% of a carrot cake and B eats 70% of a strawberry cake. The comparison does not make sense because there are two different cakes.

To compensate for the change of scale, people traditionally convert the predictions of the log model back to the original scale using the so-called back-transformation method, so that the fit statistics of both models are computed in the units of y, as in option (b) of the question (see this page for more details, explanation and examples).
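As a rough illustration of the back-transformation idea, here is a minimal Python sketch. The synthetic data, the helper names `r_squared` and `ols_predict`, and the use of a plain `exp` back-transform are assumptions made for this example; they are not taken from Maddala or from the linked page. Note that `exp` of a prediction for ln(y) is a naive back-transform that ignores retransformation bias, so treat the result as indicative only.

```python
import numpy as np


def r_squared(y, y_pred):
    """Plays the role of R(y, y_p) in the question: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def ols_predict(X, target):
    """Fit ordinary least squares and return the in-sample predictions."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return X @ coef


# Synthetic data (an assumption for illustration): y is positive and
# roughly log-linear in x.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
y = np.exp(0.5 + 0.4 * x + rng.normal(0.0, 0.3, size=n))
X = np.column_stack([np.ones(n), x])  # intercept + one predictor

y_p = ols_predict(X, y)             # model 1: y = x*b + error
y_ln_p = ols_predict(X, np.log(y))  # model 2: ln(y) = x*b + error

r2_linear = r_squared(y, y_p)                # original scale
r2_log_scale = r_squared(np.log(y), y_ln_p)  # option (a): log scale
r2_back = r_squared(y, np.exp(y_ln_p))       # option (b): back-transformed

print(f"linear model, original scale: {r2_linear:.3f}")
print(f"log model, log scale (a):     {r2_log_scale:.3f}")
print(f"log model, original scale (b): {r2_back:.3f}")
```

Only `r2_linear` and `r2_back` refer to the same "cake" (the variance of y), so those are the two values that can sensibly be set side by side.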

As for information-theoretic model statistics such as AIC/BIC, in general it is not possible to use them directly to compare non-transformed and transformed models (see this and this). It is, however, possible to compare the AIC with a modified AIC (not sure about BIC), as discussed here and here.
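As a sketch of the kind of modification involved (this is the standard change-of-variables argument, and not necessarily the exact form used in the linked discussions): if the model is fitted to $z_i = \ln y_i$, the density it implies for $y_i$ is $f_z(\ln y_i) / y_i$, so the log-likelihood on the original scale picks up a Jacobian term:

$$\ell_{y} = \ell_{\ln y} - \sum_{i=1}^{n} \ln y_i, \qquad \mathrm{AIC}_{y} = \mathrm{AIC}_{\ln y} + 2 \sum_{i=1}^{n} \ln y_i .$$

Here $\ell_{\ln y}$ and $\mathrm{AIC}_{\ln y}$ are the values reported for the log model, and $\mathrm{AIC}_{y}$ is the value expressed on the scale of $y$, which can then be set against the AIC of the untransformed model, assuming both models are fitted to the same observations.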

One final note: it is usually preferable to use the adjusted $R^2$ instead of the standard one. Please see my relevant answer and the links provided there.
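For reference, the usual adjustment can be sketched as below. The function name `adjusted_r_squared` and its parameter names are hypothetical, chosen for this example; `n_predictors` counts the regressors excluding the intercept.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_obs is the number of observations (n) and n_predictors is the
    number of regressors excluding the intercept (p).
    """
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_obs > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Example: R^2 = 0.5 with 11 observations and 1 predictor.
print(adjusted_r_squared(0.5, n_obs=11, n_predictors=1))
```

The penalty grows with the number of predictors, so the adjusted value is never larger than the plain $R^2$ for a model with at least one predictor.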

References

Maddala, G. S. (1988). Introduction to econometrics. New York: Macmillan Publishing.

Aleksandr Blekh
  • Many thanks Aleksandr! I greatly appreciate your detailed answer. Bolshoe spasibo. –  Jan 11 '15 at 03:45
  • @user77571: You're very welcome. It is a pleasure and it was an educational experience for me as well. Ne za chto :-). – Aleksandr Blekh Jan 11 '15 at 04:19