
As I mentioned in a previous question, I have two variables that show a strong linear relationship between their logs, even though there is no clear relationship between them on their original scale. I thought this would allow me to make predictions about one using the other.

When I actually tried, I realized that I was getting pretty good values for the log-log model, but small errors in the log-scale prediction turned into significant errors when transformed back to the original scale.

Is there any way around this, or is a log-log linear relationship not really useful for prediction purposes?

Richard Hardy
Akaike's Children

1 Answer


This may or may not make your predictions a whole lot better on the linear scale, but be aware that simply exponentiating the predicted value of $\log(y)$ is considered a naive method: it will systematically underestimate the expected value of $y$, and more appropriate methods exist. There is a good description of one of them (two, really, which are similar), with an example, in Wooldridge's Introductory Econometrics, 4e, pp. 210-213 (which can be accessed at http://www.clementnedoncelle.eu/wp-content/uploads/IntroductoryEconometrics_AModernApproach_FourthEdition_Jeffrey_Wooldridge.pdf).

The process has three steps (the book lists four; this is a simplified version):

  1. Obtain predicted values, $\widehat{\log y_i}$, from the OLS-estimated log-log regression, and exponentiate them.
  2. Regress $y_i$ (the dependent variable on its original scale) on the exponentiated predicted values from step 1, using OLS without an intercept/constant term.
  3. Multiply the $\hat{\beta}$ coefficient estimate from the regression in step 2 by each exponentiated predicted value from step 1 to obtain your final predictions.
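The three steps above can be sketched in a few lines of numpy. The data-generating process and every variable name here are made up purely for illustration; this is not the book's code, just a minimal implementation of the procedure as described:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with a true log-log relationship and multiplicative noise
x = rng.uniform(1, 10, size=200)
y = 2.0 * x ** 1.5 * rng.lognormal(0.0, 0.4, size=200)

# Step 1: OLS of log(y) on log(x), then exponentiate the fitted values
X = np.column_stack([np.ones_like(x), np.log(x)])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
m_hat = np.exp(X @ beta)              # naive exponentiated predictions

# Step 2: regress y on m_hat through the origin (no intercept);
# the slope of a no-intercept OLS is (m'y) / (m'm)
alpha = (m_hat @ y) / (m_hat @ m_hat)

# Step 3: scale the naive predictions by the estimated coefficient
y_hat = alpha * m_hat
```

With multiplicative log-normal errors the scaling coefficient typically comes out above 1, which is exactly the underestimation the naive method suffers from.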

The coefficient estimate from step 2 above can instead be replaced with:

$$ n^{-1}\sum_{i=1}^{n}\exp(\hat{u}_i) $$

where $\hat{u}_i$ are the estimated residuals from the log-log regression in step 1.

There is also this answer that has a link to a blog with alternative solutions: making predictions with log-log regression model

AlexK
  • Thanks. Makes sense, except that I don't completely grasp how (2) and $n^{-1}\sum_{i=1}^{n}\exp(\hat{u}_i)$ are equivalent (or how one approximates the other)? – Akaike's Children Apr 12 '19 at 04:04
  • They are just two different methods of estimating an extra scaling term that lets you get from the exponentiated predicted values of the log-log (or log-linear) regression to predictions on the original scale. This extra term is necessary because the expected value of $y$ is not simply the exponential of the prediction from the log-log model. – AlexK Apr 12 '19 at 04:34