
I have the following regression:

$log(Y) = \alpha + \beta X + \epsilon$

with $E[\epsilon] = 0$ and $var(\epsilon) = \sigma^2$. There is no assumption on the distribution of the errors $\epsilon$. In other words, we cannot assume the errors are normal.

Now, I know that:

$\widehat{log(Y)} = \hat{\alpha} + \hat{\beta} X$.

I would like to find the expression of the predicted value in the original scale, $\hat{Y}$.

I tried a second-order Taylor expansion (around the mean) as follows:

$E[g(Z)] \approx g(\mu) + \frac{1}{2}g''(\mu)\sigma^2$

where $\mu$ and $\sigma^2$ are the mean and variance of $Z$.

In this case: let $Z = log(Y)$, then $Y = e^Z = g(Z)$. Then,

$\hat{Y} = E[Y|X] = E[e^Z | X] \approx g(\mu) + \frac{1}{2}g''(\mu)\sigma^2$

$= e^{\mu} + \frac{1}{2}e^{\mu}\sigma^2$

where $\mu = \hat{\alpha} + \hat{\beta} X$ and $\sigma^2$ is the variance of the residuals.
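
Equivalently, writing the model as $Y = e^{\alpha + \beta X} e^{\epsilon}$ and assuming the errors are independent of $X$ (which I believe this factorization needs), the quantity being approximated is really $E[e^{\epsilon}]$:

$E[Y|X] = e^{\alpha + \beta X}\, E[e^{\epsilon}]$

so the Taylor step above amounts to $E[e^{\epsilon}] \approx 1 + \frac{\sigma^2}{2}$, i.e. $\hat{Y} \approx e^{\mu}\left(1 + \frac{\sigma^2}{2}\right)$ with $\mu$ and $\sigma^2$ as above.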

However, someone told me that regardless of the distribution of the error, the right result should be:

$\hat{Y} = e^{\mu + \sigma^2/2}$
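
I can see where this expression would come from if the errors happened to be normal: in that case $e^{\epsilon}$ is lognormal, so

$E[e^{\epsilon}] = e^{\sigma^2/2}$

exactly, and hence $E[Y|X] = e^{\alpha + \beta X}E[e^{\epsilon}] = e^{\alpha + \beta X + \sigma^2/2}$. But normality is precisely the assumption I cannot make here.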

Can anyone please help me figure this one out? Thank you!

Mayou
  • See: http://stats.stackexchange.com/questions/124782/solving-a-regression-equation/124784#124784 – dimitriy Nov 21 '14 at 20:01
  • 1
    Thank you Dimitriy, but there is no derivation on that page. Not really what I am looking for. I am looking for the full proof.Additionally, the link is stating the case when $log(Y)$ is normal. As I said, this is not the case here. – Mayou Nov 21 '14 at 20:03
  • 2
    The answer links to a description of the Duan (1983) smearing approach, with a citation of the JASA paper containing the proofs, that is for homoskedastic, iid errors that need not be normal. The link also describes how to relax the homoskedasticity assumption. Note that the assumptions are about the errors, not the distribution of Y. – dimitriy Nov 21 '14 at 21:39
  • Oh I see, thank you. I will refer to that paper directly. Small follow-up question: how do we express the MSE of the original $\hat{Y}$ in the case of normality, or in the case of unknown distribution? Is there a closed-form formula? – Mayou Nov 21 '14 at 21:40
  • 2
    This [paper](http://www.jstor.org/discover/10.2307/2288126?uid=3737664&uid=2&uid=4&sid=21104618127781) might help. – Andre Silva Nov 21 '14 at 21:40
  • I told you no such thing. I said the result held as an approximation. – Arthur B. Nov 21 '14 at 22:32
  • 1
    Also note that $e^{\mu + \sigma^2/2} = e^{\mu}(1 + \sigma^2/2 + O(\sigma^3))$ – Arthur B. Nov 21 '14 at 22:37
  • @Arthur B. Yes, Arthur, sorry that is what I meant :) an approximation :) thanks for following up. – Mayou Nov 23 '14 at 00:11
  • @ArthurB. Do you mean that $e^{\sigma^2/2} = 1 + \sigma^2/2 + O(\sigma^3)$? I don't think I have seen this approximation anywhere – Mayou Nov 24 '14 at 15:11
  • Actually I meant $O(\sigma^4)$ but yeah. And it's the Taylor expansion of the exponential? – Arthur B. Nov 24 '14 at 15:16
  • Oh ok, with $O(\sigma^4)$ it now makes sense as: $e^x = 1 + x + O(x^2)$. With $x = \sigma^2$ in this case. Is that right? But I am not sure how to get to this approximation from $f(\mu) + f''(\mu) \sigma^2/2$. Where does the $O(\sigma^4)$ come into play here? – Mayou Nov 24 '14 at 15:17
  • Where did the term $f'''(\mu) \frac{\sigma^3}{3!}$ go? – Mayou Nov 24 '14 at 15:30
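
The Duan (1983) smearing estimator referenced in the comments above amounts to replacing $E[e^{\epsilon}]$ with the sample mean of the exponentiated OLS residuals. Below is a minimal simulation sketch comparing it with the two closed-form corrections discussed here; the sample size, coefficients, and the centred-exponential error are made-up illustrative choices, not anything from the question.

```python
# Minimal sketch: compare back-transformations of a fitted log-linear model.
# All simulation settings (n, alpha, beta, the centred exponential error)
# are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# --- simulate log(Y) = alpha + beta*X + eps with a non-normal, mean-zero error ---
n, alpha, beta = 5000, 1.0, 0.5
x = rng.uniform(0.0, 2.0, size=n)
eps = rng.exponential(scale=0.5, size=n) - 0.5      # E[eps] = 0, skewed, not normal
log_y = alpha + beta * x + eps

# --- OLS of log(Y) on X ---
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
resid = log_y - X @ coef
sigma2 = resid.var(ddof=2)                          # residual variance estimate

# --- candidate predictions of E[Y | X = 1] ---
mu_hat = np.array([1.0, 1.0]) @ coef                # fitted value of log(Y) at x = 1
naive    = np.exp(mu_hat)                           # ignores retransformation bias
taylor   = np.exp(mu_hat) * (1 + sigma2 / 2)        # second-order Taylor correction
lognorm  = np.exp(mu_hat + sigma2 / 2)              # exact only if eps were normal
smearing = np.exp(mu_hat) * np.mean(np.exp(resid))  # Duan's smearing estimator

# true E[Y | X = 1]: exp(alpha + beta) * E[exp(eps)], where E[exp(eps)]
# = exp(-0.5) / (1 - 0.5) from the MGF of an Exponential(scale = 0.5) at t = 1
truth = np.exp(alpha + beta) * np.exp(-0.5) / (1 - 0.5)

for name, value in [("naive", naive), ("taylor", taylor),
                    ("lognormal", lognorm), ("smearing", smearing),
                    ("truth", truth)]:
    print(f"{name:9s} {value:.3f}")
```

With a skewed error like this one, the smearing factor tracks the true retransformation factor $E[e^{\epsilon}]$, while both closed-form corrections understate it; for roughly normal errors with small $\sigma^2$ the three give very similar answers.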

0 Answers