In a log-linear model of an outcome $\ln y$ with a continuous untransformed explanatory variable $x$ and a dummy explanatory variable $d$:
- $100 \cdot \beta_x$ is approximately the percentage change in $y$ for a small change in $x$ (in either direction).
- If $d$ switches from 0 to 1, the percent change in $y$ is $100 \cdot [\exp(\beta_d) - 1]$.
- If $d$ switches from 1 to 0, the percent change in $y$ is $100 \cdot [\exp(-\beta_d) - 1]$.
Personally, I find this semi-elasticity interpretation much easier to follow than the alternative: a multiplicative effect on the geometric baseline mean (the exponentiated intercept) for the dummy variable, and the ratio $\frac{\mathbf{E}[y \mid x+1]}{\mathbf{E}[y \mid x]} = \exp(\beta_x)$ for the continuous one. If $y$ were itself a ratio, maybe that framing would make more sense.
For the graphs, you can plot two lines of re-transformed $y$ against $x$, one with $d=1$ and one with $d=0$:
\begin{equation}E[y \mid x, d] = \exp(\alpha + \beta_x \cdot x + \beta_d \cdot d) \cdot E[\exp(u)].\end{equation}
The second term in this expression is the hard part. If we assume normality and independence of the errors, we can approximate it with $\exp(\frac{\hat{\sigma}^2}{2})$, where we plug in the RMSE from the logged regression for the unobserved $\sigma$. Or we can make the weaker assumption that the $u_i$ are $iid$ and use the sample average of the exponentiated residuals from the logged model. That is Duan's "smearing" approach. It might make sense to take two averages, one for the $d=1$ observations and one for $d=0$, if you have reason to believe there is heteroskedasticity across the two groups; I sketch that variant after the levpredict check below.
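Here is a minimal sketch of the normal-theory version (Duan's factor is implemented by hand further below); the names lny, x, and d are placeholders rather than variables from the example that follows:
. reg lny x i.d
. predict double xb0, xb                         // fitted values on the log scale
. gen double yhat0 = exp(xb0)*exp(e(rmse)^2/2)   // exp(xb) times exp(sigma-hat^2/2)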
Finally, all this re-transformation nonsense can also be avoided by using a GLM with a log link, which models $E[y \mid x, d]$ directly.
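For instance, with the auto data used below, something like a Poisson GLM with robust standard errors (one common choice; a gamma family with a log link is another) would look like this:
. glm price i.foreign mpg, family(poisson) link(log) vce(robust)
. margins, dydx(foreign)   // discrete change in expected price, no re-transformation needed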
Here's an example using Stata:
. sysuse auto, clear
(1978 Automobile Data)
. gen lnp=ln(price)
. reg lnp i.foreign mpg
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   17.80
       Model |  3.74819416     2  1.87409708           Prob > F      =  0.0000
    Residual |  7.47533892    71  .105286464           R-squared     =  0.3340
-------------+------------------------------           Adj R-squared =  0.3152
       Total |  11.2235331    73  .153747029           Root MSE      =  .32448

------------------------------------------------------------------------------
         lnp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
     Foreign |   .2824445   .0897634     3.15   0.002     .1034612    .4614277
         mpg |  -.0421151   .0071399    -5.90   0.000    -.0563517   -.0278785
       _cons |     9.4536   .1485422    63.64   0.000     9.157415    9.749785
------------------------------------------------------------------------------
The foreign price premium is about 33% and statistically significant:
. nlcom 100*(exp(_b[1.foreign])-1)
_nl_1: 100*(exp(_b[1.foreign])-1)
------------------------------------------------------------------------------
         lnp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   32.63681   11.90594     2.74   0.006     9.301603    55.97202
------------------------------------------------------------------------------
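For the 1-to-0 switch from the last bullet above, you just flip the sign on the coefficient; since $\exp(-0.2824) = 1/1.326368 \approx 0.754$, the same calculation gives a discount of about 24.6%:
. nlcom 100*(exp(-_b[1.foreign])-1)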
Here are the exponentiated coefficients:
. reg, eform(b)
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  2,    71) =   17.80
       Model |  3.74819416     2  1.87409708           Prob > F      =  0.0000
    Residual |  7.47533892    71  .105286464           R-squared     =  0.3340
-------------+------------------------------           Adj R-squared =  0.3152
       Total |  11.2235331    73  .153747029           Root MSE      =  .32448

------------------------------------------------------------------------------
         lnp |          b   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
     Foreign |   1.326368   .1190594     3.15   0.002     1.109003    1.586337
         mpg |   .9587594   .0068455    -5.90   0.000     .9452067    .9725066
       _cons |      12754   1894.507    63.64   0.000     9484.509    17150.53
------------------------------------------------------------------------------
The foreign premium is just about identical. The geometric mean price for domestic cars seems pretty high to me, but that is because the exponentiated intercept corresponds to a domestic car with an mpg of zero, an extrapolation toward the gas guzzlers (the Caddies and Lincolns and Mercuries).
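To make that concrete, here is a sketch of the implied geometric mean for a domestic car evaluated at the sample average mpg instead of zero, which comes out to roughly $5,200:
. sum mpg, meanonly
. display exp(_b[_cons] + _b[mpg]*r(mean))
Now we implement Duan's re-transformation approach by hand: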
. predict double uhat, residual           // residuals on the log scale
. predict double lnyhat, xb               // fitted values of ln(price)
. gen double expuhat = exp(uhat)          // exponentiated residuals
. sum expuhat, meanonly                   // their sample mean is the smearing factor
. gen double yhat = r(mean)*exp(lnyhat)   // re-transformed prediction
You can also use Chris Baum's levpredict (from SSC):
. /* Make Sure I Did Things Right */
. levpredict yhat2, duan
. compare yhat yhat2
                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
yhat=yhat2                     74
                       ----------
jointly defined                74             0            0           0
                       ----------
total                          74
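If you worry about the group-wise heteroskedasticity mentioned earlier, a hypothetical extension is to average the exponentiated residuals separately within each group:
. bysort foreign: egen double smear = mean(expuhat)   // group-specific smearing factors
. gen double yhat_grp = smear*exp(lnyhat)             // group factor times exp(xb)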
Now for the graph code:
. tw ///
> (line yhat mpg if foreign ==1, sort lcolor(green)) ///
> (line yhat mpg if foreign ==0, sort lcolor(orange)) ///
> (scatter price mpg if foreign==1, mcolor(green) msymbol(Oh) jitter(2)) ///
> (scatter price mpg if foreign==0, mcolor(orange) msymbol(Oh) jitter(2)) ///
> ,legend(label(1 "E[Price|Foreign]") label(2 "E[Price|Domestic]") label(3 "Foreign") label(4 "Domestic") rows(1)) ///
> ytitle("Dollars") title("Duan Smearing In Action") ///
> ylab(, angle(horizontal) format(%9.0fc)) plotregion(fcolor(white) lcolor(white)) graphregion(fcolor(white) lcolor(white))
Looks reasonable:
