
Assume that I have a log transformed model as follows:

Model 1: $Y = a + b\ln(X)$. Interpretation: a 1% increase in $X$ is associated with an average $b/100$ units increase in $Y$.

If I add $1$ to $X$ to avoid having $0$ values and get:

Model 2: $Y = c + d\ln(X+1)$

Should I interpret the model as "a 1% increase in $(X+1)$ is associated with an average $d/100$ units increase in $Y$"? Or are there better ways to interpret the model? Thanks.

Guess Gucci

2 Answers


You could, but it's not a very intuitive thing. If $x$ is 0.01, then a 1% increase in $(1+x)$ is basically a doubling of $x$, while if $x$ is 100, it's close to a 1% increase in $x$.
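A quick numeric sketch (not part of the original answer) makes this concrete: it computes how much $x$ itself changes when $(x+1)$ grows by 1%, for small and large $x$.

```python
# Sketch: a 1% increase in (x + 1) translated into a proportional change in x.
# For small x it nearly doubles x; for large x it approaches a 1% change in x.
for x in [0.01, 1.0, 100.0]:
    x_new = 1.01 * (x + 1) - 1  # value of x after (x + 1) rises by 1%
    pct_change_in_x = 100 * (x_new - x) / x
    print(f"x = {x:>6}: 1% increase in (x+1) means x rises by {pct_change_in_x:.2f}%")
```

At $x = 0.01$ this prints a roughly 101% rise in $x$ (a doubling), while at $x = 100$ it prints about 1.01%, matching the point above.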

Glen_b

The interpretation of the model should depend partly on the range of values of $x$ and on how it is to be applied. If most of the $x$ values are large, say more than 100, and if it is to be used to predict $y$ corresponding to such large values of $x$, then a good approximate interpretation would be: a 1% increase in $x$ is associated with an average $d/100$ units increase in $y$. For $x > 100$ the proportionate difference between $x+1$ and $x$ is small.
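The quality of this approximation is easy to check numerically. The sketch below uses a hypothetical slope $d$ (not from the answer) and compares the exact change in $Y$ from a 1% increase in $x$ with the $d/100$ rule of thumb, for large $x$:

```python
import math

d = 5.0  # hypothetical slope in Model 2: Y = c + d*ln(X + 1)

for x in [100.0, 1000.0]:
    # Exact change in Y when x increases by 1% under Y = c + d*ln(x + 1)
    dy = d * (math.log(1.01 * x + 1) - math.log(x + 1))
    print(f"x = {x:>7}: change in Y = {dy:.4f}  vs  d/100 = {d / 100:.4f}")
```

For $x = 100$ the exact change is about 0.0493 against $d/100 = 0.05$, and the gap shrinks further as $x$ grows.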

If, however, the model is to be used to predict $y$ corresponding to small values of $x$, then this approximation would be unhelpful and your interpretation would be more appropriate, although as Glen_b says it's not very intuitive.

If most of the $x$ values are small, then a better way to avoid the zeroes might be to add a different constant, much less than 1.
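A short numeric check (my own sketch, not part of the answer) of where the zeros land under $\ln(x + c)$ for different constants $c$, which is the main consideration when choosing the constant:

```python
import math

# Where an original zero is mapped by ln(x + c), for several choices of c.
# The smaller c is, the more extreme (negative) the transformed zero becomes.
for c in [1.0, 0.1, 0.001, 1e-6]:
    print(f"c = {c:<8}: ln(0 + c) = {math.log(c):9.3f}")
```

With $c = 1$ the zeros map to 0; with $c = 10^{-6}$ they map to about $-13.8$, far below the transformed values of any modest positive $x$.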

Adam Bailey
  • I agree on disliking log(x + 1) as a transformation on various grounds, but the last suggestion here is dubious, if not dangerous. ln(zero + epsilon) may sound more conservative than ln(zero + 1), but the smaller epsilon is, the larger the negative logarithm created. Using base-10 logs so the numbers are easy, log10(1/1000) = -3 and log10(1/1000000) = -6. log(x + epsilon) is thus all too likely to create outliers out of zeros, and outliers whose values depend crucially on an arbitrary choice of epsilon. This is why 1 was suggested in the first place, presumably. – Nick Cox Mar 15 '13 at 14:07
  • @NickCox I only said 'might'! However you are right to highlight that the choice of a constant to add to avoid zeroes is not straightforward. As you show, it isn't a case of the smaller the better. The constant should be chosen with regard to the range of the $x$ values and there is nothing special about 1. If, say, the range was between 0 and 1 then adding 0.1 would be better than adding 1. A previous question which addresses this issue in detail is http://stats.stackexchange.com/questions/30728/how-small-a-quantity-should-be-added-to-x-to-avoid-taking-the-log-of-zero. – Adam Bailey Mar 15 '13 at 20:00