I am using Stata 13 to estimate a simple regression. Because a few of my covariates are strongly positively skewed, I decided to ln-transform them. However, these covariates contain a substantial number of zeroes, so the transformation produces many missing values.

I came across several ways of handling the issue:

  1. Replacing zeroes with a small value (e.g. 0.00000001)
  2. Not treating the issue
  3. Flagging the zeroes with a dummy in a new variable

I do not like options 1 and 2. Replacing feels somewhat arbitrary and just wrong. Not treating the issue loses information. I thus prefer option 3, but it has not worked for me so far. Here is an example of what I do.

clear
clear matrix
set more off

sysuse nlsw88

hist tenure

gen ln_tenure=ln(tenure)

gen null_tenure = 0
replace null_tenure = 1 if tenure==0


reg wage grade tenure south
reg wage grade ln_tenure south

reg wage grade ln_tenure null_tenure south

The nlsw88 example dataset contains 51 observations with tenure=0. Regressing wage on grade, tenure and south is hence based on 51 more observations than regressing wage on grade, ln_tenure and south.

To avoid losing the information at tenure=0, I created the dummy null_tenure, which equals 1 for all observations with tenure=0. However, null_tenure gets omitted when I add it to the regression.
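
A quick check of why the dummy drops out can be sketched as follows. It is only an illustration and assumes a 0/1 coding of null_tenure: every observation with tenure=0 has a missing ln_tenure, so none of them enters the estimation sample, and within that sample the dummy never varies.

```stata
sysuse nlsw88, clear
gen ln_tenure = ln(tenure)

* 0/1 indicator for an original zero (missing tenure stays missing)
gen byte null_tenure = (tenure == 0) if !missing(tenure)

* number of observations with tenure equal to zero
count if tenure == 0

* of those, how many have a nonmissing ln_tenure and can enter the regression
count if null_tenure == 1 & !missing(ln_tenure)

* within the usable sample the dummy takes a single value
tabulate null_tenure if !missing(ln_tenure)
```

A regressor that is constant in the estimation sample is collinear with the intercept, which is why Stata omits it.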

I have two questions:

  1. Does this way of handling the missing values created by ln-transforming data make sense?
  2. If so, how can I prevent the dummy from being omitted?

/R

Rachel
  • One common fallacy should be mentioned straight away. Using extremely small positive constants in place of zero is not a very conservative change to the data but a drastic one. This is marginally easier to see with logs base 10: a constant $10^{-6}$ becomes $-6$ on log 10 scale and $10^{-9}$ becomes $-9$; in short, the smaller the constant, the bigger the negative outliers created on log scale. – Nick Cox May 26 '15 at 09:25
  • Exactly! Merely replacing is not an option (although it seems to be regularly done). – Rachel May 26 '15 at 09:27
  • My point was that this particular replacement is unsound, but as the cited thread indicates there are other replacements that preserve information. – Nick Cox May 26 '15 at 09:28
  • @NickCox, thanks for the thread, which of course is very close to what I am asking. In fact, it answers question (1) of this thread. Answering the more technical question (2), however, remains troublesome for me. whuber suggested generating a second variable that takes the value 1 if the ln-transformed variable was initially zero (see here http://stats.stackexchange.com/a/1795/77419), but how can I prevent the dummy variable from being omitted? – Rachel May 26 '15 at 09:49
  • 1 if initially zero and 0 otherwise: what's the problem? – Nick Cox May 26 '15 at 10:00
  • The dummy gets omitted. `clear sysuse nlsw88 gen ln_tenure=ln(tenure) gen null_tenure = 0 replace null_tenure=1 if tenure==0 reg wage grade ln_tenure null_tenure` – Rachel May 26 '15 at 10:03
  • The question now seems to be that when you try to implement that procedure (in Stata?) it doesn't work for you. I suggest that is (a) difficult to answer without a worked example, code and an error report, and (b) off-topic here even if you provided that; Statalist is now the better forum for that. If you think @whuber's suggestion was incorrect in principle, then you should be raising that in the original thread. – Nick Cox May 26 '15 at 10:06
  • Stata code is not universally intelligible or reproducible here! Please note my previous comment, which crossed with yours. Nevertheless your error is simple: you must replace the zeros in `tenure` with 1s before transformation by `ln()`. Ask on Statalist and I or others will expand. – Nick Cox May 26 '15 at 10:08
  • Thanks! I cross-posted the question with a more technical focus in Statalist http://www.statalist.org/forums/forum/general-stata-discussion/general/1295600-how-to-create-a-dummy-that-indicates-original-zeroes-before-ln-transformation – Rachel May 26 '15 at 10:29
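
The suggestion in the comments (replace the zeros in `tenure` with 1 before applying `ln()`, and keep a 0/1 indicator for the original zeros) could be sketched as follows. This is one illustrative reading of that comment, not code posted in the thread; the use of `cond()` to leave the original `tenure` untouched is an assumption made here for convenience.

```stata
sysuse nlsw88, clear

* indicator: 1 if tenure was originally zero, 0 otherwise (missing stays missing)
gen byte null_tenure = (tenure == 0) if !missing(tenure)

* treat zeros as 1 inside ln(), so they map to 0 rather than to missing
gen ln_tenure = ln(cond(tenure == 0, 1, tenure))

* the tenure == 0 observations now stay in the sample and the dummy varies
regress wage grade ln_tenure null_tenure south
```

With this coding the observations with tenure=0 have ln_tenure=0, so the coefficient on null_tenure measures how their wage differs from the level the model predicts at tenure=1.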
