
I have a skewed variable and I want to apply a Box-Cox transformation (log), but around 50 of the 200 observations are between 0 and 1.

My initial thought was to add a constant to all observations before taking the log. Is this a good idea?

If I do add this constant, how can I determine the back transformation function?

user77876
user178953
  • The main question ("is it a good idea?") is a little too vague to be answerable--it doesn't explain how or why you will be analyzing this variable, and that information is essential for determining how to treat it. If the apparent duplicates don't answer the question you have in mind, then please edit your post to clarify it. – whuber Oct 03 '17 at 16:39

1 Answer


Suppose you have a variable x distributed on [0, 1]. If you want to log-transform it, take new_x = log(x + 1), which yields zero when x is zero. To transform back to the original scale, use exp(new_x) - 1.
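As a rough illustration, the forward and back transformations could be sketched in Python with NumPy as below. The function names `shifted_log` and `inverse_shifted_log`, the parameter `c`, and the sample values are hypothetical and chosen only for this example; `c = 1` corresponds to the log(x + 1) case above, and any other constant is inverted the same way, by subtracting it after exponentiating.

```python
import numpy as np


def shifted_log(x, c=1.0):
    """Forward transform: log(x + c). Requires x + c > 0 for every value."""
    x = np.asarray(x, dtype=float)
    return np.log(x + c)


def inverse_shifted_log(new_x, c=1.0):
    """Back transform: exp(new_x) - c, the exact inverse of shifted_log."""
    new_x = np.asarray(new_x, dtype=float)
    return np.exp(new_x) - c


# Made-up sample values, including some between 0 and 1.
x = np.array([0.0, 0.25, 0.5, 1.0, 3.7, 12.0])

new_x = shifted_log(x)               # log(x + 1)
x_back = inverse_shifted_log(new_x)  # exp(new_x) - 1

# The round trip should recover the original values up to floating-point error.
round_trip_ok = np.allclose(x_back, x)
```

For the specific case c = 1, NumPy also provides `np.log1p` and `np.expm1`, which compute log(1 + x) and exp(x) - 1 with better accuracy for values of x close to zero.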

Alexey Burnakov