
I found a preprocessor in which the data were log-transformed using $\log_{10}(100+\text{data})$. I am not sure exactly what this log transformation does. Any suggestions?

rajan sthapit
  • Closely related: [How small a quantity should be added to x to avoid taking the log of zero?](http://stats.stackexchange.com/questions/30728). – whuber Mar 07 '13 at 08:56

2 Answers


Log is a common transformation often used to make a variable more linearly associated with what you want to model.

This transformation does the same, and adding 100 does three things:

1) It ensures that any non-negative value (in fact, anything > -100) has a defined transformation.

2) It controls how much the shape of the transformed variable changes. The larger the constant (here 100), the more the transformed variable looks like the original variable (up to a linear transformation).

3) Most likely, this transformation was also chosen because the transformed variable is more correlated with the target (for OLS) or with the log-odds of the target (for logistic regression). For tree-based techniques such as Random Forest, transformations of this kind are unnecessary, because splits depend only on the ordering of a predictor's values, which a monotone transformation preserves.
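Points 2 and 3 can be checked numerically. The following is only a rough sketch on synthetic data: the lognormal sample, the offsets 1, 100 and 10,000, and the helper name `shifted_log10` are all assumptions chosen for illustration, not anything taken from the preprocessor in the question.

```python
import numpy as np


def shifted_log10(x, offset=100.0):
    """Return log10(offset + x); only defined where offset + x > 0."""
    x = np.asarray(x, dtype=float)
    return np.log10(offset + x)


# Synthetic, right-skewed, strictly positive data (an assumption for this demo).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)

# Point 2: the larger the offset, the closer the transform is to a linear
# function of x, so the Pearson correlation with x moves toward 1.
correlations = {
    offset: float(np.corrcoef(x, shifted_log10(x, offset))[0, 1])
    for offset in (1.0, 100.0, 10_000.0)
}
for offset, r in correlations.items():
    print(f"offset={offset:>8}: corr(x, log10(offset + x)) = {r:.4f}")

# Tree-based remark in point 3: the transform is monotone, so sorted inputs
# stay sorted after transforming and the ordering of values is unchanged.
is_monotone = bool(np.all(np.diff(shifted_log10(np.sort(x))) >= 0))
print("ordering preserved:", is_monotone)
```

On data like this the correlation should rise with the offset, which is the sense in which a large constant leaves the variable looking almost untransformed.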

KalEl

I'm assuming that's meant to be $\log_{10}(100+\text{data})$. If so, a possible reason for this transformation is to shift the data so that every value lies above a certain threshold (and in particular above 0), so that the logarithm is defined for all of the data.
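As a small illustration of the shift, here is a sketch using only the standard library. The function name `safe_shifted_log10` and the explicit domain check are hypothetical, added here for clarity. A plain `math.log10(0.0)` raises `ValueError`, whereas the shifted version is defined for every value above -100.

```python
import math


def safe_shifted_log10(value, offset=100.0):
    """Compute log10(offset + value), rejecting inputs outside the domain."""
    shifted = offset + value
    if shifted <= 0:
        # The logarithm is undefined for non-positive arguments.
        raise ValueError(f"offset + value must be > 0, got {shifted}")
    return math.log10(shifted)


# Zero and mildly negative values are now valid inputs.
for value in (0.0, -99.0, 900.0):
    print(value, "->", safe_shifted_log10(value))

# Values at or below -offset are still outside the domain.
try:
    safe_shifted_log10(-100.0)
    rejected = False
except ValueError:
    rejected = True
print("value -100 rejected:", rejected)
```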

RS18