I have almost always used scikit-learn's StandardScaler to normalize my data for machine learning. However, I noticed that simply taking the log of the variables I wanted to normalize often resulted in better accuracy than using StandardScaler.
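
For concreteness, this is roughly what the two approaches look like in my preprocessing (a minimal sketch; `X` is just a stand-in for my actual feature matrix):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder for a positive-valued feature matrix (the real data has many more rows)
X = np.array([[1.0, 200.0],
              [3.0, 5_000.0],
              [10.0, 120_000.0]])

# Approach 1: Z-score normalization with scikit-learn's StandardScaler
X_scaled = StandardScaler().fit_transform(X)

# Approach 2: simply taking the log of each feature
X_logged = np.log(X)
```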

To give some more context, I built several binary classifiers for different purposes, both with ANNs and XGBoost, and I noticed that log-normalizing the data always led to better accuracy.

I'm a little puzzled by this, as nobody seems to mention log-normalization as a valid normalization technique. Everyone talks about min-max normalization and Z-score normalization (scikit-learn's StandardScaler), but no one even mentions log-normalization.

How is that possible? Am I doing something wrong?

  • Taking the logarithm is called a log *transformation* and falls within the broader category of [power transforms](https://en.wikipedia.org/wiki/Power_transform). I wouldn't call it normalization, since when I see the word "normalization" I typically think of scaling (and perhaps a shift). (A sketch of a power transform appears after these comments.) – Matthew Gunn Sep 08 '17 at 20:55
  • Sure, I get what you're saying, but you can't ignore the fact that you are also changing the scale of your log-transformed variables. I'm just surprised by how well it works with models that use binary cross-entropy, and by the fact that no one talks about it. – Giulio Giorcelli Sep 08 '17 at 21:23
  • I'm not sure anything deep is going on beyond the fact that some relationships are closer to log-linear than linear. For example, the fraction of firms repurchasing shares increases almost linearly with the logarithm of firm size; fit a linear relationship between repurchasing activity and firm size and it won't work as well. Given how common those types of log-linear relationships are, I appreciate your point that this should perhaps be emphasized more. (You also get into subject-area expertise on whether to work, based on theory, in levels or logs.) – Matthew Gunn Sep 08 '17 at 23:23
  • Thanks for getting back to me on this. I changed the title hoping to get more answers/comments for this question. In the meantime, I'll run some experiments to test my hypothesis. – Giulio Giorcelli Sep 10 '17 at 01:26
  • How much better is the accuracy? – sjw Sep 10 '17 at 01:43
  • So as of today I have developed two models with log transformation that are fully in production. The first one was half a percentage point more accurate than its Z-score equivalent on test data; the second was three tenths of a percentage point more accurate. I'm working on a third one right now that has shown almost seven tenths of a percentage point of improvement over Z-score. – Giulio Giorcelli Sep 11 '17 at 17:22
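
To make the power-transform suggestion in the comments concrete, here is a minimal sketch using scikit-learn's PowerTransformer (the data here are made up; Box-Cox is just one member of the family alongside a plain log):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(1_000, 1))  # skewed, strictly positive data

# Box-Cox requires strictly positive inputs; Yeo-Johnson also handles zeros and negatives
pt = PowerTransformer(method="box-cox", standardize=True)
x_transformed = pt.fit_transform(x)
```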

1 Answer

It is quite common to use a log transformation on your data if the data are always positive (e.g., the price of something) and their scale varies drastically.

A simple criterion for whether you should use a log transformation is whether you would prefer a linear or a log scale for the x-axis when plotting a histogram of your data.
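
A minimal sketch of that check, assuming `x` is a strictly positive, heavily skewed array (the data here are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.lognormal(mean=2.0, sigma=1.5, size=10_000)  # heavily skewed, positive data

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(x, bins=50)
axes[0].set_title("Linear x-axis")
axes[1].hist(x, bins=np.logspace(np.log10(x.min()), np.log10(x.max()), 50))
axes[1].set_xscale("log")
axes[1].set_title("Log x-axis")
plt.show()
```

If the histogram only looks reasonable on the log x-axis (right panel), that is the cue to work with log(x) instead of x.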

If your data do indeed look that way, a log transformation is likely to make your ANN work better, for one main reason. Recall the motivation behind batch normalization: neural networks tend to work best when their inputs look roughly standard normal. Z-scoring makes a distribution zero-centered with unit variance, but it does not make the distribution normal; a log transformation, on the other hand, can make a heavily skewed distribution look much more like a normal one. You can check whether this is the case from the histogram or the kurtosis of your distribution.
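
A minimal sketch of that check with scipy (I include skewness alongside kurtosis; the data are made up and log-normal by construction):

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # skewed, positive data

# Z-scoring only shifts and rescales; skewness and excess kurtosis are unchanged
z = (x - x.mean()) / x.std()
print(skew(z), kurtosis(z))          # still far from (0, 0)

# A log transform reshapes the distribution; for log-normal data it becomes exactly normal
log_x = np.log(x)
print(skew(log_x), kurtosis(log_x))  # close to (0, 0)
```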

DiveIntoML
  • Makes sense, thanks for answering. So, do you think it's safe to say that as long as your data does not have negative values log-transformation should be preferred over Z-score normalization? – Giulio Giorcelli Sep 11 '17 at 17:24
  • @GiulioGiorcelli I would not say it like that, but you are always encouraged to plot the histogram and try the log transformation if your data are always positive. – DiveIntoML Sep 11 '17 at 21:05
  • It seems to me that, as @MatthewGunn said above, it is important to treat these as separate processes that accomplish different goals. If the technique you are applying assumes normality and you have log-normal data, then taking the log would be wise. But many techniques also require your features to all be on similar scales, so you should normalize as well; I think the order should be transformation -> normalization (see the sketch after these comments). Normalization also assumes normality. A good example would be PCA, which 'requires' both (normality is not actually assumed, but the correlation matrix is). – neuroguy123 Sep 10 '18 at 01:34
  • Do we still need the min-max or z-score normalization after the log transformation? – Wenmin Wu Nov 24 '20 at 05:58
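
Following the transformation -> normalization ordering suggested in the comments above, a minimal sketch with scikit-learn, assuming a strictly positive, skewed feature matrix `X` (the data and component count here are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Placeholder for a strictly positive, skewed feature matrix
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 5))

# Order: 1) log transform, 2) Z-score normalization, 3) PCA on the standardized features
pipeline = make_pipeline(
    FunctionTransformer(np.log),
    StandardScaler(),
    PCA(n_components=2),
)
X_reduced = pipeline.fit_transform(X)
```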