
Lately, there have been numerous questions about normalization.

What are some of the situations where you never ever ever should normalize your data, and what are the alternatives?

dassouki

3 Answers


Of course, one should never blindly normalize data that does not follow a (single) normal distribution.

For example, one might want to rescale observables $X$ to a standard normal with $z = (X-\mu)/\sigma$, but this only works if the data are normal and if both $\mu$ and $\sigma$ are the same for all data points (e.g. $\sigma$ does not depend on $\mu$ within a particular range of $X$).
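A minimal sketch of why the rescaling alone does not help: $(X-\mu)/\sigma$ only shifts and scales the data, so the z-scores have mean 0 and standard deviation 1, but the shape (here measured by skewness) is exactly what it was before. The sample values and the helper names `standardize` and `skewness` are hypothetical, chosen for illustration.

```python
import statistics


def standardize(xs):
    """Return z-scores (x - mean) / stdev, using the sample standard deviation."""
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    return [(x - mu) / sigma for x in xs]


def skewness(xs):
    """Moment skewness: mean cubed deviation divided by the population stdev cubed."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sd ** 3)


# Hypothetical right-skewed sample that is clearly not normal.
raw = [1, 1, 1, 2, 2, 3, 4, 6, 10, 25]
z = standardize(raw)

# Location and scale change: mean is ~0 and stdev is 1 after standardizing...
print(statistics.mean(z), statistics.stdev(z))
# ...but the skewness is identical (up to floating-point rounding).
print(skewness(raw), skewness(z))
```

Because skewness is invariant under any positive affine transformation, no choice of $\mu$ and $\sigma$ can turn this sample into a normal one.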

Rob Hyndman
Benjamin Bannier

Whether one can normalize a non-normal data set depends on the application. For example, many statistical tests rely on normalized data (e.g. those based on a z-score or t-score). Some tests are more prone to failure when applied to normalized non-normal data, while others are more resistant ("robust" tests).
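To make "normalizing" concrete for these tests, here is a small sketch of a z-score (known mean and standard deviation) and a one-sample t statistic (spread estimated from the sample). The function names and the numbers are hypothetical; note that the usual p-values for the t statistic assume the data are approximately normal, which is where non-normal data can cause trouble.

```python
import math
import statistics


def z_score(x, mu, sigma):
    """Standardize one value against a known mean and standard deviation."""
    return (x - mu) / sigma


def one_sample_t(xs, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(xs)
    return (statistics.mean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))


print(z_score(130, 100, 15))  # 2.0
print(one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], 5.0))
```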

One less robust statistic is the mean, which is sensitive to outliers (one common form of non-normality). By contrast, the median is much less sensitive to outliers and is therefore more robust.
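A quick illustration with made-up numbers: replacing a single value with an extreme one drags the mean far away, while the median does not move.

```python
import statistics

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]  # one extreme value replaces the 5

print(statistics.mean(clean), statistics.median(clean))                # 3 3
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 22 3
```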

A great example of non-normal data on which many statistics fail is bimodally distributed data. Because of this, it's always good practice to visualize your data as a frequency distribution (or, even better, test for normality!).
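A rough sketch of that advice using only the standard library: print a crude text histogram, then check where the mean lands. The sample is hypothetical, built with two clusters and nothing in between. Both the mean and the median fall in the empty gap, so neither describes a "typical" observation. For a formal test, one option is the Shapiro-Wilk test (`scipy.stats.shapiro` in SciPy), not used here.

```python
import statistics
from collections import Counter

# Hypothetical bimodal sample: clusters around 2 and 8, nothing in between.
data = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]

# Crude text histogram of the frequency distribution.
counts = Counter(data)
for value in range(min(data), max(data) + 1):
    print(f"{value:2d} | {'#' * counts[value]}")

center = statistics.mean(data)
print("mean:", center, "median:", statistics.median(data))
# Counter returns 0 for values never seen: no observation sits at the mean.
print("observations at the mean:", counts[round(center)])
```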

wormbuff

I thought this was too obvious, until I saw this question!

When you normalise data, make sure you always have access to the raw data after normalisation. Of course, you could break this rule if you have a good reason, e.g. storage limits.
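A minimal sketch of one way to stay close to this rule, assuming z-score normalisation: keep the raw values, and also record the parameters used so the transformation can be inverted (up to floating-point rounding) if the raw copy is ever lost. The `Standardizer` class is hypothetical, not from any particular library.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Standardizer:
    """Hypothetical helper that remembers the parameters used to normalise."""
    mu: float
    sigma: float

    @classmethod
    def fit(cls, xs):
        # Record the mean and sample standard deviation of the raw data.
        return cls(statistics.mean(xs), statistics.stdev(xs))

    def transform(self, xs):
        return [(x - self.mu) / self.sigma for x in xs]

    def inverse(self, zs):
        # Undo the normalisation using the stored parameters.
        return [z * self.sigma + self.mu for z in zs]


raw = [12.0, 15.5, 9.8, 20.1, 14.3]
scaler = Standardizer.fit(raw)
z = scaler.transform(raw)
recovered = scaler.inverse(z)

# Keep `raw` itself where possible; the stored parameters are only a fallback.
print(all(abs(a - b) < 1e-9 for a, b in zip(raw, recovered)))  # True
```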

csgillespie