I often see people make a dimension/feature of a dataset zero-mean by subtracting the mean from all of its elements, but I have never understood why. What is the effect of doing that as a preprocessing step? Does it improve classification performance? Does it help answer something about the dataset? Does it help when visualizing the data in order to understand it?
- This approach is called [centering](http://stats.stackexchange.com/search?q=centering). One of its applications is to turn the regression model's intercept into "predicted y when x is at average," making the intercept a bit more interpretable. – Penguin_Knight Jun 24 '14 at 12:40
- A centered feature/dataset can also be said to be *well-conditioned*; see [here](https://youtu.be/WaHQ9-UXIIg?t=25s) for a visual explanation. Normalizing the input makes gradient descent much easier. – tuned Feb 24 '17 at 15:45
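A small sketch (my own, with made-up data, not part of the comment) of Penguin_Knight's point above: once the predictor is centered, the fitted least-squares intercept becomes the prediction at the average x, which for a least-squares line is simply the mean of y.

```python
# Hypothetical data: the intercept of a simple regression before and after
# centering the predictor x.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
y = 3.0 + 0.5 * x + rng.normal(scale=2.0, size=200)

b1_raw, b0_raw = np.polyfit(x, y, 1)               # y ~ b0_raw + b1_raw * x
b1_c,   b0_c   = np.polyfit(x - x.mean(), y, 1)    # same fit with centered x

print("intercept, raw x:     ", b0_raw)            # prediction at x = 0
print("intercept, centered x:", b0_c)              # prediction at the average x
print("mean of y:            ", y.mean())          # matches the centered intercept
```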
3 Answers
Some cases where "centering the data on its mean" (hereafter just "de-meaning") is useful:
1) Visual detection of whether a distribution is "the same" as another distribution, except that it has been shifted along the real line. Giving both distributions zero mean makes this visual inspection much easier; sometimes, if the means differ by a lot, viewing them on the same chart is impractical. Think of two normal r.v.'s, say a $N(10,4)$ and a $N(100,4)$: the shapes of their density graphs are identical, only their positions on the real line differ. Now imagine that you have the graphs of their density functions but you don't know their variances; de-meaning them will superimpose one graph on the other (a short sketch of this appears after the list).
2) Simplifying calculations of higher moments: adding a constant to a random variable does not change its variance or its covariance with another random variable, but if you have a non-zero mean and must write out the detailed calculations, you have to carry all the extra terms and show that they cancel out. If the variables are de-meaned, you save a lot of useless calculations.
3) Random variables centered on their mean are the subject matter of the Central Limit Theorem.
4) Deviations from the "average value", and whether they tend to be "above or below average", are in many cases the issue of interest rather than the actual values of the random variables. "Translating" (visually and/or computationally) deviations below the mean into negative values and deviations above the mean into positive values makes the message clearer and stronger.
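To make points 1) and 2) concrete, here is a minimal sketch (mine, not part of the original answer) with two samples drawn from roughly $N(10,4)$ and $N(100,4)$: de-meaning leaves the variances untouched and superimposes the two histograms.

```python
# Two location-shifted samples: far apart when raw, overlapping once de-meaned,
# with identical variances before and after.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
a = rng.normal(loc=10, scale=2, size=5000)    # roughly N(10, 4)
b = rng.normal(loc=100, scale=2, size=5000)   # roughly N(100, 4)

print("variances before de-meaning:", a.var(), b.var())
print("variances after de-meaning: ", (a - a.mean()).var(), (b - b.mean()).var())

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))
ax1.hist(a, bins=50, alpha=0.5, density=True)
ax1.hist(b, bins=50, alpha=0.5, density=True)
ax1.set_title("raw: far apart on the real line")
ax2.hist(a - a.mean(), bins=50, alpha=0.5, density=True)
ax2.hist(b - b.mean(), bins=50, alpha=0.5, density=True)
ax2.set_title("de-meaned: the shapes superimpose")
plt.show()
```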
For more in-depth discussions, see also
Centering data in multiple regression
If you search "centered data" on CV, you will also find other interesting posts.

Also, for practical reasons, it is advantageous to center the data, for example, when training neural networks.
The idea is that, to train a neural network, one needs to solve a non-convex optimization problem with some gradient-based approach, where the gradients are computed by backpropagation. These gradients depend on the inputs, and centering the data removes a possible bias in them.
Concretely, a non-zero input mean shows up as one large eigenvalue in the inputs' second-moment matrix, which means that the gradients tend to be bigger in one direction than in others (a bias), slowing down convergence and eventually leading to worse solutions.
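A minimal numerical sketch of this argument (my own, using plain linear least squares rather than a neural network): a non-zero input mean creates one dominant eigenvalue in the inputs' second-moment matrix, which forces a small step size and slows gradient descent; centering removes it.

```python
# Eigenvalues of X'X/n and gradient-descent progress, with and without centering.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(loc=5.0, scale=1.0, size=(n, 2))   # inputs with mean around (5, 5)
w_true = np.array([1.0, -2.0])
y = X @ w_true + 0.1 * rng.normal(size=n)

def second_moment_eigs(X):
    """Eigenvalues of X'X / n; a non-zero mean inflates one of them."""
    return np.linalg.eigvalsh(X.T @ X / len(X))

print("eigenvalues, raw inputs:     ", second_moment_eigs(X))
print("eigenvalues, centered inputs:", second_moment_eigs(X - X.mean(axis=0)))

def gd_final_loss(X, y, steps=50):
    """Plain gradient descent on 0.5*MSE; the step size is set from the largest
    eigenvalue, so each run gets roughly its best safe learning rate."""
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / len(y)).max()
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return 0.5 * np.mean((X @ w - y) ** 2)

Xc, yc = X - X.mean(axis=0), y - y.mean()
print("loss after 50 steps, raw:     ", gd_final_loss(X, y))
print("loss after 50 steps, centered:", gd_final_loss(Xc, yc))
```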

To add to what Alecos said, which is very good: centering your data at zero is extremely important when using Bayesian methods or regularization, since otherwise the predictors are correlated with the intercept term, which keeps the regularization from doing what you usually want.
Making the data zero mean also diminishes many off-diagonal terms of the $X^\top X$ (cross-product) matrix, in particular the ones involving the intercept, so the model is more easily interpretable and the coefficients more directly meaningful: each coefficient applies more primarily to its own factor and acts less through correlation with the other factors.
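Here is a small sketch (my own, with hypothetical numbers) of that coupling: with an explicit intercept column in the design matrix, the cross-product matrix $X^\top X$ has large off-diagonal entries whenever the predictors have non-zero means, and those entries vanish once the predictors are centered.

```python
# Cross-product matrix with an intercept column, before and after centering
# the predictors.
import numpy as np

rng = np.random.default_rng(1)
n = 500
raw = rng.normal(loc=[10.0, 50.0], scale=[2.0, 5.0], size=(n, 2))

def gram_with_intercept(Z):
    X = np.column_stack([np.ones(len(Z)), Z])   # columns: [1, z1, z2]
    return X.T @ X / len(Z)

np.set_printoptions(precision=2, suppress=True)
print("X'X/n, raw predictors:\n", gram_with_intercept(raw))
print("X'X/n, centered predictors:\n", gram_with_intercept(raw - raw.mean(axis=0)))
# After centering, the first row/column becomes (1, 0, 0): the intercept is
# orthogonal to the predictors, so a penalty or prior on the slopes no longer
# drags the intercept around (and vice versa).
```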
