I keep reading about instances where we center the data (e.g., with regularization or PCA) in order to remove the intercept (as mentioned in this question). I know it's simple, but I'm having a hard time intuitively understanding this. Could someone provide the intuition or a reference I can read?
This is a very special case of "controlling for other variables" as explained (in several ways) at http://stats.stackexchange.com/questions/17336/how-exactly-does-one-control-for-other-variables. The "variable" being controlled for is the constant (intercept) term. – whuber Oct 22 '14 at 14:36
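whuber's point can be checked numerically. Below is a minimal NumPy sketch on simulated data (the data are assumed for illustration). It regresses a variable on a column of ones only, which "controls for" the constant term, and shows that the residuals are exactly the centered variable.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # simulated, not from the question

# Design matrix containing only the constant (intercept) column.
ones = np.ones((x.size, 1))

# Regress x on the constant alone; the fitted coefficient is the mean of x.
coef, *_ = np.linalg.lstsq(ones, x, rcond=None)
residuals = x - ones @ coef

# The residuals after "controlling for" the constant are the centered data.
centered = x - x.mean()
print(np.allclose(residuals, centered))  # True
```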
Can these pictures help?
The first 2 pictures are about regression. Centering the data does not alter the slope of the regression line, but it makes the intercept equal to 0.
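Since the pictures may not display, here is a small NumPy sketch of the same regression claim on simulated data (the true intercept 3.0 and slope 0.5 are arbitrary assumptions). It fits a line before and after centering both variables.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 20, size=100)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=100)  # simulated data

# np.polyfit with deg=1 returns [slope, intercept].
slope_raw, intercept_raw = np.polyfit(x, y, deg=1)

# Center both variables and refit.
xc, yc = x - x.mean(), y - y.mean()
slope_c, intercept_c = np.polyfit(xc, yc, deg=1)

print(np.isclose(slope_raw, slope_c))  # True: the slope is unchanged
print(abs(intercept_c) < 1e-8)         # True: the intercept is (numerically) zero
```

The centered line passes through the point of means, which after centering is the origin, so no intercept is needed.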
The pictures below are about PCA. PCA is a regression-like model without an intercept$^1$. Thus, the principal components inevitably pass through the origin. If you forget to center your data, the 1st principal component may pierce the cloud along a direction other than its main direction, and will then be misleading (for statistical purposes).
$^1$ PCA isn't a regression analysis, of course. It does, however, formally share the same linear equation (a linear combination) with linear regression. The PCA equation is like a linear regression equation without an intercept, because PCA is a rotation operation.
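A rough NumPy sketch of the PCA picture, under stated assumptions: a simulated 2-D cloud elongated along (1, -1)/√2 and shifted to (10, 0). The 1st principal axis is taken as the first right singular vector of the data matrix, which is an axis through the origin. Without centering, that axis is pulled toward the cloud's location instead of following its main direction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
t = rng.normal(scale=3.0, size=n)  # spread along the main axis
s = rng.normal(scale=0.5, size=n)  # spread across it
main = np.array([1.0, -1.0]) / np.sqrt(2)
perp = np.array([1.0, 1.0]) / np.sqrt(2)
# Elongated cloud, shifted away from the origin (assumed example data).
X = np.outer(t, main) + np.outer(s, perp) + np.array([10.0, 0.0])

def first_pc(data):
    # First right singular vector = direction of the 1st principal axis through the origin.
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    return vt[0]

pc_raw = first_pc(X)
pc_centered = first_pc(X - X.mean(axis=0))

# Absolute cosine with the cloud's true main axis (the sign of a PC is arbitrary).
print(abs(pc_raw @ main))       # noticeably below 1: tilted toward the cloud's location
print(abs(pc_centered @ main))  # close to 1: follows the main direction of the cloud
```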

ttnphns
-
Thanks! Follow-up question: in the case of regression, if I'm predicting y for an unseen x, that means I have to add the intercept back in *after* prediction, right? And the intercept would be equal to $\bar{y} - \bar{X}\beta$? – Alec Feb 06 '12 at 15:17
-
Right. (Here "beta" in your formula means b for the centered data, not the standardized coefficient beta.) – ttnphns Feb 06 '12 at 15:38
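The exchange above can be illustrated with a short NumPy sketch on simulated data (the coefficients and the new points `X_new` are assumptions for illustration). It fits unstandardized slopes b without an intercept on centered data, recovers the intercept as $\bar{y} - \bar{X}b$, and checks that both ways of predicting agree with an ordinary fit that includes a column of ones.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=[5.0, -2.0], scale=1.0, size=(200, 2))
y = 4.0 + X @ np.array([1.5, -0.7]) + rng.normal(scale=0.3, size=200)  # simulated

x_bar, y_bar = X.mean(axis=0), y.mean()

# Fit without an intercept on centered data: b are the (unstandardized) slopes.
b, *_ = np.linalg.lstsq(X - x_bar, y - y_bar, rcond=None)

# Intercept on the original scale: a = y_bar - x_bar @ b.
a = y_bar - x_bar @ b

# Two equivalent ways to predict for unseen points (hypothetical X_new):
X_new = np.array([[4.0, -1.0], [6.5, -3.0]])
pred_centered = (X_new - x_bar) @ b + y_bar  # center x, predict, add y_bar back
pred_intercept = a + X_new @ b               # use the recovered intercept directly

# Cross-check against an ordinary fit that includes a column of ones.
coef_full, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)
print(np.allclose(pred_centered, pred_intercept), np.isclose(coef_full[0], a))  # True True
```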
-
The second example is not correct. PCA maximizes variance, so for the uncentered data on the left, the direction found would be similar to the one on the right (the "distance" of the line from the data cloud does not matter). The differences arise in higher dimensions, where the data are complex and the search for the optimal direction should then start from the center of the cloud. – Aug 27 '12 at 10:47
-
`PCA is maximizing variance` This is not generally true. PCA maximizes (via the 1st PC) the sum of squared deviations from the origin. Only if the data were centered beforehand (centering itself isn't part of PCA) does this amount to maximizing variance. – ttnphns Aug 27 '12 at 11:42
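ttnphns's distinction between the two objectives can be checked by brute force in 2-D. This NumPy sketch uses simulated data (assumed: spreads 3 and 0.5, shifted to (8, 2)). It scans unit directions and compares the direction maximizing the sum of squared projections from the origin with the direction maximizing their variance, against the first right singular vector of the uncentered and centered data.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]]) + np.array([8.0, 2.0])

angles = np.linspace(0.0, np.pi, 3601)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])  # unit direction vectors
proj = X @ dirs.T                                          # shape (n, n_angles)

ss_origin = (proj ** 2).sum(axis=0)  # sum of squared deviations from the origin
variance = proj.var(axis=0)          # variance of the projections

_, _, vt_raw = np.linalg.svd(X, full_matrices=False)
_, _, vt_c = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

best_ss = dirs[np.argmax(ss_origin)]
best_var = dirs[np.argmax(variance)]
print(abs(vt_raw[0] @ best_ss))   # ~1: uncentered PC1 maximizes squared distance from the origin
print(abs(vt_c[0] @ best_var))    # ~1: centered PC1 maximizes variance
print(abs(vt_raw[0] @ best_var))  # below 1 for this data: the two directions differ
```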
-
P.S. Note that computation of covariances or correlations implies centering. – ttnphns Aug 27 '12 at 11:47
-
> P.S. Note that computation of covariances or correlations implies centering – ttnphns Aug 27 '12 at 11:47

While I agree with your other comments, neither covariance nor correlation implies centering. Neither cor nor covar changes value when an additive constant is applied to the data. – TPM Oct 30 '14 at 16:13
-
This is backwards. Additive constants indeed don't affect correlations, but that is because they are subtracted out in the calculations, as @ttnphns pointed out. That aside, this isn't a new answer but a comment. We understand that you don't yet have enough reputation to comment, so this will, I trust, be moved by a user with enough reputation after I flag it. – Nick Cox Oct 30 '14 at 16:19
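To illustrate Nick Cox's point, here is a minimal NumPy sketch on simulated data (the per-column shifts are arbitrary assumptions). It computes the sample covariance explicitly through a centering step, confirms that it matches `np.cov`, and shows that an additive constant vanishes because the centering subtracts it out.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))  # simulated data

def cov_via_centering(data):
    # Sample covariance: center each column, then average cross-products (n - 1 denominator).
    centered = data - data.mean(axis=0)
    return centered.T @ centered / (data.shape[0] - 1)

c1 = cov_via_centering(X)
c2 = cov_via_centering(X + np.array([100.0, -50.0, 7.0]))  # add a constant to each column

print(np.allclose(c1, np.cov(X, rowvar=False)))  # True: matches NumPy's covariance
print(np.allclose(c1, c2))                       # True: the constant is removed by centering
```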
-
@ttnphns: Your second figure (about PCA) has one unfortunate drawback: the 1st PC on the centered data is exactly orthogonal to the 1st PC on the uncentered data. As a result, it might give the impression that failing to center only influences the 1st PC, and that all the other principal axes simply get shifted by one (the 1st becomes the 2nd, etc.). This is of course not so. If your cloud on the left were rotated, e.g., clockwise by 60 degrees (and if you also marked the 2nd PCs on both subplots), this would become more obvious. – amoeba Dec 15 '14 at 11:09
-
@amoeba, I could have chosen better data for the example, but who cares? The truth is there. The "impression" you are talking about might indeed mislead somebody sagacious like you, but somebody sagacious like you will then certainly get rid of the impression. – ttnphns Dec 15 '14 at 11:45
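amoeba's point can be probed in 3-D with a NumPy sketch. The data are assumed: distinct spreads along the coordinate axes, shifted to (4, 4, 4), which is not aligned with any of those axes. If failing to center merely shifted the axes by one, the matrix of absolute cosines between uncentered and centered principal axes would be a permutation matrix with entries of 0 or 1. Here every axis changes instead.

```python
import numpy as np

rng = np.random.default_rng(6)
# 3-D cloud with distinct spreads, shifted off the origin (assumed example data).
X = rng.normal(size=(1000, 3)) * np.array([3.0, 1.5, 0.5]) + np.array([4.0, 4.0, 4.0])

_, _, v_raw = np.linalg.svd(X, full_matrices=False)           # axes without centering
_, _, v_c = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)  # axes after centering

# |cosines| between every uncentered axis (rows) and every centered axis (columns).
cosines = np.abs(v_raw @ v_c.T)
print(np.round(cosines, 2))  # no entry is close to 1, so no axis is simply "shifted"
```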
-
@ttnphns: Yeah, for some reason they are all shown as invalid image links. Perhaps it is an issue with SE? – GWW Aug 08 '18 at 15:51
-
@GWW, on my computer and my phone the pics display all right, so I can't say anything about your problem. Maybe you want to call a moderator's attention? You can do it by pressing "flag" below the answer and then describing your finding. – ttnphns Aug 09 '18 at 17:03
-
@ttnphns: This is bizarre; I don't understand why I cannot see them. I have tried on two different computers, but they do work on my phone. Oh well, sorry to cause an issue. – GWW Aug 10 '18 at 12:33