
I understand residuals intuitively in terms of linear regression as "the error in prediction". Mathematically I've seen residuals given by

$$\epsilon = y - \hat{y}$$

where $y$ is the true value and $\hat{y}$ is the estimated value.

But what are residuals with respect to PCA? When we apply PCA to a data matrix, we reduce its dimension by finding the directions of greatest variation and projecting the data onto those directions (the principal components). My guess is that the residual here is the amount of variation left "unexplained" by the dimensionality reduction, but I haven't seen a formal definition, so I can't be sure. Is there a more formal definition or a better intuitive understanding of PCA residuals?

PyRsquared
Your formula, which looks quite formal to me, works just fine: each observation in a PCA is a $p$-vector $y$, and if you select $d$ principal components it projects onto another $p$-vector $\hat y$ lying within the span of those components. This makes $\epsilon$ a $p$-vector that lies in the $(p-d)$-dimensional orthogonal complement of that span. – whuber Mar 28 '19 at 21:18
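The comment above can be sketched numerically. This is a minimal example, not from the original post: the data matrix, its dimensions, and the choice of $d = 2$ are all made up for illustration. It projects each centered observation onto the span of the leading $d$ principal directions and checks that the residual $\epsilon = y - \hat y$ is orthogonal to that span:

```python
import numpy as np

# Hypothetical data: 100 observations in p = 5 dimensions (assumed for illustration).
rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 5))

# PCA is defined on mean-centered data.
Y_centered = Y - Y.mean(axis=0)

# Principal directions from the SVD; keep d = 2 components.
d = 2
_, _, Vt = np.linalg.svd(Y_centered, full_matrices=False)
V = Vt[:d].T                    # (p, d) matrix whose columns span the components

# Project each observation onto the span of the d components ...
Y_hat = Y_centered @ V @ V.T    # reconstruction within the d-dimensional subspace

# ... and the residual is exactly epsilon = y - y_hat.
residuals = Y_centered - Y_hat

# The residuals lie in the orthogonal complement of the span:
print(np.abs(residuals @ V).max())  # ~0 up to floating-point error
```

The total squared norm of `residuals` is the variance left "unexplained" by the $d$ retained components, which matches the intuition in the question.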

0 Answers