Suppose we have an $n$-vector $y$ and an $n$ by $p$ matrix $X$. The subspace $S$ spanned by the $p$ columns of $X$ is the set of vectors formed by taking all possible linear combinations of the columns of $X$ (an infinite number of them). For example, if $X$ consists of two linearly independent columns (neither is a multiple of the other), then $S$ is a plane through the origin.
The projection of $y$ on $S$ is the point $\hat{y}$ in $S$ that is closest to $y$. See the diagram in Why is $\mathbf{y}-\mathbf{\hat{y}}$ perpendicular to the subspace spanned by $\mathbf{x}$ in linear regression? where our $S$ is the yellow area in that diagram.
The projection has the property that $\hat{y}$ and $y-\hat{y}$ are orthogonal. This must be so because if we take any other point $p$ in $S$, then the triangle formed by the tips of $y$, $\hat{y}$ and $p$ is a right triangle in which the segment from $y$ to $p$ is the hypotenuse; since the hypotenuse is the longest side, $p$ cannot be closer to $y$ than $\hat{y}$ is.
Another property to note is that the projection of $\hat{y}$ on $S$ is just $\hat{y}$ again, since $\hat{y}$ already lies in $S$.
The regression of $y$ on $X$ is just the projection of $y$ on $S$, and the regression coefficient vector $\hat{b}$ is the vector that $X$ maps to $\hat{y}$, i.e. $\hat{y} = X\hat{b}$. (It is unique if $X$ is of full rank, i.e. if there is no nonzero $b$ such that $Xb = 0$.) The vector $\hat{y}$ is referred to as the fitted values and $e=y-\hat{y}$ as the residuals. From the above, $y = \hat{y} + e$, and the two terms on the right-hand side, the fitted values $\hat{y}$ and the residuals $e$, are orthogonal to each other. (By the Pythagorean theorem it also follows that $||y||^2 = ||\hat{y}||^2 + ||e||^2$, because the points $0$, $y$ and $\hat{y}$ form a right triangle in which the side from $0$ to the tip of $y$ is the hypotenuse.)
We can demonstrate the orthogonality of $e$ to the columns of $X$ and to $\hat{y}$ (up to computer floating point precision) in R using the built-in BOD data frame like this:
fm <- lm(demand ~ Time, BOD)
X <- model.matrix(fm)
yhat <- fitted(fm)
e <- resid(fm)
crossprod(X, e)
## [,1]
## (Intercept) -8.881784e-16
## Time 0.000000e+00
crossprod(yhat, e)
## [,1]
## [1,] -1.776357e-15
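Continuing with the same fit, we can also check the Pythagorean decomposition $||y||^2 = ||\hat{y}||^2 + ||e||^2$ numerically; the difference between the two sides should again be zero up to floating point precision:

```r
# check ||y||^2 = ||yhat||^2 + ||e||^2 for the BOD regression
fm <- lm(demand ~ Time, BOD)
y <- BOD$demand
yhat <- fitted(fm)
e <- resid(fm)
sum(y^2) - (sum(yhat^2) + sum(e^2))  # ~ 0 up to floating point error
```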
To construct the projection matrix, multiply the first equation below by $X'$, giving the second; but $X'e$ is zero, since $e$ is orthogonal to $S$ and hence to each column of $X$, which gives the third equation (the normal equations).
$y = X\hat{b} + e$
$X'y = X'X\hat{b} + X'e$
$X'y = X'X\hat{b}$
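We can verify the normal equations numerically with the same BOD fit; $X'y$ and $X'X\hat{b}$ should agree up to floating point error:

```r
# verify the normal equations X'y = X'X bhat
fm <- lm(demand ~ Time, BOD)
X <- model.matrix(fm)
y <- BOD$demand
bhat <- coef(fm)
crossprod(X, y) - crossprod(X) %*% bhat  # both entries ~ 0
```

Here `crossprod(X, y)` computes $X'y$ and `crossprod(X)` computes $X'X$.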
Now, in the usual case where the columns of $X$ are linearly independent, $X'X$ is invertible, so multiplying through by $(X'X)^{-1}$ gives $\hat{b} = (X'X)^{-1} X'y$, and since $\hat{y} = X\hat{b}$ we have $\hat{y} = X(X'X)^{-1} X'y$. Thus the projection onto $S$ is given by the matrix $X(X'X)^{-1}X'$, i.e. it is a linear transformation.
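As a final check, we can form the projection (hat) matrix explicitly in R and confirm that it reproduces the fitted values and is idempotent, matching the earlier observation that projecting $\hat{y}$ again just gives $\hat{y}$:

```r
# explicit projection (hat) matrix P = X (X'X)^{-1} X'
fm <- lm(demand ~ Time, BOD)
X <- model.matrix(fm)
y <- BOD$demand
P <- X %*% solve(crossprod(X)) %*% t(X)
range(P %*% y - fitted(fm))  # ~ 0: P maps y to the fitted values
range(P %*% P - P)           # ~ 0: P is idempotent (P^2 = P)
```

(This explicit inverse is for illustration only; in practice one solves the normal equations via a QR decomposition, as `lm` does, rather than inverting $X'X$.)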