Suppose we have an $n$-vector $y$ and an $n$ by $p$ matrix $X$. The subspace $S$ spanned by the $p$ columns of $X$ is the set of vectors formed by taking all possible linear combinations of the columns of $X$ (an infinite number of them). For example, if $X$ consists of two linearly independent columns (neither is a multiple of the other), then $S$ is a plane through the origin.
The projection of $y$ on $S$ is the point $\hat{y}$ in $S$ that is closest to $y$. See the diagram in Why is $\mathbf{y}-\mathbf{\hat{y}}$ perpendicular to the subspace spanned by $\mathbf{x}$ in linear regression? where our $S$ is the yellow area in that diagram.
The projection has the property that $\hat{y}$ and $y-\hat{y}$ are orthogonal. This must be so because if we take any other point $p$ in $S$, then the triangle formed by the tips of $y$, $\hat{y}$ and $p$ is a right triangle in which the segment from $y$ to $p$ is the hypotenuse; since the hypotenuse is the longest side, $p$ cannot be closer to $y$ than $\hat{y}$ is.
Another property to note is that the projection of $\hat{y}$ on $S$ is just $\hat{y}$ again, since $\hat{y}$ already lies in $S$.
The regression of $y$ on $X$ is just the projection of $y$ on $S$, and the regression coefficient vector $\hat{b}$ is the vector that $X$ maps to $\hat{y}$, i.e. $\hat{y} = X\hat{b}$. (It is unique if $X$ is of full rank, i.e. if there is no nonzero $b$ such that $Xb = 0$.) The vector $\hat{y}$ is referred to as the fitted values and $e=y-\hat{y}$ as the residuals. From the above, $y = \hat{y} + e$, and the two terms on the right-hand side, the fitted values $\hat{y}$ and the residuals $e$, are orthogonal to each other. (By the Pythagorean theorem it also follows that $||y||^2 = ||\hat{y}||^2 + ||e||^2$, because the points $0$, $y$ and $\hat{y}$ form a right triangle in which the side from $0$ to the tip of $y$ is the hypotenuse.)
We can demonstrate the orthogonality of $e$ to the columns of $X$ and to $\hat{y}$ (up to computer floating point precision) in R using the built-in BOD data frame like this:
fm <- lm(demand ~ Time, BOD)
X <- model.matrix(fm)
yhat <- fitted(fm)
e <- resid(fm)
crossprod(X, e)
## [,1]
## (Intercept) -8.881784e-16
## Time 0.000000e+00
crossprod(yhat, e)
## [,1]
## [1,] -1.776357e-15
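Continuing with the same fit, we can also check the Pythagorean decomposition $||y||^2 = ||\hat{y}||^2 + ||e||^2$ numerically; the difference between the two sides should again be zero up to floating point precision:

```r
# check ||y||^2 = ||yhat||^2 + ||e||^2 for the BOD regression
fm <- lm(demand ~ Time, BOD)
y <- BOD$demand
yhat <- fitted(fm)
e <- resid(fm)
sum(y^2) - (sum(yhat^2) + sum(e^2))  # ~ 0 up to floating point error
```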
To construct the projection matrix, multiply the first equation below by $X'$, giving the second; but $X'e$ is zero, since $e$ is orthogonal to $S$ and hence to each column of $X$, which gives the third equation (the normal equations).
$y = X\hat{b} + e$
$X'y = X'X\hat{b} + X'e$
$X'y = X'X\hat{b}$
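We can verify the normal equations numerically with the same BOD fit; $X'y$ and $X'X\hat{b}$ should agree up to floating point error:

```r
# verify the normal equations X'y = X'X bhat
fm <- lm(demand ~ Time, BOD)
X <- model.matrix(fm)
y <- BOD$demand
bhat <- coef(fm)
crossprod(X, y) - crossprod(X) %*% bhat  # both entries ~ 0
```

Here `crossprod(X, y)` computes $X'y$ and `crossprod(X)` computes $X'X$.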
Now, in the usual case where the columns of $X$ are linearly independent, $X'X$ is invertible, so multiplying through by $(X'X)^{-1}$ gives $\hat{b} = (X'X)^{-1} X'y$, and since $\hat{y} = X\hat{b}$ we have $\hat{y} = X(X'X)^{-1} X'y$. Thus the projection onto $S$ is given by the matrix $X(X'X)^{-1}X'$, i.e. it is a linear transformation.
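As a final check, we can form the projection (hat) matrix explicitly in R and confirm that it reproduces the fitted values and is idempotent, matching the earlier observation that projecting $\hat{y}$ again just gives $\hat{y}$:

```r
# explicit projection (hat) matrix P = X (X'X)^{-1} X'
fm <- lm(demand ~ Time, BOD)
X <- model.matrix(fm)
y <- BOD$demand
P <- X %*% solve(crossprod(X)) %*% t(X)
range(P %*% y - fitted(fm))  # ~ 0: P maps y to the fitted values
range(P %*% P - P)           # ~ 0: P is idempotent (P^2 = P)
```

(This explicit inverse is for illustration only; in practice one solves the normal equations via a QR decomposition, as `lm` does, rather than inverting $X'X$.)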