I am learning about linear regression, and I was taught it from two perspectives.
The first was an equation connecting the conditional expected value to the predictor. I saw nice graphs with conditional distributions spread out along the regression line, explaining the parts of the equation (the true value, the fitted value, the residual), and so on.
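For concreteness, the equation I mean from this first perspective, in my own notation, is:

$$ E[Y \mid X = x] = w_0 + w_1 x, \qquad Y = w_0 + w_1 x + \varepsilon, \qquad E[\varepsilon \mid X] = 0, $$

so the regression line traces the conditional means, and the residual is the deviation from that mean.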
The second approach was about projecting Y onto the column space of X. Again, they showed us the true Y, the fitted (projected) Y, and the residual vector, and they told us how to compute the projection via OLS.
I saw the formula for the projection, and it was exactly the vector form of linear regression with the beta coefficients, w0 + w1 x. I was told the hat matrix is a projection matrix, so the "hat" means it turns Y into the fitted values (y hat).
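To make sure I understand this second perspective, here is a small numerical sketch I put together (simulated data and variable names are my own, not from the lecture). It builds the hat matrix H = X (X'X)^{-1} X', checks that it is a genuine orthogonal projection (idempotent and symmetric), and confirms that H y equals the fitted values from the OLS coefficients:

```python
import numpy as np

# Simulated data (my own toy example): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x])

# Hat matrix: projects y onto the column space of X.
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y

# The same fitted values come from the OLS coefficients w = (X'X)^{-1} X'y.
w = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(y_hat, X @ w)

# Projection properties: H is idempotent and symmetric.
assert np.allclose(H @ H, H)
assert np.allclose(H, H.T)

# The residual vector is orthogonal to the fitted vector.
resid = y - y_hat
assert abs(resid @ y_hat) < 1e-6
```

So numerically the two formulas agree, which is what the lecture claimed.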
So I see almost the same terms in both approaches: the beta coefficients (slopes), the intercept, the projection equation. I understand the regression line is both a projection onto a plane and an averaged "trajectory" through the observations. The conditional distributions lie along the line, or rather, the line goes through the conditional means. It all somehow connects, but with gaps.
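Here is the kind of check that convinced me the line goes through the conditional means (again a toy simulation of my own): I bin x into narrow intervals, average y within each bin as an empirical stand-in for E[Y | X ≈ x], and compare those bin means to the OLS line:

```python
import numpy as np

# Toy data with a genuinely linear conditional mean (my own assumption).
rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

# OLS fit: w[0] is the intercept, w[1] the slope.
X = np.column_stack([np.ones(n), x])
w = np.linalg.solve(X.T @ X, X.T @ y)

# Average y within narrow bins of x: empirical conditional means.
bins = np.linspace(0, 10, 21)
centers = 0.5 * (bins[:-1] + bins[1:])
idx = np.clip(np.digitize(x, bins) - 1, 0, len(centers) - 1)
cond_means = np.array([y[idx == k].mean() for k in range(len(centers))])

# The binned means should hug the fitted line w0 + w1 * x.
line = w[0] + w[1] * centers
print(np.max(np.abs(cond_means - line)))
```

The printed gap shrinks as n grows, so empirically the OLS line does track the conditional means; what I am missing is the theoretical bridge.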
I fail to see how the conditional expected value is connected to the projection via OLS. I have seen thorough materials with tons of formulas, but they were lecture notes with exercises and not many explanations.
Could anyone show me, at a basic level, how the projection turns into the conditional expectation? What is the transition that takes me from conditional expected values to the projection onto a plane?