How come I always see $\hat{\beta}$ in OLS derived via matrix differentiation, i.e. by setting the derivative to $0$ and solving?
Couldn't one also derive it by noting that in $Y = X\beta + \epsilon$, the best approximation of $Y$ in the column space of $X$ w.r.t. the $L_2$ norm is the one for which $Y-X\beta$ is orthogonal to $X\beta$, and then finding $\beta$ such that $(Y-X\beta)'X\beta = 0$?
Or is there an assumption I'm missing here?
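
For concreteness, here is a minimal numerical sketch of the orthogonality condition described above. It assumes NumPy, and the data are synthetic: the sample size, seed, noise level, and `beta_true` are all made up for illustration. It only checks that the least-squares fit satisfies $(Y-X\hat{\beta})'X\hat{\beta} \approx 0$; it does not show that this condition alone pins down $\hat{\beta}$.

```python
import numpy as np

# Synthetic data; every value here is an illustrative assumption.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimate of beta (minimizes the L2 norm of Y - X @ beta).
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

fitted = X @ beta_hat      # X beta_hat, a vector in the column space of X
residual = Y - fitted      # Y - X beta_hat

# Inner product (Y - X beta_hat)' X beta_hat; zero up to rounding error.
inner = residual @ fitted
print(inner)
```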