
Why do I always see the derivation of $\hat{\beta}$ in OLS done via matrix differentiation, solving for the point where the derivative is $0$?

Couldn't one also derive it by noting that in $Y = X\beta + \epsilon$, the best approximation to $Y$ in the column space of $X$ w.r.t. the $L_2$ norm is attained when $Y - X\beta$ is orthogonal to $X\beta$, and then finding the $\beta$ such that $(Y-X\beta)'X\beta = 0$?

Or is there an assumption I'm missing here...
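
For what it's worth, the orthogonality route does recover the usual estimator once the residual is required to be orthogonal to the *entire* column space of $X$ (i.e., to every column of $X$), not merely to the single vector $X\beta$; a sketch of that derivation:

```latex
% Orthogonality to the column space of X gives the normal equations:
\begin{align}
X'(Y - X\beta) &= 0 \\
X'Y &= X'X\beta \\
\hat{\beta} &= (X'X)^{-1}X'Y
  \quad \text{(assuming } X'X \text{ is invertible, i.e. } X \text{ has full column rank)}
\end{align}
```

This is the same $\hat{\beta}$ that setting the derivative of $\|Y - X\beta\|^2$ to zero produces.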

Glassjawed
  • **[Related](http://stats.stackexchange.com/a/9824/2970)**. – cardinal Nov 20 '14 at 02:55
  • These two approaches are equivalent to each other. There is a theorem called the "projection theorem" that explains it. Roughly, it states that a necessary and sufficient condition for the least-squared-norm solution (first approach) is that the residual be orthogonal to the space spanned by $X$ (your approach, though you need to change the formula you gave slightly, for as written it doesn't give the correct LSE). – Zhanxiong Nov 20 '14 at 02:59
  • Sure, but your typical undergrad is better with calculus than linear algebra. Speaking of which, I first learned this in my linear algebra class and not a stats class – shadowtalker Nov 20 '14 at 05:52
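To illustrate Zhanxiong's point numerically, here is a small sketch (variable names are mine) comparing the normal-equations solution $(X'X)^{-1}X'Y$ with a generic least-squares solver; they agree, and the fitted residual is orthogonal to every column of $X$, not just to $X\hat{\beta}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
Y = X @ beta_true + rng.normal(size=n)

# Normal equations: X'(Y - X beta) = 0  =>  beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The same answer from a generic least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

residual = Y - X @ beta_hat
print(np.allclose(beta_hat, beta_lstsq))  # solutions agree
print(np.allclose(X.T @ residual, 0))     # residual is orthogonal to each column of X
```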

0 Answers