OLS derivation question

Asked Nov 20 '14 at 02:28

Active Nov 20 '14 at 02:38


How come I always see $\hat{\beta}$ in OLS derived using matrix differentiation, by setting the derivative to $0$ and solving?

Couldn't one also derive it by noting that in $Y = X\beta + \epsilon$, the best approximation to $Y$ in the column space of $X$ w.r.t. the $L_2$ norm is the one for which $Y-X\beta$ is orthogonal to $X\beta$, and then finding $\beta$ such that $(Y-X\beta)'X\beta = 0$?

Or is there an assumption I'm missing here...

edited Nov 20 '14 at 02:38

asked Nov 20 '14 at 02:28

Glassjawed

**[Related](http://stats.stackexchange.com/a/9824/2970)**. – cardinal Nov 20 '14 at 02:55
These two approaches are equivalent. There is a theorem called the "projection theorem" that explains why. Roughly, it states that a necessary and sufficient condition for the least-squares solution (the first approach) is that the residual be orthogonal to the space spanned by the columns of X (your approach, although you need to change the formula you gave slightly, because as written it doesn't give the correct LSE). – Zhanxiong Nov 20 '14 at 02:59
Sure, but your typical undergrad is better with calculus than with linear algebra. Speaking of which, I first learned this in my linear algebra class, not in a stats class. – shadowtalker Nov 20 '14 at 05:52
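
For anyone who wants to check the equivalence numerically, here is a minimal NumPy sketch. It is not from the thread: the simulated data, the seed, and names such as `beta_hat` are illustrative assumptions, and it assumes $X$ has full column rank. Requiring the residual to be orthogonal to every column of $X$, i.e. $X'(Y - X\beta) = 0$, gives the normal equations $X'X\beta = X'Y$. The sketch solves those, compares the result with `np.linalg.lstsq`, and then shows that the single scalar condition $(Y-X\beta)'X\beta = 0$ from the question also holds at $\beta = 0$, so on its own it does not pin down the estimate.

```python
import numpy as np

# Simulated data (illustrative assumption, not from the thread)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Orthogonality route: X'(Y - X b) = 0  =>  X'X b = X'Y (normal equations)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference: direct least-squares minimization of ||Y - X b||_2
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

residual = Y - X @ beta_hat

# Residual is orthogonal to each column of X (vector of p near-zero values)
orth_to_columns = X.T @ residual

# The scalar condition from the question also holds at the OLS solution
scalar_at_ols = residual @ (X @ beta_hat)

# But the scalar condition alone is not sufficient: beta = 0 satisfies it too
beta_zero = np.zeros(p)
scalar_at_zero = (Y - X @ beta_zero) @ (X @ beta_zero)

print("normal equations vs lstsq agree:", np.allclose(beta_hat, beta_lstsq))
print("X'(Y - X beta_hat):", orth_to_columns)
print("scalar condition at beta_hat:", scalar_at_ols)
print("scalar condition at beta = 0:", scalar_at_zero)
```

The point of the last check is that one scalar equation cannot determine $p$ unknowns; the projection argument needs the $p$ equations $X'(Y - X\beta) = 0$.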
