
I would like to find the parameters of $y_i=\beta_0+\beta_1x_{1i}+\beta_2x_{2i}$ using the least squares criterion, and I want to convince myself that the matrix approach is computationally more convenient than my way below:

I want to minimize: $LS(\beta)=\sum(y_i-(\beta_0+\beta_1x_{1i}+\beta_2x_{2i}))^2$

1. Differentiate $LS(\beta)$ with respect to $\beta_0$ and set the derivative to $0$:

$\beta_0=\bar{y}-\beta_1\bar{x}_1-\beta_2\bar{x}_2$ (substitute this $\beta_0$ into the equations below)
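
For reference, the intermediate step is

$$\frac{\partial LS}{\partial\beta_0}=-2\sum\left(y_i-\beta_0-\beta_1x_{1i}-\beta_2x_{2i}\right)=0\quad\Rightarrow\quad n\beta_0=\sum y_i-\beta_1\sum x_{1i}-\beta_2\sum x_{2i},$$

and dividing by $n$ gives the expression above.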

2. Differentiate $LS(\beta)$ with respect to $\beta_1$ and set the derivative to $0$:

$\beta_1=\frac{\sum(y_i-\bar{y})(x_{1i}-\bar{x}_1)\;-\;\beta_2\sum(x_{2i}-\bar{x}_2)(x_{1i}-\bar{x}_1)}{\sum(x_{1i}-\bar{x}_1)^2}$

3. Differentiate $LS(\beta)$ with respect to $\beta_2$ and set the derivative to $0$:

$\beta_2=\frac{\sum(y_i-\bar{y})(x_{2i}-\bar{x}_2)\;-\;\beta_1\sum(x_{1i}-\bar{x}_1)(x_{2i}-\bar{x}_2)}{\sum(x_{2i}-\bar{x}_2)^2}$
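
To make the coupling between steps 2 and 3 explicit, the two equations form a $2\times2$ linear system in $(\beta_1,\beta_2)$. Writing $S_{jk}=\sum(x_{ji}-\bar{x}_j)(x_{ki}-\bar{x}_k)$ and $S_{jy}=\sum(x_{ji}-\bar{x}_j)(y_i-\bar{y})$, they are

$$\begin{pmatrix}S_{11}&S_{12}\\S_{12}&S_{22}\end{pmatrix}\begin{pmatrix}\beta_1\\\beta_2\end{pmatrix}=\begin{pmatrix}S_{1y}\\S_{2y}\end{pmatrix}.$$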

I could solve for $\beta_1$ or $\beta_2$ in closed form (by substituting one equation into the other), but I get a very messy expression.
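
For completeness, carrying out that substitution (equivalently, applying Cramer's rule to the $2\times2$ system above) gives the closed form

$$\beta_1=\frac{S_{22}S_{1y}-S_{12}S_{2y}}{S_{11}S_{22}-S_{12}^2},\qquad\beta_2=\frac{S_{11}S_{2y}-S_{12}S_{1y}}{S_{11}S_{22}-S_{12}^2},$$

which is the "messy expression" in question.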

Q) We also know that the matrix solution is $(X'X)^{-1}X'Y$, which costs $O(k^2(n+k))$ operations ($O(nk^2)$ to form $X'X$ and $X'Y$, plus $O(k^3)$ to solve the $k \times k$ system), where $X$ is an $n$ by $k$ matrix.

Why not solve for every $\beta$ in this scalar fashion? (As I've written above, we can solve for them independently of each other!) The complexity of the approach above would be $O(n^2)$, so why would I choose the matrix formulation? Or where am I being unreasonable?
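
As a quick numerical sanity check (a minimal NumPy sketch on made-up data; the variable names are purely illustrative), the scalar route and the matrix route agree:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

def c(v):
    return v - v.mean()  # center a variable about its mean

# Scalar route: the coupled 2x2 system of centered sums from steps 2 and 3.
S = np.array([[c(x1) @ c(x1), c(x1) @ c(x2)],
              [c(x1) @ c(x2), c(x2) @ c(x2)]])
s_y = np.array([c(x1) @ c(y), c(x2) @ c(y)])
b1, b2 = np.linalg.solve(S, s_y)
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()  # step 1

# Matrix route: least squares on the full design matrix (LAPACK driver,
# no explicit inverse is ever formed).
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose([b0, b1, b2], beta))  # True
```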

Oleg
  • Because actually we never compute the inverse (see the QR sketch after these comments). Check the threads on [Least Squares Regression Step-By-Step Linear Algebra Computation](https://stats.stackexchange.com/questions/154485/) and on [Understanding QR Decomposition](https://stats.stackexchange.com/questions/160007). – usεr11852 Oct 22 '17 at 19:39
  • @usεr11852 so I guess the reason why we shouldn't solve for all β in a "scalar" fashion is that we don't compute the inverse? – Oleg Oct 22 '17 at 19:56
  • I am not sure why you are distinguishing between "matrix" and "scalar". Matrix "notation" has nothing to do with how algorithms are implemented; it has to do with how we write mathematical expressions in papers and in proofs. We choose the same efficient algorithms for evaluating mathematical expressions regardless of whether those expressions are written in matrix notation or some other way, because an "efficient algorithm" is its own thing, related to computer hardware architecture, not a consequence of mathematical notation. – jbowman Oct 22 '17 at 20:03
  • @usεr11852 Now that I look it up quickly, QR decomposition seems to be $O(n^3)$ for a square matrix of size $n$, and since computing the inverse (and matrix multiplication) is $O(n^{2.37})$, it seems that the naive computation of $(X^TX)^{-1}X^TY$ is fast. – Łukasz Grad Oct 22 '17 at 20:08
  • @jbowman I'm not sure you understand what I'm asking: writing equations in a smart way leads to less overhead on the computer. $(1+2+3+4)/4$ takes fewer operations (three additions and one division) than $1/4+2/4+3/4+4/4$ (four divisions and three additions). Is this small averaging example enough to relate to my question? (Or use Gauss's formula to sum 1 through 4, then divide by 4.) – Oleg Oct 22 '17 at 20:14
  • No. When was the last time you saw the "matrix solution" for $\hat{\beta}$ written out using the QR decomposition as actually implemented on a computer with nested for-loops? – jbowman Oct 22 '17 at 20:17
  • @ŁukaszGrad Are you sure that if I had $\beta_0+\beta_1+\beta_2$ I wouldn't be able to solve it as I did above (separating each of them)? – Oleg Oct 22 '17 at 20:25
  • Matrix algebra notation is convenient for textbooks; it is not necessarily efficient for real computations. In particular, linear regression is faster to compute by solving a system of linear equations and, in some instances, via sweep operations. – ttnphns Oct 22 '17 at 21:42
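
Following up on the comments about never forming the inverse: a minimal sketch of the QR route (assuming NumPy/SciPy; this mirrors what library least-squares routines do internally, not any particular library's exact implementation):

```python
import numpy as np
from scipy.linalg import solve_triangular

def ols_qr(X, y):
    """Least squares via QR, without ever forming (X'X)^{-1}.

    Factor X = QR, then back-substitute on the triangular system
    R beta = Q'y: O(nk^2) for the factorization, O(k^2) for the solve.
    """
    Q, R = np.linalg.qr(X)               # reduced QR: Q is n-by-k, R is k-by-k upper triangular
    return solve_triangular(R, Q.T @ y)  # back-substitution
```

Forming the normal equations and calling `np.linalg.solve(X.T @ X, X.T @ y)` is the "solve a linear system" alternative mentioned in the last comment; it also avoids an explicit inverse.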

0 Answers