Solution
The matrix algebra can be dismaying and, if not carried out elegantly, can require an awful lot of (superfluous) algebraic manipulation. However, the situation is much simpler than it looks, because (creating the matrix $X$ by putting a column of ones in first, and then the column of independent values $(x_i)$ after it)
$$X^\prime X = \pmatrix{n & S_x \\ S_x & S_{xx}}$$
and
$$X^\prime y = \pmatrix{S_y \\ S_{xy}}.$$
(The $S_{*}$ are handy--and fairly common--abbreviations for sums of the variables and their products.) Thus, the normal equations for the estimates $\hat\beta = (\hat\beta_0, \hat\beta_1)$ are--when written out as simultaneous linear equations--merely
$$\matrix{n \hat\beta_0 + S_x\hat\beta_1 = S_y \\
S_x \hat\beta_0 + S_{xx}\hat\beta_1 = S_{xy},}$$
which are to be solved for $\hat\beta_0$ and $\hat\beta_1.$ Indeed, you don't really need to solve this ab initio: all you have to do at this point is check which formula for $\hat \beta_1$ actually works. That requires only elementary algebra. I won't show it because there's a better way that produces the same result in a much more illuminating and generalizable fashion.
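If you would like to check that arithmetic numerically, here is a minimal sketch in Python/NumPy with made-up data; the names `Sx`, `Sxx`, `Sy`, and `Sxy` simply mirror the $S_{*}$ abbreviations above and are not part of any library.

```python
import numpy as np

# Made-up data for a quick numerical check of the 2x2 normal equations.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=20)

n = len(x)
Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()

# Normal equations:  [n   Sx ] [b0]   [Sy ]
#                    [Sx  Sxx] [b1] = [Sxy]
A = np.array([[n, Sx], [Sx, Sxx]])
b = np.array([Sy, Sxy])
b0, b1 = np.linalg.solve(A, b)

# Compare with a standard library fit (np.polyfit returns the slope first).
slope, intercept = np.polyfit(x, y, 1)
assert np.allclose([b0, b1], [intercept, slope])
```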
Motivation and Generalization
Recall that the normal equations are derived by considering the problem of minimizing the sum of squares of residuals,
$$\operatorname{SSR} = \sum_i \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2.$$
The appearance of $\beta_0$ corresponds to a column of ones in $X$ while the appearance of $\beta_1$ corresponds to a column $(x_i)$ in $X$. In general, those columns are not orthogonal. (Recall that we say two vectors are orthogonal when their dot product is zero. Geometrically, this means they are perpendicular. See the references for more about this.) We can make them orthogonal by subtracting some multiple of one of them from the other. The easiest choice is to subtract a constant from each $x_i$ to make the result orthogonal to the constant column; that is, we seek a number $c$ for which
$$0 = (1,1,\ldots, 1) \cdot (x_1-c, x_2-c, \ldots, x_n-c) = \sum_{i} 1\cdot(x_i-c) = S_x - nc.$$
The unique solution clearly is $c = S_x/n = \bar x,$ the mean of the $x_i.$ Accordingly, let's rewrite the model in terms of the "centered" variables $x_i-\bar x.$ It asks us to minimize
$$\operatorname{SSR} = \sum_i \left(y_i - (\beta_0 + \beta_1\bar x + \beta_1 (x_i-\bar x))\right)^2.$$
For simplicity, write the unknown constant term as
$$\alpha = \beta_0 + \beta_1 \bar x,$$
understanding that once solutions $\hat\alpha$ and $\hat\beta_1$ are obtained, we easily find the estimate
$$\hat\beta_0 = \hat\alpha - \hat\beta_1\bar x.$$
In terms of the unknowns $(\hat\alpha,\hat\beta_1)$ the Normal equations are now
$$\pmatrix{n & 0 \\ 0 & \sum_i(x_i-\bar x)^2}\pmatrix{\hat\alpha\\\hat\beta_1}=\pmatrix{S_y \\ \sum_i (x_i-\bar x)y_i}.$$
When written out as two simultaneous linear equations, each unknown is isolated in its own equation, which is simple to solve: this is what having orthogonal columns in $X$ achieves. In particular, the equation for $\hat\beta_1$ is
$$\sum_i(x_i-\bar x)^2\ \hat\beta_1 = \sum_i (x_i-\bar x)y_i.$$
It's a short and simple algebraic step from this to the desired result. (Use the fact that $\sum_i (x_i-\bar x)\bar y = 0.$)
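Here is a brief sketch of the centered computation (again Python/NumPy with made-up data; variable names such as `alpha_hat` are mine, chosen to match the notation above). It confirms that $\hat\alpha = \bar y,$ that $\hat\beta_1$ comes from the one-variable equation above, and that converting back via $\hat\beta_0 = \hat\alpha - \hat\beta_1\bar x$ reproduces the usual fit.

```python
import numpy as np

# Made-up data (same construction as before).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=20)

xbar, ybar = x.mean(), y.mean()
xc = x - xbar                        # centered column, orthogonal to the column of ones

# With orthogonal columns, each unknown sits in its own equation.
alpha_hat = y.sum() / len(y)         # = ybar
beta1_hat = (xc * y).sum() / (xc * xc).sum()
beta0_hat = alpha_hat - beta1_hat * xbar   # convert back to the original parametrization

# Same answer as a standard fit.
slope, intercept = np.polyfit(x, y, 1)
assert np.allclose([beta0_hat, beta1_hat], [intercept, slope])
```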
The generalization to multiple variables proceeds in the same manner: at the first step, subtract suitable multiples of the first column of $X$ from each of the other columns so that all the resulting columns are orthogonal to the first column. (Recall this comes down to solving a linear equation for one unknown constant $c,$ which is easy.) Repeat by subtracting suitable multiples of the second column from the (new) third, fourth, ..., etc. columns to make them orthogonal to the first two columns simultaneously. Continue "sweeping out" the columns in this fashion until they are mutually orthogonal. The resulting normal equations will involve at most one variable at a time and therefore are simple to solve. Finally, the solutions have to be converted back to the original variables (just like you have to convert the estimates $\hat\alpha$ and $\hat\beta_1$ back into an estimate of $\hat\beta_0$ in the ordinary regression case). At each step of the way, all you are doing is creating new equations from old ones and solving for a single variable at a time.
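As an illustration of that sweeping-out procedure (a sketch only, not a production implementation; the function name `sweep_out_regression` is my own), the following records the multiple subtracted at each step in a unit upper-triangular matrix $R,$ solves the resulting one-variable equations, and then converts back to the original coefficients by solving $R\hat\beta = \hat\gamma.$

```python
import numpy as np

def sweep_out_regression(X, y):
    """Least squares by successively orthogonalizing the columns of X.

    Z holds the orthogonalized columns; R records the multiples that were
    subtracted, so that X = Z @ R with R unit upper triangular.
    """
    Z = X.astype(float)
    n, p = Z.shape
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            c = Z[:, j] @ Z[:, k] / (Z[:, j] @ Z[:, j])  # the constant "c" of the text
            Z[:, k] -= c * Z[:, j]                        # make column k orthogonal to column j
            R[j, k] = c
    # With mutually orthogonal columns, each coefficient has its own one-variable equation.
    gamma = np.array([Z[:, j] @ y / (Z[:, j] @ Z[:, j]) for j in range(p)])
    # Convert back to the original variables: solve R beta = gamma.
    return np.linalg.solve(R, gamma)

# Quick check against a standard solver on made-up data.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)
assert np.allclose(sweep_out_regression(X, y), np.linalg.lstsq(X, y, rcond=None)[0])
```

The sweeping does not change the column span of $X,$ so the fitted values are unchanged; only the parametrization differs, which is why the final back-substitution is needed.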
References
For a more formal account of this approach to solving the normal equations, see Gram-Schmidt orthogonalization.
Its use in multiple regression is discussed by Lynn R. LaMotte in The Gram-Schmidt Construction as a Basis for Linear Models, The American Statistician 68(1), February 2014.
To see how to find just a single coefficient estimate without having to compute the others, see the analysis at https://stats.stackexchange.com/a/166718/919.
For a geometric interpretation, see my answers at https://stats.stackexchange.com/a/97881/919 and https://stats.stackexchange.com/a/113207/919.