
In Wooldridge's Econometric Analysis of Cross Section and Panel Data, the linear projection of $y$ on $1,\mathbf{x}$ is defined in the following way:

Assume that $Var(\mathbf{x})$ is positive definite. Then the linear projection of $y$ on $1,\mathbf{x}$ exists and is unique, such that

$L(y|1,\mathbf{x})=\beta_0+\mathbf{x}\beta$, where by definition

$\beta=(Var(\mathbf{x}))^{-1}Cov(\mathbf{x},y)$ and $\beta_0=E(y)-E(\mathbf{x})\beta$.

First, I do not see how the author arrives at that definition, or how it relates to the one in Wikipedia's article on projection. There we usually have $P=X(X'X)^{-1}X'$, where $X$ is the sample matrix whose columns are $1$ and $\mathbf{x}$ (I am not sure how to write this projection with respect to the population, though). Second, instead of defining the betas as above, couldn't we have derived those equations? How?
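To make this concrete, here is a small numerical check I put together (NumPy; the data, seed, and coefficients are made up), showing that in a sample the fitted values from $P=X(X'X)^{-1}X'$ coincide with $\beta_0+\mathbf{x}\beta$ computed from the sample analogues of Wooldridge's formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 3                       # hypothetical sample size and number of regressors
x = rng.normal(size=(N, K))          # each row is a draw of the row vector x
y = 1.5 + x @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=N)

# Wikipedia-style route: projection onto the columns of X = (1, x)
X = np.column_stack([np.ones(N), x])
P = X @ np.linalg.solve(X.T @ X, X.T)        # X (X'X)^{-1} X'
fitted_proj = P @ y

# Wooldridge's formulas, with population moments replaced by sample moments
xc = x - x.mean(axis=0)                      # centered regressors
beta = np.linalg.solve(xc.T @ xc / N, xc.T @ (y - y.mean()) / N)  # Var(x)^{-1} Cov(x, y)
beta0 = y.mean() - x.mean(axis=0) @ beta     # E(y) - E(x) beta
fitted_wooldridge = beta0 + x @ beta

print(np.allclose(fitted_proj, fitted_wooldridge))  # True
```

So the two formulations agree numerically in a sample, but I would still like to see the population-level derivation.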

Any help would be appreciated.

Edit: According to the book, the notation is the following: $\mathbf{x}$ is a row vector of dimension $K$, so the dimension of the design matrix is $N\times K$.

An old man in the sea.

2 Answers


Another formulation of the regression parameter estimator $\hat{\beta}$ is

$\hat{\beta} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1} \mathbf{X}^T Y$

Here $\hat{\beta}$ is a two-element vector: $\hat{\beta}_0$, the intercept, and $\hat{\beta}_1$, the slope. I like to use $\mathbf{X}$ notationally for the design matrix, whose first column is a vector of 1s.

It's easy to see that the cross-products have a factor of $n$ that cancels out in these operations. If, WLOG, we assume that the random component(s) of $\mathbf{X}$ are centered, you have:

$\mbox{Cov} (\mathbf{X}) = \frac{1}{n}\left( \mathbf{X}^T \mathbf{X} \right) $

and

$\mbox{Cov} (\mathbf{X}, Y) = \frac{1}{n}\left( \mathbf{X}^T Y \right) $

Therefore, for univariate models, the least squares estimator can be expressed as $\hat{\beta}_1 = \mbox{Cov}(X, Y) / \mbox{Var}(X)$ using the sample moments, and the population parameter $\beta_1 = E(\hat{\beta}_1)$ takes the same form in population moments.
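To illustrate that identity numerically (a quick sketch with simulated data; the setup and names are mine, not from the question):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# OLS via the normal equations; design matrix with a leading column of 1s
X = np.column_stack([np.ones(n), x])
b0_hat, b1_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Slope as sample covariance over sample variance (the 1/n factors cancel)
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(np.isclose(b1_hat, slope))  # True
```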

The projection matrix is formulated as $ P = \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1} \mathbf{X}^T$, and the predicted values of $Y$ (i.e., $\hat{Y}$) are given by:

$\hat{Y} = PY = \mathbf{X}\left(\mathbf{X}^T\mathbf{X}\right)^{-1} \mathbf{X}^TY = \mathbf{X} \hat{\beta}$

and you'll see it's a projection. All predicted values of $Y$ are formed as linear combinations of the columns of $\mathbf{X}$, i.e., they lie in the column space of $\mathbf{X}$. The projection specifically "projects" the values of $Y$ onto the fitted values $\hat{Y}$.

The author has formulated the fitted value, or conditional mean function, using unusual notation: $L(y | 1, x)$ is basically equivalent to $\hat{Y}$.

Alternately, the residual-maker matrix is $M = \mathcal{I} - P$, and the residuals are given by $r = MY$. ($P$ itself is the hat matrix, or influence matrix, since it puts the "hat" on $Y$.)
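A short numerical sketch of these facts (simulated data, hypothetical dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # design matrix with intercept
Y = rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)     # projection matrix X (X'X)^{-1} X'
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.allclose(P @ Y, X @ beta_hat))   # fitted values, computed both ways
print(np.allclose(P @ P, P))              # idempotent: projecting twice changes nothing

M = np.eye(n) - P                         # residual-maker matrix
r = M @ Y
print(np.allclose(X.T @ r, 0))            # residuals orthogonal to every column of X
```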

Reference: Seber & Lee, Linear Regression Analysis, 2nd edition.

AdamO
  • Thanks for your answer. I'm not sure about something in your answer: how would one write that in terms of population moments? If the author were speaking of estimates of the betas, then I think he would have written so... – An old man in the sea. May 12 '16 at 07:46
  • Also, according to the notation used, $\mathbf{x}$ is not the design matrix, but just the row vector $(x_1,\ldots,x_K)$. – An old man in the sea. May 12 '16 at 13:28
  • @Anoldmaninthesea. can you clarify exactly what you're asking? I don't understand your first comment. On the second, I think it's relatively straightforward to see through any notational differences. You will need matrix notation for the design matrix to connect the estimation and definition of LS betas to projections as your question stated. – AdamO May 12 '16 at 16:16
  • Adam, I'll try. In my first question, I'm asking why you're using an estimate of beta. Usually, in the question's notation, $\beta= E(\mathbf{x}^T\mathbf{x})^{-1}E(\mathbf{x}^Ty)$. In this equation we use the population moments, and each row of the design matrix is a draw from the population. – An old man in the sea. May 13 '16 at 17:57
  • @Anoldmaninthesea. You've answered your own question; the estimator defines the parameter: $\beta_1 = \sigma_{xy} / \sigma_{x}^2 = E(\mbox{cov} (X, Y) / \mbox{var}(X))$. Usually we think of the design matrix $X$ as "fixed", though. We think of $X$ as coming from its empirical distribution, so that $\mbox{var}(X) = \sum_i^n(X_i - \bar{X})^2 / n = n^{-1} \vec{X}^T\vec{X}$ for $X$ centered. Similarly, $\sigma_{xy} = E(X^TY)$. The formulation you present is more probabilistic than I'd like. Statistically, it is more instructive to see that these properties hold for a finite realization. – AdamO May 13 '16 at 19:28
  • Adam, I made a mistake in my previous comment. That equation for $\beta$ is true when $\beta_0 =0$. Also, the way it's usually reached is by assuming that $y=\mathbf{x}\beta+u$, $E(\mathbf{x}'u)=0$, and that $E(\mathbf{x}'\mathbf{x})$ is full rank. I fail to see how the author obtains the expression with the variance-covariance matrix and the covariance vector in the random-sample setting... – An old man in the sea. May 13 '16 at 21:09
  • @anoldmaninthesea. I have mentioned a few times now the need to center the $X$ variable. In analyses, I frequently do this because it does not affect results and it improves interpretation of the intercept parameter. Also, we do not boldface $X$ unless speaking of the design matrix. I highly recommend you obtain a copy of Seber and Lee; it is a standard text on the subject of linear modeling. – AdamO May 15 '16 at 13:29
  • I'm getting downvoted on my answer and failing to see how I've not answered the original question – AdamO May 15 '16 at 13:29
  • Adam, It's not me who downvoted... – An old man in the sea. May 15 '16 at 14:00
  • This is exactly right. Wooldridge just loves unnecessary novel notation that doesn't actually help anyone understand anything. – Robert E Mealey May 18 '16 at 16:57

We start with the population model: $$\begin{aligned}y &=E(y|{\bf x})+e \\ E(e|{\bf x}) &=0 \end{aligned}$$ where $m({\bf x})=E(y|{\bf x})$ is the conditional expectation function (CEF) and $e$ is the CEF error. It is important to note that the equations above are a definition: here $E(e|{\bf x})=0$ is a definition, not a restriction.

It can be shown that $m({\bf x})$ is the Best Predictor of $y$ in the sense that it minimizes the mean squared prediction error $E\left[(y-g({\bf x}))^2\right]$ over all functions $g$. If we could obtain $m({\bf x})$, then we would have the best overall predictor of $y$.

However, we do not know the functional form of $m({\bf x})$. We could assume it is linear, but this might be overly restrictive. Why would it be linear?

Instead we consider the Best Linear Predictor of $y$. A linear predictor of $y$ is a function of the form ${\bf x}'\boldsymbol\beta$. The best linear predictor minimizes $$E\left[ (y- {\bf x}'\boldsymbol\beta)^2\right] \tag{1}$$ If the second-moment matrix of $\bf x$, $E({\bf xx}')$, is positive definite (i.e., invertible, i.e., nonsingular), then we can find a unique solution for $\boldsymbol\beta$ in (1) by taking and solving the first-order condition. We get $$\boldsymbol\beta=\left(E[{\bf xx}'] \right)^{-1}E[{\bf x}y] \tag{2}$$
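For completeness, here is the first-order condition behind (2), which the derivation above leaves implicit: differentiate (1) with respect to $\boldsymbol\beta$ and set the derivative to zero, $$\frac{\partial}{\partial \boldsymbol\beta}E\left[(y-{\bf x}'\boldsymbol\beta)^2\right]=-2\,E\left[{\bf x}(y-{\bf x}'\boldsymbol\beta)\right]={\bf 0}\quad\Longrightarrow\quad E[{\bf xx}']\,\boldsymbol\beta=E[{\bf x}y],$$ and invertibility of $E[{\bf xx}']$ then gives (2).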

By plugging (2) into ${\bf x}'\boldsymbol\beta$, we get the best linear predictor, also called the Linear Projection of $y$ on $\bf x$:

$$\begin{aligned}L(y|{\bf x})&={\bf x}'\boldsymbol\beta \\ \text{where} \ \ \ \ \boldsymbol\beta &=\left(E[{\bf xx}'] \right)^{-1}E[{\bf x}y]\end{aligned}$$

The Projection Error is then: $$e=y-{\bf x}'\boldsymbol\beta \tag{3}$$
and we can see that $E({\bf x}e)={\bf 0}$:
$$\begin{aligned} E[{\bf x}e] &= E[{\bf x}(y-{\bf x}'\boldsymbol\beta)] \\ &= E[{\bf x}y]-E[{\bf xx}'] \left(E[{\bf xx}']\right)^{-1}E[{\bf x}y] \\ &={\bf 0} \end{aligned}$$ Since $\bf x$ contains a constant, we get $E(e)=0$ as well.
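As a numerical aside (a sketch with simulated data of my choosing, where the true CEF is deliberately nonlinear), the sample analogue of $E[{\bf x}e]={\bf 0}$ holds exactly by the normal equations, even though the projection is not the CEF:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000                          # large sample, to mimic population moments
x = rng.normal(size=n)
y = np.exp(x) + rng.normal(size=n)   # true CEF E(y|x) = exp(x) is nonlinear

# Sample analogue of beta = E[xx']^{-1} E[xy], with a constant included in x
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X / n, X.T @ y / n)

e = y - X @ beta                     # projection error
# Zero up to floating point, by the normal equations -- the sample
# counterpart of the population statement E[xe] = 0
print(np.abs(X.T @ e / n).max())
```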
Now let's take (3), rearrange it to have $y$ on the left side, separate the constant out of $\bf x$, and take expectations on both sides: $$E[y] =E[{\bf x}'\boldsymbol\beta]+E[\beta_0]+E[e]$$ where $\bf x$ now denotes the non-constant regressors.
Note that $E[\beta_0]=\beta_0$, $E[{\bf x}'\boldsymbol\beta]=E[{\bf x}]'\boldsymbol\beta$, and $E[e]=0$. Then solve for $\beta_0$ to get the same expression as Wooldridge: $$\beta_0=E[y]-E[{\bf x}]'\boldsymbol\beta \tag{4}$$ Now substitute (4) back into the model $y=\beta_0+{\bf x}'\boldsymbol\beta+e$ to get
$$y-E[y]=({\bf x}-E[{\bf x}])'\boldsymbol\beta + e \tag{5}$$

Because $({\bf x}-E[{\bf x}])$ and $e$ are uncorrelated, (5) is also a linear projection, and we can find $\boldsymbol\beta$: $$\begin{aligned}\boldsymbol\beta & =\left(E\left[ ({\bf x}-E[{\bf x}])({\bf x}-E[{\bf x}])'\right]\right)^{-1} E\left[ ({\bf x}-E[{\bf x}])(y-E[y])\right] \\ &= [\text{Var}({\bf x})]^{-1} \text{Cov}({\bf x},y)\end{aligned}$$

Thus we have our Linear Projection Model: $$\begin{aligned} & y=\beta_0 +{\bf x}'\boldsymbol\beta +e \\ \text{where} & \\ & L(y|1,{\bf x})= \beta_0 + {\bf x}'\boldsymbol\beta \\ & \boldsymbol\beta=[\text{Var}({\bf x})]^{-1} \text{Cov}({\bf x},y) \\ & \beta_0=E[y]-E[{\bf x}]'\boldsymbol\beta \end{aligned}$$
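Finally, a numerical check (simulated data, my own setup) that the demeaned-moment formula reproduces the slopes from regressing $y$ on $(1,{\bf x})$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 1000, 2
x = rng.normal(size=(n, K))
y = 0.7 + x @ np.array([1.0, -2.0]) + rng.normal(size=n)

# Route 1: sample analogue of E[xx']^{-1} E[xy], with a constant included
X = np.column_stack([np.ones(n), x])
b_full = np.linalg.solve(X.T @ X, X.T @ y)   # (intercept, slopes)

# Route 2: slopes from Var(x)^{-1} Cov(x, y), intercept from (4)
xc = x - x.mean(axis=0)
slopes = np.linalg.solve(xc.T @ xc, xc.T @ (y - y.mean()))
b0 = y.mean() - x.mean(axis=0) @ slopes

print(np.allclose(b_full, np.concatenate([[b0], slopes])))  # True
```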