
Assume a general linear model $y = X \beta + \epsilon$ with observations in an $n$-vector $y$ and an $(n \times p)$ design matrix $X$ of rank $p$ for the $p$ parameters in a $p$-vector $\beta$. A general linear hypothesis (GLH) about $q$ of these parameters ($q < p$) can be written as $\psi = C \beta$, where $C$ is a $(q \times p)$ matrix. An example of a GLH is the one-way ANOVA hypothesis, where $C \beta = 0$ under the null.
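
For concreteness (this particular parametrization is just one possible choice): in a cell-means coding of a one-way ANOVA with three groups, $\beta = (\mu_{1}, \mu_{2}, \mu_{3})'$, and the null hypothesis $\mu_{1} = \mu_{2} = \mu_{3}$ is $C \beta = 0$ with, e.g., $$ C = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}, \qquad q = 2. $$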

The GLH test uses a restricted model with design matrix $X_{r}$, in which the $q$ parameters are set to 0 and the corresponding $q$ columns of $X$ are removed. The unrestricted model with design matrix $X_{u}$ imposes no restrictions and thus contains $q$ more free parameters: its parameters are a superset of those of the restricted model, and the columns of $X_{u}$ are a superset of those of $X_{r}$.

$P_{u} = X_{u}(X_{u}'X_{u})^{-1} X_{u}'$ is the orthogonal projection onto the subspace $V_{u}$ spanned by the columns of $X_{u}$, and analogously $P_{r}$ projects onto $V_{r}$; then $V_{r} \subset V_{u}$. The parameter estimates of a model are $\hat{\beta} = X^{+} y = (X'X)^{-1} X' y$, the predictions are $\hat{y} = P y$, the residuals are $e = (I-P)y$, the sum of squared residuals is $SSE = ||e||^{2} = e'e = y'(I-P)y$, and the estimate for $\psi$ is $\hat{\psi} = C \hat{\beta}$. The difference $SSE_{r} - SSE_{u}$ equals $y'(P_{u}-P_{r})y$. Now the univariate $F$ test statistic for a GLH that is familiar (and understandable) to me is: $$ F = \frac{(SSE_{r} - SSE_{u}) / q}{\hat{\sigma}^{2}} = \frac{y' (P_{u} - P_{r}) y / q}{y' (I - P_{u}) y / (n - p)} $$

There's an equivalent form that I don't yet understand: $$ F = \frac{(C \hat{\beta})' (C(X'X)^{-1}C')^{-1} (C \hat{\beta}) / q}{\hat{\sigma}^{2}} $$

As a start $$ \begin{array}{rcl} (C \hat{\beta})' (C(X'X)^{-1}C')^{-1} (C \hat{\beta}) &=& (C (X'X)^{-1} X' y)' (C(X'X)^{-1}C')^{-1} (C (X'X)^{-1} X' y) \\ ~ &=& y' X (X'X)^{-1} C' (C(X'X)^{-1}C')^{-1} C (X'X)^{-1} X' y \end{array} $$

  • How do I see that $P_{u} - P_{r} = X (X'X)^{-1} C' (C(X'X)^{-1}C')^{-1} C (X'X)^{-1} X'$? (A quick numerical check of this identity is sketched below.)
  • What is the explanation for / motivation behind the numerator of the second test statistic? I can see that $C(X'X)^{-1}C'$ is $V(C \hat{\beta}) / \sigma^{2} = (\sigma^{2} C(X'X)^{-1}C') / \sigma^{2}$, but I can't put these pieces together.
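
For what it's worth, here is a small numerical check of both points (a sketch in Python/NumPy; the design matrix, coefficients, and the particular $C$ selecting two coefficients are just arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 4, 2

# Arbitrary full-rank design and true coefficients (illustrative values only)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.5, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)

# C picks out the last q = 2 coefficients, i.e. H0: beta_3 = beta_4 = 0
C = np.zeros((q, p))
C[0, 2] = 1.0
C[1, 3] = 1.0

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Projections onto the unrestricted and restricted column spaces
P_u = X @ XtX_inv @ X.T
X_r = X[:, :2]                     # restricted design: the two removed columns dropped
P_r = X_r @ np.linalg.inv(X_r.T @ X_r) @ X_r.T

# Identity from the first bullet:
# P_u - P_r = X (X'X)^-1 C' [C (X'X)^-1 C']^-1 C (X'X)^-1 X'
M = X @ XtX_inv @ C.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ C @ XtX_inv @ X.T
print(np.allclose(P_u - P_r, M))   # True

# Both forms of the F statistic agree
sigma2_hat = y @ (np.eye(n) - P_u) @ y / (n - p)
F_proj = y @ (P_u - P_r) @ y / q / sigma2_hat
Cb = C @ beta_hat
F_glh = Cb @ np.linalg.inv(C @ XtX_inv @ C.T) @ Cb / q / sigma2_hat
print(np.isclose(F_proj, F_glh))   # True
```
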
caracal

2 Answers


For your second question, you have $\mathbf{y}\sim N(\mathbf{X}\boldsymbol{\beta},\sigma^2 \mathbf{I})$ and suppose you're testing $\mathbf{C}\boldsymbol{\beta}=\mathbf{0}$. So we have that (the following is all shown through matrix algebra and properties of the normal distribution; I'm happy to walk through any of these details)

$ \mathbf{C}\hat{\boldsymbol{\beta}}\sim N(\mathbf{0}, \sigma^2 \mathbf{C(X'X)^{-1}C'}). $

And so,

$ \textrm{Cov}(\mathbf{C}\hat{\boldsymbol{\beta}})=\sigma^2 \mathbf{C(X'X)^{-1}C'}, $

which leads to noting that

$ F_1 = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'[\mathbf{C(X'X)^{-1}C'}]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}}{\sigma^2}\sim \chi^2 \left(q\right). $

You get the above result because $F_1$ is a quadratic form and by invoking a certain theorem. This theorem states that if $\mathbf{x}\sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{x'Ax}\sim \chi^2 (r,\lambda)$, where $r=\textrm{rank}(\mathbf{A})$ and the noncentrality parameter is $\lambda=\frac{1}{2}\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu}$, iff $\mathbf{A}\boldsymbol{\Sigma}$ is idempotent. [The proof of this theorem is a bit long and tedious, but it's doable. Hint: use the moment generating function of $\mathbf{x'Ax}$.]

So, since $\mathbf{C}\hat{\boldsymbol{\beta}}$ is normally distributed, and the numerator of $F_1$ is a quadratic form involving $\mathbf{C}\hat{\boldsymbol{\beta}}$, we can use the above theorem (after proving the idempotent part).
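
Concretely, writing $\mathbf{A} = \frac{1}{\sigma^2}[\mathbf{C(X'X)^{-1}C'}]^{-1}$ for the matrix of the quadratic form in $F_1$ and $\boldsymbol{\Sigma} = \textrm{Cov}(\mathbf{C}\hat{\boldsymbol{\beta}}) = \sigma^2 \mathbf{C(X'X)^{-1}C'}$, the idempotency check reduces to

$ \mathbf{A}\boldsymbol{\Sigma} = \frac{1}{\sigma^2}[\mathbf{C(X'X)^{-1}C'}]^{-1}\,\sigma^2\,\mathbf{C(X'X)^{-1}C'} = \mathbf{I}_q, $

which is idempotent with rank $q$; and under $H_0$ the mean $\boldsymbol{\mu}=\mathbf{C}\boldsymbol{\beta}=\mathbf{0}$, so the noncentrality parameter is zero and $F_1\sim\chi^2(q)$.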

Then,

$ F_2 = \frac{\mathbf{y}'[\mathbf{I} - \mathbf{X(X'X)^{-1}X'}]\mathbf{y}}{\sigma^2}\sim \chi^2(n-p) $

Through some tedious details, you can show that $F_1$ and $F_2$ are independent, and from there you can justify your second $F$ statistic; see the ratio spelled out below.
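
Explicitly, one way to finish would be to take the ratio of the two independent chi-square variables, each divided by its degrees of freedom; the unknown $\sigma^2$ cancels:

$ F = \frac{F_1 / q}{F_2 / (n-p)} = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'[\mathbf{C(X'X)^{-1}C'}]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}} / q}{\hat{\sigma}^2}\sim F\left(q, n-p\right), $

since $\hat{\sigma}^2 = \mathbf{y}'[\mathbf{I} - \mathbf{X(X'X)^{-1}X'}]\mathbf{y}/(n-p)$.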

  • Thanks for your fast reply! Could you please explain the "which leads to noting that" step to $F_{1}$ a little bit further? That's the one I'm not getting... – caracal Oct 18 '11 at 17:26
  • @caracal Sure. I edited my response to add in some details. –  Oct 18 '11 at 17:41
  • I'll accept this answer, but I'd still be very happy about an answer to my first question as well - literature tips are of course welcome! – caracal Oct 20 '11 at 12:57
  • Related: https://stats.stackexchange.com/q/188626/119261. – StubbornAtom Jun 25 '20 at 19:07

Since nobody has done so yet, I will address your first question. I also could not find a reference for [a proof of] this result anywhere, so if anyone knows a reference please let us know.

The most general test that this $F$-test can handle is $H_0 : C \beta = \psi$ for some $q \times p$ matrix $C$ and $q$-vector $\psi$. This allows you to test hypotheses like $H_0 : \beta_1 + \beta_2 = \beta_3 + 4$.

However, it seems you are focusing on testing hypotheses like $H_0 : \beta_2 = \beta_4 = \beta_5 = 0$, which is a special case with $\psi=0$ and $C$ being a matrix with one $1$ in each row, and all other entries being $0$. This allows you to more concretely view the smaller model as obtained by simply dropping some columns of your design matrix (i.e. going from $X_u$ to $X_r$), but in the end the result you are seeking is in terms of an abstract $C$ anyway.

Since it happens to be true that the formula $(C\hat{\beta})' (C (X'X)^{-1} C')^{-1} (C \hat{\beta})$ works for arbitrary $C$ and $\psi=0$, I will prove it in that level of generality. Then you can consider your situation as a special case, as described in the previous paragraph.

If $\psi \ne 0$, the formula needs to be modified to $(C\hat{\beta} - \psi)' (C (X'X)^{-1} C')^{-1} (C \hat{\beta} - \psi)$, which I also prove at the end of this post.


First I consider the case $\psi=0$. I will try to keep some of your notation. Let $V_u = \text{colspace}(X) = \{X\beta : \beta \in \mathbb{R}^p\}$. Let $V_r := \{X\beta : C\beta=0\}$. (This would be the column space of your $X_r$ in your special case.)

Let $P_u$ and $P_r$ be the projections on these two subspaces. As you noted, $P_u y$ and $P_r y$ are the predictions under the full model and the null model respectively. Moreover, you can show $\|(P_u - P_r) y\|^2$ is the difference in the sum of squares of residuals.

Let $V_l$ be the orthogonal complement of $V_r$ when viewed as a subspace of $V_u$. (In your special case, $V_l$ is spanned by the components of the removed columns of $X_u$ that are orthogonal to $V_r$.) Then $V_r \oplus V_l = V_u$; in particular, if $P_l$ is the projection onto $V_l$, then $P_u = P_r + P_l$.

Thus, the difference in the sum of squares of residuals is $$\|P_l y\|^2.$$ If $\tilde{X}$ is a matrix whose columns span $V_l$, then $P_l = \tilde{X} (\tilde{X}'\tilde{X})^{-1} \tilde{X}'$ and thus $$\|P_l y\|^2 = y'\tilde{X} (\tilde{X}'\tilde{X})^{-1} \tilde{X}' y.$$

In view of your attempt at the bottom of your post, all we have to do is show that choosing $\tilde{X} := X(X'X)^{-1} C'$ works, i.e., that $V_l$ is the span of this matrix's columns. Then that will conclude the proof.

  • It is clear that $\text{colspace}(\tilde{X}) \subseteq \text{colspace}(X)=V_u$.
  • Moreover, if $v \in V_r$ then it is of the form $v=X\beta$ with $C\beta=0$, and thus $v' \tilde{X} = \beta' X' X (X'X)^{-1} C' = (C \beta)' = 0$, which shows $\text{colspace}(\tilde{X})$ is in the orthogonal complement of $V_r$, i.e. $\text{colspace}(\tilde{X}) \subseteq V_l$.
  • Finally, suppose $X\beta \in V_l$. Then $(X\beta)'(X\beta_0)=0$ for any $\beta_0 \in \text{nullspace}(C)$. This implies $X'X\beta \in \text{nullspace}(C)^\perp = \text{colspace}(C')$, so $X'X\beta=C'v$ for some $v$. Then, $X(X'X)^{-1} C' v = X\beta$, which shows $V_l \subseteq \text{colspace}(\tilde{X})$.
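
Putting the pieces together, note that $$\tilde{X}'\tilde{X} = C(X'X)^{-1}X'X(X'X)^{-1}C' = C(X'X)^{-1}C',$$ so the difference in the sum of squares of residuals is $$\|P_l y\|^2 = y'X(X'X)^{-1}C'\,[C(X'X)^{-1}C']^{-1}\,C(X'X)^{-1}X'y = (C\hat{\beta})'[C(X'X)^{-1}C']^{-1}(C\hat{\beta}),$$ which is exactly the numerator of your second form of the $F$ statistic (for $\psi = 0$).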

The more general case $\psi \ne 0$ can be obtained by slight modifications to the above proof. The fit of the restricted model would just be the projection $\tilde{P}_r$ onto the affine space $\tilde{V}_r = \{X \beta : C \beta = \psi\}$, instead of the projection $P_r$ onto the subspace $V_r =\{X\beta : C \beta = 0\}$. The two are closely related, however: one can write $\tilde{V}_r = V_r + X \beta_1$, where $\beta_1$ is an arbitrarily chosen vector satisfying $C \beta_1 = \psi$, and thus $$\tilde{P}_r y = P_r(y - X\beta_1) + X \beta_1.$$

Then, using the fact that $P_u X \beta_1 = X \beta_1$, we have $$(P_u - \tilde{P}_r) y = P_u y - P_r(y - X \beta_1) - X \beta_1 = (P_u - P_r)(y - X\beta_1) = P_l(y - X \beta_1).$$ Recalling $P_l = \tilde{X} (\tilde{X}'\tilde{X})^{-1} \tilde{X}'$ with $\tilde{X} = X(X'X)^{-1} C'$, the difference in sum of squares of residuals can be shown to be $$(y - X \beta_1)' P_l (y - X \beta_1) = (C \hat{\beta} - \psi)'(C (X'X)^{-1} C')^{-1} (C\hat{\beta} - \psi).$$
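
If it helps, here is an analogous numerical sketch for the $\psi \ne 0$ case (Python/NumPy; the design, $C$, and $\psi$ are arbitrary illustrative choices, and $\beta_1$ is taken as the minimum-norm solution of $C\beta_1 = \psi$ via the pseudoinverse):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 40, 4, 2

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ rng.normal(size=p) + rng.normal(size=n)

# An arbitrary GLH  C beta = psi  with q = 2 rows
C = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
psi = np.array([0.5, -0.25])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
P_u = X @ XtX_inv @ X.T

# Projection onto V_r = {X b : C b = 0}, built from a basis N of nullspace(C)
_, _, Vt = np.linalg.svd(C)
N = Vt[q:].T                       # columns span nullspace(C)
XN = X @ N
P_r = XN @ np.linalg.inv(XN.T @ XN) @ XN.T

# Restricted (affine) fit, as above: beta_1 is any solution of C beta_1 = psi
beta_1 = np.linalg.pinv(C) @ psi
fit_r = P_r @ (y - X @ beta_1) + X @ beta_1

SSE_u = np.sum((y - P_u @ y) ** 2)
SSE_r = np.sum((y - fit_r) ** 2)

# Claimed identity:
# SSE_r - SSE_u = (C beta_hat - psi)' [C (X'X)^-1 C']^-1 (C beta_hat - psi)
d = C @ beta_hat - psi
quad = d @ np.linalg.inv(C @ XtX_inv @ C.T) @ d
print(np.isclose(SSE_r - SSE_u, quad))   # True
```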

lbelzile
angryavian