
I have dummy coded a categorical regression, and ran OLS to get parameter estimates, along the lines of:

$$ y= \left( \begin{array}{ccc} 1 & 0 & 0\\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1\\ 1 & 0 & 1\\ \vdots & \vdots & \vdots\\ \end{array} \right) \beta+\epsilon $$

which gives me $\beta_0$, $\beta_1$, and $\beta_2$. I want to do a joint hypothesis test of $\beta_1=\beta_2=0$, i.e. an $F$-test, in the style of an ANOVA.

I read somewhere the following formulae for joint hypothesis testing:

$$ t=\frac{1}{n}\beta^T\Sigma\beta $$ $$ p=F_{cdf}(\frac{1}{t},n,\mathrm{dfe}) $$

where $\Sigma$ is the covariance matrix of the parameter estimates, $n$ is the number of hypotheses (2 in this case?), and $\mathrm{dfe}$ is perhaps the number of rows in $y$ minus 2.

I am not good at algebra, and wondered:

  • Is this right?
  • If so, can I just examine $[\beta_1, \beta_2]$, ignoring $\beta_0$?
  • How can I obtain the covariance matrix of the parameter estimates, $\Sigma$? I have googled "parameter covariance" and found this Cross Validated answer, which looks very complex; I can't figure out how to do it with simple matrix operations. (My model actually has more columns than this.)
Sanjay Manohar

2 Answers


The $\beta_1=\beta_2=0$ hypothesis can be tested with an F-test using the nested models approach.

1) Create two models:

  • Model 1 (reduced): $y = \beta_0 + \epsilon$
  • Model 2 (full): $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

2) Fit both models to the data.

3) Compute the residual sums of squares, $rss_1$ and $rss_2$, for models 1 and 2.

4) Compute the F statistic:

$F = \frac{rss_1 - rss_2}{p_2 - p_1} \times \frac{n - p_2}{rss_2}$, where $p_1$ and $p_2$ are the numbers of parameters in models 1 and 2 ($p_1=1$ and $p_2=3$ in your case, counting the intercept) and $n$ is the number of data points used to estimate the parameters.

5) Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, $F$ has an F distribution with ($p_2−p_1$, $n−p_2$) degrees of freedom. Therefore the $\beta_1=\beta_2=0$ hypothesis can be rejected if $F$ is large enough (see the sketch after this list).
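As an illustration of steps 1–5, here is a minimal NumPy/SciPy sketch for the dummy-coded design in the question; the data, effect size, and group sizes are made up for the example:

```python
import numpy as np
from scipy import stats

# Made-up example data: three groups, dummy coded as in the question.
rng = np.random.default_rng(0)
group = np.repeat([0, 1, 2], 10)               # 30 observations
y = rng.normal(size=group.size) + 0.5 * group  # arbitrary group effect

n = y.size
X2 = np.column_stack([np.ones(n),              # intercept
                      group == 1,              # dummy for group 1
                      group == 2]).astype(float)
X1 = X2[:, :1]                                 # intercept-only model

def rss(X, y):
    """Fit OLS and return the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rss1, rss2 = rss(X1, y), rss(X2, y)
p1, p2 = X1.shape[1], X2.shape[1]              # 1 and 3 parameters

F = (rss1 - rss2) / (p2 - p1) * (n - p2) / rss2
p_value = stats.f.sf(F, p2 - p1, n - p2)       # upper-tail probability
print(F, p_value)
```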

For more details, look at this PDF.

The following Python code provides a sampling distribution of the F statistic in the case of a linear relationship fitted with a linear model and a quadratic polynomial: http://pastie.org/pastes/10684701. The resulting sampling distribution of F (blue) and the F distribution (red) are in good agreement, as shown in the figure.

[Figure: sampling distribution of F (blue) and the F distribution (red)]

Edit: new pastie link: http://pastiebin.com/embed/57cde0be63103
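In case the paste links rot again, here is a self-contained reconstruction of that experiment (my sketch, not the original script): simulate truly linear data, fit linear and quadratic polynomials, and collect the resulting F statistics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_sim = 50, 5000
x = np.linspace(0, 1, n)
X1 = np.column_stack([np.ones(n), x])         # linear model, p1 = 2
X2 = np.column_stack([np.ones(n), x, x**2])   # quadratic model, p2 = 3
p1, p2 = X1.shape[1], X2.shape[1]

def rss(X, y):
    """Fit OLS and return the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

F_samples = np.empty(n_sim)
for i in range(n_sim):
    y = 1.0 + 2.0 * x + rng.normal(size=n)    # truly linear data
    F_samples[i] = ((rss(X1, y) - rss(X2, y)) / (p2 - p1)
                    * (n - p2) / rss(X2, y))

# Under the null, F_samples should follow F(p2 - p1, n - p2);
# a Kolmogorov-Smirnov test gives a quick numerical check.
print(stats.kstest(F_samples, stats.f(p2 - p1, n - p2).cdf))
```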

Adrien Renaud

In the special case where you have an orthogonal design, $(X'X) = I$, you can square the t-statistics from the standard regression output, sum them, and divide by the number of restrictions to get an F statistic. Otherwise, this is difficult enough that you won't want to do it by hand.
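To spell out why (a short derivation added for clarity): with $X'X = I$, the estimated covariance of $\hat\beta$ is $s^2 I$, so each $t_j = \hat\beta_j / s$, and the Wald statistic for the $q$ tested coefficients $S$ reduces to

$$ F \;=\; \frac{\hat\beta_S^{T}\,(s^2 I)^{-1}\,\hat\beta_S}{q} \;=\; \frac{1}{q}\sum_{j\in S} t_j^{2}. $$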

If you are using R, you can use vcov to get the variance-covariance matrix for a model and calculate F manually; but then you could just use anova to test nested models, as Adrien Renaud suggested, or linearHypothesis in the car package to construct the F-test in the form you found it.
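If you would rather stay in Python, here is a minimal NumPy sketch of that manual calculation (the helper name wald_f_test and the simulated data are illustrative, not from any library): it forms $\Sigma = s^2 (X'X)^{-1}$, which also answers the question about obtaining the parameter covariance matrix, and then computes the F statistic for the restriction $R\beta = 0$.

```python
import numpy as np
from scipy import stats

def wald_f_test(X, y, R):
    """F-test of the joint hypothesis R @ beta = 0 for an OLS fit.

    Sigma = s^2 * (X'X)^{-1} is the covariance matrix of the
    parameter estimates asked about in the question.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                   # OLS estimates
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)               # residual variance
    Sigma = s2 * XtX_inv                       # parameter covariance
    q = R.shape[0]                             # number of restrictions
    Rb = R @ beta
    F = Rb @ np.linalg.solve(R @ Sigma @ R.T, Rb) / q
    return F, stats.f.sf(F, q, n - p)          # F and its p-value

# Testing beta_1 = beta_2 = 0 picks out the two dummy columns and
# ignores the intercept:
rng = np.random.default_rng(1)
group = np.repeat([0, 1, 2], 10)
X = np.column_stack([np.ones(30), group == 1, group == 2]).astype(float)
y = rng.normal(size=30) + 0.5 * group
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(wald_f_test(X, y, R))
```

For OLS with linear restrictions, this F statistic is numerically identical to the nested-models F from the other answer.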

Neal Fultz
  • 528
  • 3
  • 6