How to get the degrees of freedom when doing multiple linear regression?

Question

In the book An Introduction to Statistical Learning with Applications in R, the test of the null hypothesis that there is no relationship between the response and the predictors uses the statistic $$F = \frac{(TSS - RSS)/p}{RSS/(n-p-1)}$$ where $$RSS = \sum_{i=1}^{n} (y_i-\hat{y}_i)^2$$ $$\hat{y}=\hat{\beta}_0 + \hat{\beta}_1x_1 + \dotsb + \hat{\beta}_px_p$$ $$TSS = \sum_{i=1}^{n} (y_i-\bar{y})^2$$

I googled the F test and learned that the degrees of freedom of an F statistic are the number of values in the final calculation of the statistic that are free to vary (https://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics), https://www.khanacademy.org/math/probability/statistics-inferential/anova/v/anova-3-hypothesis-test-with-f-statistic).

But I cannot see why the degrees of freedom in this formula are $p$ and $n-p-1$.

Gumeo · Accepted Answer · 2015-09-24T12:55:58.990

This is an interesting question.

There are $p+1$ parameters in your model (the intercept plus $p$ slopes), and you are testing whether your model performs better than a model consisting of just the mean. Calculating the mean amounts to estimating just one parameter.

You can use the F-statistic to compare nested models. Two models are nested if one model contains all the terms of the other and the larger contains at least one additional term. So the mean model is nested in the large model. The mean model is essentially the model $$ \hat{y} = \hat{\beta}_0 $$

Let's look at the numerator. Generally this is: $$ \frac{(\text{smaller nested model with }p_1\text{ parameters})-(\text{large model with }p_2\text{ parameters})}{(n-p_1)-(n-p_2)} $$ In your case $p_2$ is $p+1$ and $p_1$ is simply 1, so the divisor is $(n-1)-(n-p-1)=p$. The parentheses with the text in this numerator represent the corresponding residual sums of squares; for the mean model that is exactly $TSS$, so the numerator becomes $(TSS-RSS)/p$.

The same goes for the denominator, which is $$ \frac{(\text{large model with }p_2\text{ parameters})}{(n-p_2)} $$

so $(n-p_2)= (n-(p+1))=(n-p-1)$. So your statistic is essentially a ratio of how much better your model fits than the basic model to how much error it still leaves, with each quantity scaled by its degrees of freedom.
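If it helps to see the bookkeeping spelled out, here is a minimal Python sketch of the nested-model comparison. The function name `nested_f` and the numbers for `n`, `p`, `tss` and `rss` are made up purely for illustration; they do not come from the book or from any real data set.

```python
def nested_f(rss_small, rss_large, n, p_small, p_large):
    """F statistic for comparing two nested least-squares models.

    p_small and p_large count all fitted parameters, intercept included.
    Returns (F, numerator df, denominator df).
    """
    df_small = n - p_small          # residual df of the smaller model
    df_large = n - p_large          # residual df of the larger model
    df_num = df_small - df_large    # same as p_large - p_small
    f_value = ((rss_small - rss_large) / df_num) / (rss_large / df_large)
    return f_value, df_num, df_large


# Hypothetical values: n = 50 observations, p = 3 predictors.
n, p = 50, 3
tss, rss = 200.0, 92.0   # TSS plays the role of the mean-only model's RSS

# Mean-only model has 1 parameter, the full model has p + 1.
f_value, df_num, df_den = nested_f(tss, rss, n, p_small=1, p_large=p + 1)
print(f_value, df_num, df_den)   # 18.0 3 46
```

With `p_small=1` and `p_large=p + 1`, the two degrees of freedom come out as $p$ and $n-p-1$, matching the formula in the question.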

These slides might also help you to understand this further.
