
I understand degrees of freedom as the number of quantities that can vary independently. Typically, if you have $n$ terms, you find the degrees of freedom by subtracting out the number of constraints among them (the classic example being the $n-1$ degrees of freedom of the sample variance). My problem is that I'm not quite seeing how the SSR degrees of freedom (simple linear regression) come out to just "1" under this way of thinking. After a little algebra, I can transform it as:

$SSR=\sum_{i=1}^{n}\left ( \hat{y}_i-\bar{y} \right )^2=\sum_{i=1}^{n}\left[\hat{\beta}_0+\hat{\beta}_1x_i- \left(\hat{\beta}_0+\hat{\beta}_1\bar{x} \right) \right]^2=\hat{\beta}^2_1\sum_{i=1}^{n}\left(x_i-\bar{x} \right )^2$

I do see that $\hat{\beta}_1$ is the only estimator remaining in the final expression (hence one degree of freedom?). However, I'd like to be able to understand this in the same way I understand the $n-1$ degrees of freedom of the sample variance, namely "adding up $n$ terms, then subtracting out the number of terms that aren't independent" to somehow arrive at 1.
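For concreteness, the algebraic identity above can be checked numerically. Here is a quick sketch using numpy (the data and variable names are made up for illustration):

```python
import numpy as np

# Toy data: a noisy line (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Fit simple linear regression y = b0 + b1 * x by least squares.
# np.polyfit returns coefficients highest-degree first: (slope, intercept).
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# SSR two ways: from the definition, and from the rearranged formula.
ssr_direct = np.sum((y_hat - y.mean()) ** 2)
ssr_slope = b1 ** 2 * np.sum((x - x.mean()) ** 2)

print(np.isclose(ssr_direct, ssr_slope))  # the two expressions agree
```

(This works because the OLS fitted values satisfy $\bar{\hat{y}} = \bar{y}$, so subtracting $\bar{y}$ is the same as subtracting $\hat{\beta}_0 + \hat{\beta}_1\bar{x}$.)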

Eric
  • Where do you find an assertion that SSR has just one df? What *exactly* does it say? – whuber Sep 19 '14 at 14:39
  • This doesn't answer your question, but as far as I know the division by $n-1$ rather than by $n$ in the calculation of the sample variance is to make the estimator unbiased; it doesn't have to do with degrees of freedom. – nrussell Sep 19 '14 at 14:40
  • @nrussell You're correct in that it makes the estimator unbiased. However, the DF are n-1 because we're estimating the true mean using the sample mean, hence one degree of freedom is lost. – Eric Sep 19 '14 at 14:44
  • @whuber It's from the ANOVA relationship SST = SSR + SSE: the sum of squares due to the regression and the sum of squares of the error add up to the total sum of squares (which I can verify by algebra). In addition, the DF of the SSR + the DF of the SSE equal the total DF; the DF of the SST are n-1, the DF of the SSE are n-2. – Eric Sep 19 '14 at 14:46
  • Thanks, Eric. I had taken "SSR" in the sense in which you mean "SSE" and did not read the formula carefully. The terms being squared and summed in your SSR formula are *not* the residuals: they are the differences between the predicted values (which are based on two estimated parameters) and the grand mean of the response (based on one estimated parameter). This mischaracterization might be at the root of your question. – whuber Sep 19 '14 at 14:58
  • @whuber Actually, my apologies! I mistyped in my comment to you and should have said that the SSR were in fact the sum of squares due to the regression. – Eric Sep 19 '14 at 15:03
  • A somewhat abstract explanation is offered at http://stats.stackexchange.com/a/16931. – whuber Sep 19 '14 at 15:21
  • @whuber I appreciate the link and the further ideas it presents. But I'm still *hoping* (maybe it's just not possible?) to conceptualize it analogously to how I think about the degrees of freedom for the variance formula (i.e. in that case, we have n terms, and 1 of them isn't independent since $\bar{x}$ is estimating $\mu$, and as a result we have n-1 DF). – Eric Sep 19 '14 at 15:53
  • I think one important message behind these algebraic explanations is that such a conceptualization eventually needs to be broadened. But in this case, it might be possible to think of the $\hat{y}_i$--*which all fall along a common line*--as reflecting just *two* independent pieces of information derived from the data: the intercept and slope. Thus, they enjoy just two "degrees of freedom." Their mean coincides with the grand mean $\bar{y}$, so subtracting that "takes away one degree of freedom." This is the point @rvl has already made in a previous comment. – whuber Sep 19 '14 at 15:58
  • @whuber I'm going to give rvl credit for the answer since he was the first to actually post it. However, your fleshing out of his answer above has been very helpful. – Eric Sep 21 '14 at 13:54

1 Answer


I just think of it in terms of the fitted line having 2 parameters, the intercept and the slope. But the intercept doesn't count after we adjust for the mean (which we do in obtaining SStotal), so that leaves just one df for regression.
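This accounting can be verified numerically. A quick sketch with numpy (toy data, names are mine) showing that the sums of squares and their degrees of freedom both decompose:

```python
import numpy as np

# Toy data: a noisy line (illustrative only)
rng = np.random.default_rng(1)
n = 15
x = np.linspace(0, 5, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Least-squares fit; polyfit returns (slope, intercept)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# ANOVA decomposition for simple linear regression
sst = np.sum((y - y.mean()) ** 2)      # total, df = n - 1
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression, df = 1
sse = np.sum((y - y_hat) ** 2)         # error, df = n - 2

print(np.isclose(sst, ssr + sse))      # sums of squares decompose
print((n - 1) == 1 + (n - 2))          # degrees of freedom decompose
```

The mean-adjusted total has $n-1$ df; the slope accounts for 1, leaving $n-2$ for error.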

Russ Lenth