I went through the thread "How to understand degrees of freedom?" and the great answers in it, but then I read the following in the Wikipedia article on regression:

Statistical assumptions

When the number of measurements, $N$, is larger than the number of unknown parameters, $k$, and the measurement errors $\varepsilon_i$ are normally distributed then the excess of information contained in $(N - k)$ measurements is used to make statistical predictions about the unknown parameters. This excess of information is referred to as the degrees of freedom of the regression.

Given this definition, if $N$ increases, the degrees of freedom increase as well, but intuitively that would make the problem more constrained (we have more information per parameter). Why, then, is $N-k$ called the degrees of freedom, rather than the other way around, e.g. $k-N$?

Amelio Vazquez-Reina

1 Answer

You may be conflating degrees of freedom attributed to different things.

We would not use negative numbers to count, but there are two sides to the ledger.

In common situations, the data degrees of freedom will be $N$, say.

The model degrees of freedom -- the degrees of freedom the model has to fit the data -- is $k$, and the residual degrees of freedom is what's left over: $N-k$. That $k$ may often be partitioned into various components of the model.

Any of them might be called "the" degrees of freedom depending on what, exactly, is being discussed.
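
As a rough illustration of the two sides of the ledger, here is a minimal NumPy sketch for an ordinary least-squares fit. The data are made up, and the names (`X`, `H`, `model_df`, `residual_df`) are assumptions for this example only. It relies on the standard fact that the trace of the projection ("hat") matrix equals the number of fitted coefficients when the design matrix has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)

N, k = 20, 3  # N observations, k fitted coefficients (intercept + 2 slopes)

# Made-up design matrix: a column of ones plus two random predictors.
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

# Hat matrix: projects y onto the column space of X, giving the fitted values.
H = X @ np.linalg.pinv(X)

model_df = np.trace(H)                 # dimensions the model uses to fit the data
residual_df = np.trace(np.eye(N) - H)  # dimensions left over for the residuals

residuals = y - H @ y
# The usual error-variance estimate divides by the residual degrees of freedom.
sigma2_hat = residuals @ residuals / (N - k)

print(model_df, residual_df, sigma2_hat)
```

Up to floating-point rounding, the two traces come out as $k$ and $N-k$, and they always sum to $N$. Adding observations grows the residual side of the ledger without changing the model side.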

Indeed, we use 'degrees of freedom' more broadly still, whence the appearance of noninteger degrees of freedom for some kinds of models, and references to things like "researcher degrees of freedom".
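
One way noninteger degrees of freedom can arise is sketched below, using ridge regression as the example. Ridge is an assumption chosen for illustration and is not named in the answer above. The function `effective_df` and the penalty values are hypothetical. The sketch uses the common convention of taking the trace of the matrix that maps the observations to the fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)

N, k = 30, 5
X = rng.normal(size=(N, k))  # made-up predictors, no intercept for simplicity


def effective_df(X, lam):
    """Trace of the ridge smoother matrix X (X'X + lam*I)^{-1} X'."""
    n_coef = X.shape[1]
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(n_coef), X.T)
    return np.trace(S)


# lam = 0 is ordinary least squares; larger penalties shrink the fit.
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, effective_df(X, lam))
```

With no penalty the trace equals $k$, as in ordinary least squares. As the penalty grows, the trace falls continuously between $k$ and 0, so it is generally not an integer.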

Glen_b
  • I really hope you could write a full-fledged answer on what DF is, because every time people ask about it, the mods just redirect to whuber's answer in "How to understand degrees of freedom?". It's surely an insightful answer, but it is not very intuitive for readers trying to understand what DF is, IMHO. – stucash Jul 25 '20 at 05:52
  • I'd be very happy for you to post an answer. If you write a very good one (and point me toward it, wherever it gets posted), I'll be seeking to award it a bounty. – Glen_b Jul 25 '20 at 06:13
  • Thanks for getting back to me. I was actually thinking out loud :) I understand your points above: a good answer can hardly avoid being a combination of all of the above. After flipping through many pages tagged `Degree of Freedom` on StatEx, I guess I was a bit lost. I'll take back the phrase `full-fledged`; I am sure I can't provide an answer good enough to meet your expectations without being prone to errors (if not a lot of them). You are right: after I failed to grasp what whuber explained in his answer, I started doing my own research. I guess I have come to rely too much on StatEx and got lazy :P – stucash Jul 25 '20 at 06:29
  • It's actually surprisingly difficult to write a simple, brief and clear explanation of degrees of freedom in statistics in the general sense, without being wrong somewhere. The source notion (in relation to the chi-squared distribution) is essentially geometric in nature - and relates to the dimension of a subspace - but the situation becomes much more complicated once you start doing some of the various things that crop up in statistics. You may be interested to discover that even in the relatively simple case of parameter estimation from already grouped data, Pearson had df wrong ... – Glen_b Jul 25 '20 at 06:34
  • ... and it took some years before the correct situation was widely understood for that basic case. Merely add the slight wrinkle that parameter estimates are based on ungrouped data and the situation becomes complicated enough that it's no longer clear that it even makes sense to talk about degrees of freedom except as approximations. – Glen_b Jul 25 '20 at 06:36
  • I'm not familiar with StatEx (is it a book, a website, or something else?) -- can you provide a link to the thing you were looking at in relation to degrees of freedom? (I tried a couple of search engines but only got links to information about *morphine* or *exchange rates*) – Glen_b Jul 25 '20 at 06:39
  • I am actually a few hours into writing some R utility programmes for volatility analysis... I meant Cross Validated (which is apparently a sub-site of Stack Exchange, where I mentally replaced `Stack` with `Stat`). I am terribly sorry. – stucash Jul 25 '20 at 06:43
  • I think the conversation with you here helped me gain some contextual information w.r.t. `DF`, as in why it is not an easy task even to search for a good explanation. `DF` almost always needs the correct context to be, I'd say, meaningful. Almost every page on Google I clicked on for `DF` has its own nuances in some way. – stucash Jul 25 '20 at 06:47