
There is a good explanation of degrees of freedom elsewhere, but which heuristic applies to the following problem is not so clear from that explanation. There is also a different question elsewhere.

Is there any general method of quantifying a fractional degree of freedom when one or more parameters are constrained? An example problem, for which a solution could also be provided as an answer, is given below for clarity.

Suppose we have a sample from a normal distribution for which the standard deviation is constrained,

$$\mathcal{N}\big(\bar{x},[c-\Delta, c+\Delta]\big)\;\;,$$

where $\bar{x}$ is the sample mean, $\Delta$ is a positive real number, and $c$ is a constant. Such a situation might arise during regression where, for some reason, it is desirable to prevent estimates of the standard deviation that appear to be outliers. It is clear that as $\Delta \to\infty$ the estimator becomes unconstrained, i.e., the fitted standard deviation tends to the sample standard deviation $s$, and thus in the limit $N\to\infty$ the fitted distribution becomes the population normal distribution $\mathcal{N}(\mu,\sigma)$. Therefore, for $\Delta \to\infty$ we have 2 degrees of freedom. On the other hand, for $\Delta\to0$ we have $\mathcal{N}\big(\bar{x},c\big)$, and when $c\neq s$ the regression simply does not fit the data as well as when $c=s$. In that case there is only 1 degree of freedom.

When $\Delta$ is some finite positive number and $N$ is an integer greater than 1, the degrees of freedom lie between 1 and 2, i.e., they are not a whole number. What is that number? Moreover, what I would really like is an answer applicable to any density function with any constraints on its parameters, if possible. For example, how many degrees of freedom would one use for adjusted $R^2$, for AIC, and so forth?
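To make the setup concrete, here is a minimal sketch of the constrained maximum-likelihood fit and its two limiting cases. It is only an illustration: the function name, the simulated data, and the choice of $c$ and $\Delta$ are made up for the example.

```python
import numpy as np

def constrained_normal_fit(x, c, delta):
    """MLE of N(mean, sigma) with sigma restricted to [c - delta, c + delta].

    The mean is estimated by the sample mean. Because the profile
    log-likelihood in sigma is unimodal with its peak at the sample
    standard deviation s, the constrained MLE of sigma is simply s
    clipped to the interval.
    """
    xbar = x.mean()
    s = x.std(ddof=0)                          # unconstrained MLE of sigma
    lo = max(c - delta, np.finfo(float).tiny)  # keep sigma strictly positive
    hi = c + delta
    sigma_hat = min(max(s, lo), hi)
    return xbar, sigma_hat

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)    # illustrative data only

# delta -> 0: sigma is pinned at c, effectively 1 free parameter
print(constrained_normal_fit(x, c=1.0, delta=1e-9))

# delta large: the constraint is inactive, sigma -> s, effectively 2 free parameters
print(constrained_normal_fit(x, c=1.0, delta=100.0))
```

For an intermediate $\Delta$ the constraint is active for some samples and inactive for others, which is exactly why the effective number of parameters seems to sit somewhere between 1 and 2.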

Carl
  • (+1) I suspect most inferences that make use of "degrees of freedom" will rely on asymptotic approximations that require parameter estimates not to lie on a boundary of the parameter space. – Scortchi - Reinstate Monica Jan 23 '22 at 11:48
  • @Scortchi-ReinstateMonica In the example above, as $\Delta$ becomes smaller and $N$ is small (making $s$ large), it may be that all the values are at the boundaries. What would you think about removing the constraint, identifying the density function for $s$ from multiple trials, e.g., bootstrap, finding a probability from the interval of the density function, and using that as a fractional $df$? (A sketch of this idea appears after these comments.) – Carl Jan 23 '22 at 13:22
  • Rather than having a model containing "overrides" it is probably better to formulate a more general and robust model. This would also allow you to avoid the temptation of looking for outliers and doing something non-reproducible with them. Consider for example a $t$ distribution for the raw data, with unknown degrees of freedom. This is perhaps better carried out using a Bayesian method, as exemplified in the discussion of the Bayesian $t$-test [here](https://hbiostat.org/bbr). – Frank Harrell Jan 23 '22 at 13:39
  • @FrankHarrell Not applicable to the question; I simplified so as to make the question as clear as possible. FYI, the actual application is fitting censored data with the [gamma-Pareto (type I) distribution](https://stats.stackexchange.com/a/445372/99274), and the reason for doing so is that the $\beta$ parameter occurs in a region much earlier than where the data occur. – Carl Jan 23 '22 at 13:51
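Below is a rough sketch of the bootstrap idea raised in the comment above, under one possible reading of it: take the bootstrap probability that the unconstrained estimate $s$ falls inside $[c-\Delta, c+\Delta]$ as the fractional contribution of the constrained standard-deviation parameter, with the mean always counting as 1. This is a heuristic interpretation for illustration, not an established procedure; the function name and numbers are made up.

```python
import numpy as np

def bootstrap_fractional_df(x, c, delta, n_boot=2000, seed=1):
    """Heuristic sketch: bootstrap the unconstrained estimate s and use
    the probability that s lies in [c - delta, c + delta] as the
    fractional degree of freedom contributed by the constrained sigma.
    The mean always counts as 1 df.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    s_boot = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(x, size=n, replace=True)
        s_boot[b] = resample.std(ddof=0)
    p_inside = np.mean((s_boot >= c - delta) & (s_boot <= c + delta))
    return 1.0 + p_inside

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)
print(bootstrap_fractional_df(x, c=2.0, delta=0.5))  # somewhere between 1 and 2
```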

0 Answers