Does parametrized Box Cox transform take degrees of freedom away from subsequent models?

Question

The Box-Cox transform has two parameters: a shift $\alpha$ and a power $\lambda$. Implementations such as scipy.stats.boxcox can either be given $\lambda$ or find an optimal $\lambda$ by minimizing the negative log-likelihood of the transformed variables under a normal distribution.
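To make the two modes concrete, here is a minimal sketch using synthetic lognormal data (the data, seed, and variable names are my own assumptions for illustration). As far as I know, `scipy.stats.boxcox` exposes only the power $\lambda$, so any shift $\alpha$ would have to be applied to the data beforehand.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=500)

# Mode 1: lambda supplied by the user. Only the transformed array is returned.
# With lmbda=0 the Box-Cox transform reduces to the natural log.
x_log = stats.boxcox(x, lmbda=0.0)

# Mode 2: lambda omitted. SciPy estimates it by maximizing the normal
# log-likelihood and returns (transformed data, estimated lambda).
x_bc, lam_hat = stats.boxcox(x)

# The Box-Cox log-likelihood can be evaluated at any lambda for comparison.
llf_at_hat = stats.boxcox_llf(lam_hat, x)
llf_at_zero = stats.boxcox_llf(0.0, x)

print(f"estimated lambda: {lam_hat:.3f}")
print(f"log-likelihood at lambda_hat: {llf_at_hat:.2f}, at lambda=0: {llf_at_zero:.2f}")
```

In the second mode $\lambda$ is a quantity estimated from the data, which is what prompts the question below.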

Let's say I have two variables $X$ and $Y$. I would like to fit a regression function $f$ between their Box-Cox transforms $T(X; \lambda_1)$ and $T(Y; \lambda_2)$, having optimized $\lambda_1$ and $\lambda_2$ so that the transformed variables are as close to normal as possible.

Whether I train the Box-Cox parameters simultaneously with $f$ or perform the Box-Cox optimization and then perform the regression of $f$, have I influenced the degrees of freedom of my model?

Related: https://stats.stackexchange.com/questions/40779/box-cox-transforms-for-regression — DifferentialPleiometry, Dec 24 '21 at 06:55
It depends on what you mean by "degrees of freedom" and how you intend to use this quantity in follow-on calculations. Could you explain? — whuber, Dec 24 '21 at 16:50
@whuber I was hoping that by taking a [descriptivist approach](https://en.wikipedia.org/wiki/Linguistic_description) rather than a [prescriptivist approach](https://en.wikipedia.org/wiki/Linguistic_prescription) I might learn something new about degrees of freedom through this question. I have a building worry that DF are just *ad hoc* scores. We already have a lot of "what are degrees of freedom?" questions on stats.SE, so I was aiming for something more subtle. Your questions exactly reflect my concern. I guess I ended up writing an obscure question instead of a subtle one. — DifferentialPleiometry, Feb 03 '22 at 17:53
The short answer, based on maximum likelihood theory, is that each Box-Cox parameter eats one D.F. For an explicit discussion of this in the context of logistic regression, a classic paper is Royston & Altman, *Regression Using Fractional Polynomials of Continuous Covariates...* Appl. Stat. (1994) **43**, No. 3, pp. 429-467. Find it in PDF form at https://rss.onlinelibrary.wiley.com/doi/pdf/10.2307/2986270 . — whuber, Feb 03 '22 at 18:11

score -1 · Answer 1 · edited Dec 24 '21 at 11:46

I'm inclined to say "yes", although it is not clear by how much the degrees of freedom have been affected. I imagine those familiar with the concept of "phantom degrees of freedom" would categorize this as such.

The reason is that the data are used to estimate a parameter (the Box-Cox parameter), and we do not account for that use directly. Were we to obtain new data from the same data-generating mechanism, it is plausible that (because of sampling variability) we would get a different parameter value. That uncertainty is not accounted for in most modelling frameworks.
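One way to see this sampling variability is a rough bootstrap sketch: re-estimate $\lambda$ on resampled data and look at the spread. The gamma-distributed data, seed, and number of resamples below are assumptions chosen purely for illustration, and the resulting spread is only an informal indication of the uncertainty, not a formal degrees-of-freedom correction.

```python
import numpy as np
from scipy import stats

# Synthetic, strictly positive data (an assumption for illustration only).
rng = np.random.default_rng(1)
y = rng.gamma(shape=2.0, scale=1.0, size=200)

# Re-estimate the Box-Cox lambda on bootstrap resamples of the same data.
n_boot = 200
lams = np.empty(n_boot)
for b in range(n_boot):
    resample = rng.choice(y, size=y.size, replace=True)
    # With lmbda omitted, stats.boxcox returns (transformed, lambda_hat).
    _, lams[b] = stats.boxcox(resample)

lam_sd = lams.std(ddof=1)
lo, hi = np.percentile(lams, [2.5, 97.5])
print(f"bootstrap SD of lambda_hat: {lam_sd:.3f}")
print(f"95% percentile interval: ({lo:.3f}, {hi:.3f})")
```

If the downstream regression treats the fitted $\lambda$ as a known constant, this spread is exactly the variability that goes unaccounted for.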
