
In a model with some offset variables (variables with pre-determined regression coefficients), do these variables consume degrees of freedom, with regard to, e.g., information criteria?

Pro: You can improve your model's fit by adding N offsets. It would be weird if there were no penalty for that.

Con: I'm thinking of regressing GDP or GDP per capita against, say, wages. (GDP per capita would be the offset model.) A priori, either model could fit better. So would you just pick the model with the higher R^2, with no penalty for degrees of freedom?
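
To make this concrete, here is a rough sketch of the two candidates, assuming a hypothetical data frame d with columns gdp, wages and pop, and working on the log scale so that the per-capita version really is an offset model:

# Candidate A: log GDP against wages
lm(log(gdp) ~ wages, data = d)

# Candidate B: log GDP per capita against wages, written with log(pop)
# as an offset (its coefficient is fixed at 1, not estimated)
lm(log(gdp) ~ wages, data = d, offset = log(pop))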

1 Answer


It doesn't. Very loosely speaking, degrees of freedom are about the moving parts. (If you want a more rigorous explanation, check the How to understand degrees of freedom? thread.) An offset enters the model with its coefficient fixed at a constant value of 1, so nothing about it is estimated from the data. Yes, adding an offset can change the estimates of the other parameters of the model, and the predictions, but the offset by itself does not change if you change something about the model.

You can easily verify this yourself by running a model with and without an offset in your favorite statistical software:

> summary(lm(mpg~disp, data=mtcars))

Call:
lm(formula = mpg ~ disp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.8922 -2.2022 -0.9631  1.6272  7.2305 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
disp        -0.041215   0.004712  -8.747 9.38e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.251 on 30 degrees of freedom
Multiple R-squared:  0.7183,    Adjusted R-squared:  0.709 
F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10

> summary(lm(mpg~disp, data=mtcars, offset=wt))

Call:
lm(formula = mpg ~ disp, data = mtcars, offset = wt)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5575 -2.3261 -0.9849  1.8896  7.4938 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 28.000040   1.320115  21.210  < 2e-16 ***
disp        -0.048225   0.005058  -9.534 1.37e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.49 on 30 degrees of freedom
Multiple R-squared:  0.6904,    Adjusted R-squared:  0.6801 
F-statistic: 66.91 on 1 and 30 DF,  p-value: 3.942e-09
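
If the concern is specifically about information criteria, you can also check the parameter counts that R uses. The logLik() and AIC() functions report the number of estimated parameters as a df value, and it is the same with and without the offset (same mtcars example as above):

m0 <- lm(mpg ~ disp, data = mtcars)
m1 <- lm(mpg ~ disp, data = mtcars, offset = wt)

# 'df' counts the estimated parameters: intercept, slope, and the
# residual variance, i.e. 3 in both models, offset or not.
logLik(m0)
logLik(m1)

# AIC therefore applies the same penalty to both models; only the
# log-likelihood (the fit itself) differs between them.
AIC(m0, m1)
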
Tim
  • Thanks! Then, is it cheating if you try different offsets or combinations of offsets to improve your model? – Dr_Informatics Jul 15 '21 at 10:03
  • @Dr_Informatics To the same degree as changing or tuning anything else about your model. If you do too much of it, you can end up cherry-picking, or p-hacking, and the model will be useless for inference. If you care about prediction only, it's the same as tuning any other parameter: you can end up overfitting to the training or validation data, depending on how you assess the performance of the model after the changes. – Tim Jul 15 '21 at 12:18
  • That's a very nice way to put it. P-value fishing came to my mind! Basically, if you are a total purist, you should count the number of models you fit and then apply a multiple-testing correction. Never heard of anyone doing so, though. – Dr_Informatics Jul 16 '21 at 08:14