
I read that linear regression has n-k degrees of freedom.

Does that mean, for example, that when fitting y = ax + b to a set of points, if I want some specific a and b, then 2 of the points are not free to take whatever value they like? In other words, if I have a set of points and I want to move the regression line wherever I want, do I have to add two points and not just one?

2 Answers


A common rule of thumb is that you need ten points for every coefficient.

However, this is not always good advice. A linear regression will not give stable coefficients if any of the components of x are strongly linearly correlated with each other (multicollinearity), so the coefficients should be interpreted with caution.

A simple way to check the stability of your results is to select a few random subsamples of your data, and see how much the reported coefficients vary.
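The subsample check described above could be sketched roughly as follows. This is only an illustration on synthetic data: the helper `subsample_coefficients`, the 50% subsample size, and the simulated predictors are assumptions made for the example, not part of any particular library.

```python
import numpy as np

def subsample_coefficients(X, y, n_subsamples=20, frac=0.5, seed=0):
    """Fit ordinary least squares on random subsamples; return one row of coefficients per fit."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])            # prepend an intercept column
    size = max(int(frac * n), A.shape[1] + 1)       # keep more rows than coefficients
    coefs = []
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=size, replace=False)
        beta, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        coefs.append(beta)
    return np.array(coefs)

# Synthetic data (assumed for illustration): one design with independent
# predictors, one where x2 is almost a copy of x1.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)
x2_corr = x1 + 0.01 * rng.normal(size=n)
noise = rng.normal(size=n)

y_indep = 1.0 + 2.0 * x1 + 3.0 * x2_indep + noise
y_corr = 1.0 + 2.0 * x1 + 3.0 * x2_corr + noise

coefs_indep = subsample_coefficients(np.column_stack([x1, x2_indep]), y_indep)
coefs_corr = subsample_coefficients(np.column_stack([x1, x2_corr]), y_corr)

# Standard deviation of each coefficient across the subsample fits.
spread_indep = coefs_indep.std(axis=0)
spread_corr = coefs_corr.std(axis=0)
print("spread (independent predictors):", spread_indep)
print("spread (correlated predictors): ", spread_corr)
```

If the spread across subsamples is large relative to the size of a coefficient, as it should be for the nearly collinear design here, that coefficient is not well determined by the data.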

chrishmorris

Welcome! This question has been asked in a slightly different way here.

In the regression context it means that if you have two points ($n = 2$), you can estimate at most two parameters ($k = 2$): the intercept $b$ and the slope $a$. Estimating both parameters leaves you with exactly $n - k = 2 - 2 = 0$ degrees of freedom. This, in turn, means you cannot calculate standard errors or p-values (or, for example, hold out data in a training-test split) to quantify the uncertainty of your simple linear regression model. The issue is that the (unbiased) standard error of the regression is calculated as $s = \sqrt{\frac{\hat\varepsilon ^\mathrm{T} \hat\varepsilon}{n-k}}$; with $n = k$ the line passes through both points exactly, so the residuals $\hat\varepsilon$ are all zero and $s$ becomes the undefined ratio $0/0$. Nor can you perform hyperparameter tuning, etc.
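To make the formula concrete, here is a minimal sketch that fits $y = ax + b$ by least squares and evaluates $s$ for two and for three points. The helper name `residual_standard_error` and the example points are assumptions chosen for illustration.

```python
import numpy as np

def residual_standard_error(x, y):
    """Fit y = a*x + b by least squares; return (coefficients, n - k, s)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x, np.ones_like(x)])   # columns: slope a, intercept b
    n, k = A.shape                              # k = 2 parameters
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - k
    # s = sqrt(resid' resid / (n - k)) is only defined when n - k > 0.
    s = np.sqrt(resid @ resid / dof) if dof > 0 else float("nan")
    return beta, dof, s

# Two points: the line fits exactly and no degrees of freedom remain.
beta2, dof2, s2 = residual_standard_error([0.0, 1.0], [1.0, 3.0])
print(beta2, dof2, s2)

# Three points: one degree of freedom remains, so s can be computed.
beta3, dof3, s3 = residual_standard_error([0.0, 1.0, 2.0], [1.0, 3.0, 4.0])
print(beta3, dof3, s3)
```

With two points the function returns `nan` for $s$ because the denominator $n - k$ is zero; adding a third point gives $n - k = 1$ and a finite estimate.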

Arne Jonas Warnke