
I am wondering why, in linear regression, when you have a categorical variable such as month of the year and represent it with dummy variables ($X_1 = 1$ if it is January, $X_2 = 1$ if it is February, $\ldots$, $X_{12} = 1$ if it is December), dropping one of these variables to avoid perfect collinearity doesn't make you lose any information.

How does the linear regression model "know" that you are dropping one variable?
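
For concreteness, here is a small numpy sketch with made-up monthly data: with an intercept plus all 12 dummies the design matrix is rank-deficient, while dropping one dummy leaves the column space, and therefore the fitted values, unchanged; the omitted month's mean simply moves into the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: 10 observations per month, each month with its own mean.
months = np.repeat(np.arange(12), 10)          # 0 = January, ..., 11 = December
y = rng.normal(20, 5, size=12)[months] + rng.normal(0, 1, size=months.size)

# All 12 dummies plus an intercept: the dummy columns sum to the intercept
# column, so the design matrix is rank-deficient (perfect collinearity).
D = np.eye(12)[months]                         # one-hot month indicators
X_full = np.column_stack([np.ones(len(y)), D])
print(np.linalg.matrix_rank(X_full))           # 12, even though there are 13 columns

# Drop the January dummy: the column space is unchanged, so the fitted
# values (and hence the information in the model) are exactly the same.
X_drop = np.column_stack([np.ones(len(y)), D[:, 1:]])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
beta_drop = np.linalg.lstsq(X_drop, y, rcond=None)[0]
print(np.allclose(X_full @ beta_full, X_drop @ beta_drop))   # True

# In the reduced model the intercept is the January mean, and each remaining
# coefficient is that month's difference from January; January is still there.
print(beta_drop[0], y[months == 0].mean())     # the two numbers agree
```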

user321627
  • Suppose I tell you that in spring the average temperature is 14 °C, in summer the average temperature is 24 °C, in autumn the average temperature is 12 °C, and the average temperature over the whole year is 13 °C. Can you tell me what the average temperature is in winter? – Cyan Oct 06 '17 at 00:16
  • If it's not January, not February, March ... etc through to November, ... *what month is it*? If you know what month it is, what extra information is contained in a variable that states "it's December"? You already have all that information. – Glen_b Oct 06 '17 at 04:04
  • Your questions have already been answered in the comments above. However, I add this link: https://stats.stackexchange.com/questions/144372/dummy-variable-trap that also addresses this issue, both to tell you what it is called (the "dummy variable trap") and to highlight the first paragraph of the first reply there, which also fits your case. – Federico Tedeschi Nov 05 '17 at 22:09
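
Cyan's example can be checked directly: if the yearly figure is taken as the equal-weight average of the four seasonal averages, then the winter value is forced by the other three,
$$\text{winter average} = 4 \times 13 - (14 + 24 + 12) = 2\ {}^{\circ}\mathrm{C},$$
so the fourth season adds no new information once the other three and the overall average are known; this is the same redundancy that lets you drop one of the twelve dummies without losing anything.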

0 Answers