I have age as an explanatory variable in my model (along with others), and there are several theoretical justifications for including the square of age along with age, e.g., here (if the dependent variable is income, this means that income increases with age, but the increase becomes smaller as we get older). I checked the correlation between age and age squared, and it is above 0.95 (0.987). I think this is not surprising. My question is: should we still include age squared along with age as explanatory variables in the model?
-
If instead of using $\text{age}^2$ you introduce the functionally equivalent $(\text{age} - \text{mean age})^2,$ what correlation do you find with $\text{age}$? :-) – whuber Nov 12 '13 at 23:17
-
It's 0.19. Does this have any relevance? – user227710 Nov 12 '13 at 23:23
-
Including squares of regressors is a long-standing and common practice, especially in income/labor market econometrics. Nowadays most writers do not even feel the need to discuss it in their papers. – Alecos Papadopoulos Nov 12 '13 at 23:48
-
Exactly, Alecos. My observation is that they do not even discuss the multicollinearity issue associated with its inclusion, which is quite surprising. – user227710 Nov 13 '13 at 00:18
-
To generalize whuber's excellent suggestion (which indeed does have relevance), orthogonal polynomials are commonly used to deal with the collinearity issue ... by completely removing it. – Glen_b Nov 13 '13 at 00:31
-
Thank you, Glen_b. I see the point, but that is not commonly used in the literature. – user227710 Nov 13 '13 at 00:48
-
Re "the multicollinearity issue": there *is* no multicollinearity when you set up the data correctly (such as by following @Glen_b's suggestion to use an orthogonal polynomial). You saw that when you computed a correlation coefficient of $0.19$: that's practically no correlation at all. To determine whether squared age matters in your model, study the extent to which it helps explain the *response* rather than its relationship to the other explanatory variables. A full explanation appears at http://stats.stackexchange.com/a/28493; to apply it here, let $X_1$ be age and $X_2$ be squared age. – whuber Nov 13 '13 at 14:41
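
To make the points in the comments concrete, here is a minimal sketch in plain Python. Everything about the data is an assumption for illustration: the ages are synthetic (500 draws, uniform between 20 and 65, fixed seed), not the asker's data, so the centered correlation will not reproduce the 0.19 reported above. The helper names (`mean`, `pearson`) are hypothetical. The sketch compares the correlation of age with raw squared age, with centered squared age, and with a quadratic term orthogonalized against the intercept and age (one Gram-Schmidt step, which is one way to build an orthogonal polynomial term).

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(0)
# Assumption: synthetic ages, uniform on [20, 65]; not the data from the question.
age = [random.uniform(20, 65) for _ in range(500)]

m = mean(age)
age_sq = [a ** 2 for a in age]               # raw squared age
centered_sq = [(a - m) ** 2 for a in age]    # whuber's centered square

# Orthogonalized quadratic: residual of age^2 after regressing it on 1 and age.
m_sq = mean(age_sq)
slope = (sum((a - m) * (s - m_sq) for a, s in zip(age, age_sq))
         / sum((a - m) ** 2 for a in age))
ortho_sq = [(s - m_sq) - slope * (a - m) for a, s in zip(age, age_sq)]

r_raw = pearson(age, age_sq)
r_centered = pearson(age, centered_sq)
r_ortho = pearson(age, ortho_sq)

# The centered square is an exact linear combination of 1, age and age^2,
# so both parameterizations span the same column space.
max_gap = max(abs(c - (s - 2 * m * a + m ** 2))
              for a, s, c in zip(age, age_sq, centered_sq))

print("corr(age, age^2)            =", round(r_raw, 3))
print("corr(age, (age - mean)^2)   =", round(r_centered, 3))
print("corr(age, orthogonal term)  =", r_ortho)
print("max |centered - linear combination| =", max_gap)
```

Because $(\text{age} - \text{mean age})^2 = \text{age}^2 - 2\,\text{mean age}\cdot\text{age} + \text{mean age}^2$, a model with an intercept, age, and either version of the squared term produces the same fitted values; only the coefficients and their correlations change. That is why the high correlation between age and raw squared age says little about whether the squared term helps explain the response.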