22

I start with the OLS regression $$ y = \beta_0 + \beta_1 x_1 + \beta_2 D + \varepsilon $$ where $D$ is a dummy variable; the estimates are significantly different from zero, with low p-values. I then perform a Ramsey RESET test and find some misspecification of the equation, so I include the square of $x_1$: $$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 D + \varepsilon $$

  1. What does the squared term explain? (A non-linear increase in $y$?)
  2. After doing this, my estimate of $D$ no longer differs significantly from zero (high p-value). How do I interpret the squared term in my equation (in general)?
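
For concreteness, here is a minimal sketch of this workflow in Python with statsmodels (the simulated data and variable names are illustrative only, not from the actual dataset; `linear_reset` is statsmodels' implementation of the RESET test):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

# Illustrative data: a quadratic effect of x1 plus a dummy shift.
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(0, 2, n)
D = rng.integers(0, 2, n)
y = 1.3 + 0.42 * x1 - 0.32 * x1**2 + 0.14 * D + rng.normal(0, 0.1, n)

# Step 1: the original model without the squared term.
res_lin = sm.OLS(y, sm.add_constant(np.column_stack([x1, D]))).fit()
# RESET test: a significant result suggests functional-form misspecification.
print(linear_reset(res_lin, power=2, use_f=True))

# Step 2: refit with the squared term included.
res_quad = sm.OLS(y, sm.add_constant(np.column_stack([x1, x1**2, D]))).fit()
print(res_quad.summary())
```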

Edit: Improving question.

seini
  • 351
  • 1
  • 2
  • 9
  • possible duplicate of [Why ANOVA/Regression results change when controlling for another variable](http://stats.stackexchange.com/questions/25605/why-anova-regression-results-change-when-controlling-for-another-variable) – Macro Mar 18 '13 at 13:15
  • 1
    Probable reason: $x_{1}^2$ and $D$ seem to explain the same variability in $y$ – steadyfish Mar 18 '13 at 13:17
  • 3
    One thing that might help is to center $x$ *before* creating your squared term (see [here](http://stats.stackexchange.com/questions/29781/when-should-you-center-your-data-when-should-you-standardize/29782#29782)). As for the interpretation of your squared term, I argue that it's best to interpret $\beta_1x_1+\beta_2x_1^2$ *as a whole* (see [here](http://stats.stackexchange.com/questions/28730/does-it-make-sense-to-add-a-quadratic-term-but-not-the-linear-term-to-a-model/28750#28750)). Another thing is that you may need an interaction, that means adding $\beta_4x_1D+\beta_5x_1^2D$. – gung - Reinstate Monica Mar 18 '13 at 13:45
  • I don't think it's really a duplicate of that question; the solution is different (centering variables works here, but not there, unless I am mistaken) – Peter Flom Mar 18 '13 at 14:13
  • @Peter, I interpret this question as a subset of "Why is it that when I add a variable to my model, the effect estimate/$p$-value for some other variable changes?", which is addressed in the other question. Among the answers to that question are collinearity (which gung does allude to in his answer to _that_ question)/content overlap between predictors (i.e. between $D$ and $(x_1,x_1^2)$, which I suspect is the culprit in this case). The same logic applies here. I'm not sure what the controversy is but that's fine if you and others disagree. Cheers. – Macro Mar 18 '13 at 14:27
  • @Macro I agree that collinearity is likely the problem here, but when the problem is caused by a squared variable, centering removes the problem. I don't think this works for two related variables (as in the other problem). Am I wrong? – Peter Flom Mar 18 '13 at 14:45
  • @Peter, since the answer is collinearity/content overlap, I think that makes it a subset of the other question. Fixes for collinearity may be context dependent but I don't think this makes it a different question. To address your comment directly - centering _may_ alleviate the problem but if $D$ (or $P(D=1)$) is a function of $x_1$, then it almost certainly will not, in which case you're even more closely back to the content conveyed in the linked question. I still don't see the controversy but we don't need to agree on this, so let's end the duplicate vs. not duplicate convo here. Cheers. – Macro Mar 18 '13 at 14:57
  • Macro & Peter are both correct. Our policy is to identify *close* duplicates; if there could be any difficulty deciding whether a question truly is a duplicate, then it's not close enough. However, the present question has been answered in many threads on this site: a little more diligence in searching is likely to produce much useful and relevant material. – whuber Mar 18 '13 at 15:08
  • **Very** closely related: [Adding both quadratic and interaction terms to the model affects significance](http://stats.stackexchange.com/questions/34488/adding-both-quadratic-and-interaction-terms-to-the-model-affects-significance)... – Macro Mar 20 '13 at 01:59
  • See my blog post for a simple step by step guide and how to interpret the age & age squared variable. The example follows the wage equation mentioned in the post above. http://www.excel-with-data.co.uk/blog-1/how-to-regression-analysis-in-excel/ – user34889 Nov 17 '13 at 13:06
  • At this time the link to the blog post just mentioned by @user34889 is no longer active, thus underlining frequent advice here to be wary of posting such links unless known to be stable. – Nick Cox Aug 07 '14 at 15:54

2 Answers

24

Well, first off, the dummy variable is interpreted as a change in the intercept. That is, your coefficient $\beta_3$ gives you the difference in the intercept when $D=1$; i.e., when $D=1$, the intercept is $\beta_0 + \beta_3$. That interpretation doesn't change when the squared $x_1$ is added.

Now, the point of adding the squared term is that you assume the relationship between $x_1$ and $y$ levels off, or reverses, at a certain point. Looking at your second equation

$$y = \beta _0 + \beta_1x_1+\beta_2x_1^2+\beta_3 D + \varepsilon$$

Taking the partial derivative with respect to $x_1$ yields

$$\frac{\partial y}{\partial x_1} = \beta_1 + 2\beta_2 x_1$$

Setting this equal to zero and solving gives you the turning point of the relationship. As user1493368 explained, this indeed reflects an inverted U-shape if $\beta_2<0$ (and a U-shape if $\beta_2>0$). Take the following example:

$$\hat{y} = 1.3 + 0.42 x_1 - 0.32 x_1^2 + 0.14D$$

The partial derivative with respect to $x_1$ is

$$\frac{\partial \hat{y}}{\partial x_1} = 0.42 - 2 \cdot 0.32\, x_1 = 0.42 - 0.64\, x_1$$

Setting this to zero and solving for $x_1$ gives

$$\frac{\partial \hat{y}}{\partial x_1} = 0 \iff x_1 = \frac{0.42}{0.64} \approx 0.66$$

That is the point at which the relationship between $x_1$ and $\hat{y}$ turns. You can take a look at Wolfram|Alpha's output for the above function, for some visualization of your problem.
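
A quick numerical check of this turning point (plain Python; the coefficients are taken from the example above):

```python
# Coefficients from the example: y_hat = 1.3 + 0.42*x1 - 0.32*x1**2 + 0.14*D
b1, b2 = 0.42, -0.32

# The turning point solves b1 + 2*b2*x1 = 0, i.e. x1 = -b1 / (2*b2).
turning_point = -b1 / (2 * b2)
print(turning_point)  # 0.65625, approximately 0.66
```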

Remember, when interpreting the ceteris paribus effect of a change in $x_1$ on $y$, you have to look at the equation:

$$\Delta y = (\beta_1 + 2\beta_2 x_1)\Delta x_1$$

That is, you cannot interpret $\beta_1$ in isolation once you have added the squared regressor $x_1^2$!
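
To make this concrete with the example coefficients above, the marginal effect of $x_1$ is positive below the turning point and negative above it:

$$\Delta y = (0.42 - 0.64 \cdot 0.2)\,\Delta x_1 = 0.292\,\Delta x_1 \quad \text{at } x_1 = 0.2$$

$$\Delta y = (0.42 - 0.64 \cdot 1.0)\,\Delta x_1 = -0.22\,\Delta x_1 \quad \text{at } x_1 = 1.0$$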

Regarding your $D$ becoming insignificant after including the squared $x_1$: this points towards misspecification bias in your original model. Without the squared term, $D$ was likely picking up part of the omitted curvature in $x_1$; once $x_1^2$ is included, the two explain much of the same variability in $y$, leaving little separate effect for $D$.

altabq
  • 665
  • 3
  • 6
  • 16
  • Hi. If you had several predictors, should you use partial derivatives or total derivatives (differentials)? – skan Aug 24 '16 at 00:06
  • 1
    A partial derivative is still the right way to go here. The interpretation of all coefficients is *ceteris paribus*, i.e., holding everything else constant. That's exactly what you are doing when you take a partial derivative. – altabq Aug 26 '16 at 09:07
  • See this [UCLA IDRE page](https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqhow-do-i-interpret-the-sign-of-the-quadratic-term-in-a-polynomial-regression/) to complement @altabq's great answer. – Cyrille Oct 18 '18 at 07:22
21

A good example of including the square of a variable comes from labor economics. If you take $y$ as wage (or log of wage) and $x$ as age, then including $x^2$ means that you are testing for a quadratic relationship between age and wage. Wage increases with age as people become more experienced, but at higher ages wage increases at a decreasing rate (people grow older and are not as healthy to work as before); at some point wage stops growing (reaching the optimal wage level) and then starts to fall (people retire and their earnings decrease). So the relationship between wage and age is inverted U-shaped (the life-cycle effect). In general, for the example mentioned here, the coefficient on age is expected to be positive and that on age$^2$ to be negative.

The point here is that there should be a theoretical basis or empirical justification for including the square of the variable. The dummy variable, here, can be thought of as representing the gender of the worker. You can also include an interaction term of gender and age to examine whether the gender differential varies by age.
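
A minimal simulation sketch of this wage example in Python with statsmodels (the coefficients and data here are hypothetical, chosen only to produce an inverted-U age profile):

```python
import numpy as np
import statsmodels.api as sm

# Simulated wage data with a quadratic age profile and a gender dummy.
rng = np.random.default_rng(42)
n = 1000
age = rng.uniform(18, 65, n)
female = rng.integers(0, 2, n)
log_wage = (1.0 + 0.08 * age - 0.0009 * age**2
            - 0.15 * female + rng.normal(0, 0.2, n))

# Fit log(wage) on age, age^2, and the gender dummy.
X = sm.add_constant(np.column_stack([age, age**2, female]))
res = sm.OLS(log_wage, X).fit()
print(res.params)  # expect: positive on age, negative on age^2

# The implied wage peak: age* = -b_age / (2 * b_age2).
b_age, b_age2 = res.params[1], res.params[2]
print(-b_age / (2 * b_age2))  # close to 0.08 / (2 * 0.0009) ≈ 44.4
```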

Metrics
  • 2,526
  • 2
  • 19
  • 31