I am trying to fit a quadratic model to my data, which consist of (x, y) pairs.
The choices are:
1) lm(y~x+I(x^2))
2) lm(y~(x-mean(x))+I((x-mean(x))^2))
3) lm(y~(x-mean(x))+I(x^2 - mean(x^2)))
In other words, in 3) I am centering the quadratic term using its own mean.
I understand that centering to reduce multicollinearity is not the issue here; I just want to understand how to center in general. Intuitively, 3) makes more sense: I treat the linear and the quadratic variables as separate predictors and center each in the usual way. 2) seems odd because the quadratic term also contributes a linear component once the square is expanded. 1) and 3) give the same coefficients, which differ from those of 2), but there seems to be no relationship between the linear coefficients from 2) and 1). The quadratic coefficient is the same across all three models.
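To make the "open the squares up" remark concrete, here is the expansion written out. The symbols a_0, a_1, a_2 (coefficients of model 2) and b_0, b_1, b_2 (coefficients of model 1) are notation introduced here for illustration; they do not appear in the R output.

```latex
\begin{align*}
y &= a_0 + a_1 (x - \bar{x}) + a_2 (x - \bar{x})^2 \\
  &= \underbrace{\left(a_0 - a_1 \bar{x} + a_2 \bar{x}^2\right)}_{b_0}
   + \underbrace{\left(a_1 - 2 a_2 \bar{x}\right)}_{b_1}\, x
   + \underbrace{a_2}_{b_2}\, x^2
\end{align*}
```

Read this way, the quadratic coefficients coincide (b_2 = a_2), and the linear ones are tied together by a_1 = b_1 + 2 b_2 \bar{x}. Since the slope of b_0 + b_1 x + b_2 x^2 is b_1 + 2 b_2 x, this would make the linear coefficient in 2) the slope of the fitted curve at x = mean(x), and the one in 1) the slope at x = 0.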
The outputs are:
model 1)
Call:
lm(formula = y ~ x + I(x^2))
Residuals:
    Min      1Q  Median      3Q     Max
-73.845 -10.151   1.224   9.660  73.553
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 262.709845  82.982956   3.166   0.0016 **
x             0.150473   1.346574   0.112   0.9111
I(x^2)       -0.002182   0.005459  -0.400   0.6895
model 2)
Call:
lm(formula = y ~ (x-mean(x)) + (x-mean(x))^2)
Residuals:
    Min      1Q  Median      3Q     Max
-73.845 -10.151   1.224   9.660  73.553
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     247.263060   0.657972 375.796   <2e-16 ***
x -mean(x)       -0.396789   0.080544  -4.926    1e-06 ***
(x -mean(x))^2   -0.002182   0.005459  -0.400     0.69
And model 3)
Call:
lm(formula = y ~ (x - mean(x)) + I(x^2 - mean(x^2)))
Residuals:
    Min      1Q  Median      3Q     Max
-73.845 -10.151   1.224   9.660  73.553
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)         247.138199   0.579052 426.798   <2e-16 ***
x - mean(x)           0.150473   1.346574   0.112    0.911
I(x^2 - mean(x^2))   -0.002182   0.005459  -0.400    0.690
Notice that 1) and 3) give the same coefficient estimates, while 2) differs in the coefficient on the linear term. The quadratic coefficients all agree. In model 2 the linear term is significant, whereas in the other two it is not. Why?
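A minimal sketch for checking these observations on simulated data (not the data above). The variable names xc, xc2 and x2c, the sample size, and the simulated coefficients are all made up for illustration. The centered columns are computed before calling lm(), an assumption made here so that the formula contains no arithmetic outside I().

```r
# Simulated data; NOT the data from the question.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 5 + 0.8 * x - 0.05 * x^2 + rnorm(n, sd = 1)

# Build the centered predictors explicitly.
xc  <- x - mean(x)        # centered linear term
xc2 <- xc^2               # square of the centered term (model 2)
x2c <- x^2 - mean(x^2)    # squared term centered at its own mean (model 3)

m1 <- lm(y ~ x + I(x^2))  # model 1: raw terms
m2 <- lm(y ~ xc + xc2)    # model 2: center, then square
m3 <- lm(y ~ xc + x2c)    # model 3: center each term separately

b1 <- coef(m1); b2 <- coef(m2); b3 <- coef(m3)

# All three parameterizations should produce the same fitted values.
print(all.equal(fitted(m1), fitted(m2)))
print(all.equal(fitted(m1), fitted(m3)))

# Quadratic coefficients across the three models.
print(c(m1 = unname(b1[3]), m2 = unname(b2[3]), m3 = unname(b3[3])))

# Linear coefficients: model 2 versus model 1 shifted by 2 * quad * mean(x).
print(c(m2_linear = unname(b2[2]),
        m1_shifted = unname(b1[2] + 2 * b1[3] * mean(x))))
```

If the last two printed numbers match, the linear coefficient of model 2 is the linear coefficient of model 1 plus twice the quadratic coefficient times mean(x), so the two linear terms estimate different quantities and their t-tests need not agree.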