I'm running a regression on data simulated from a known underlying model with normally distributed errors, and I don't understand how the fitted coefficients can be as far from the true ones as they are (true coefficients 0, 1, and -2; fitted coefficients roughly -1.6, 6.2, and -23.9). See below. I'm seeking help in understanding this phenomenon. The discrepancy is insensitive to the random seed: the exact fitted coefficients change with the seed, but they remain far from the true values.
The data is generated in R as follows:
set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)  # true model: intercept 0, coefficient 1 on x, -2 on x^2
The regression is simply:
lm2 <- lm(y ~ poly(x, 2))
summary(lm2)

which gives:

Residuals:
     Min       1Q   Median       3Q      Max
 -1.9650  -0.6254  -0.1288   0.5803   2.2700

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -1.5500     0.0958  -16.18  < 2e-16
poly(x, 2)1    6.1888     0.9580    6.46 4.18e-09
poly(x, 2)2  -23.9483     0.9580  -25.00  < 2e-16

Residual standard error: 0.958 on 97 degrees of freedom
Multiple R-squared: 0.873, Adjusted R-squared: 0.8704
F-statistic: 333.3 on 2 and 97 DF, p-value: < 2.2e-16
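In case it helps diagnose the issue, here is a sketch of one comparison I can think of: fitting the same model twice, once with the default `poly(x, 2)` and once with `poly(x, 2, raw = TRUE)` (`raw` is a documented argument of `stats::poly()` that uses the plain x and x^2 columns instead of the default basis). I'm assuming the two fits describe the same curve even if their coefficients are reported on different bases; the coefficient and fitted-value comparison below would confirm or refute that.

```r
# Sketch: compare the default poly() fit against a raw-polynomial fit
# of the same degree on the same simulated data.
set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)  # true model: intercept 0, slopes 1 and -2

fit_default <- lm(y ~ poly(x, 2))              # default poly() basis
fit_raw     <- lm(y ~ poly(x, 2, raw = TRUE))  # raw basis: x and x^2 directly

coef(fit_default)  # coefficients on the default basis
coef(fit_raw)      # coefficients directly comparable to 0, 1, -2

# Check whether the two parameterizations give the same fitted curve:
max(abs(fitted(fit_default) - fitted(fit_raw)))
```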