I'm running a regression on data simulated from a known underlying model with normally distributed errors, and I don't understand how the fitted coefficients can be as far from the true ones as they are (true coefficients 0, 1, and -2; fitted coefficients roughly -1.6, 6.2, and -23.9). See below. I'm seeking help in understanding this phenomenon. The discrepancy is insensitive to the random seed: the exact fitted coefficients change with the seed, but they remain far from the true values.
The data is generated in R as follows:
set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)  # true model: intercept 0, coefficient 1 on x, -2 on x^2
The regression is simply:
lm2 <- lm(y ~ poly(x, 2))
summary(lm2)

which gives:

Residuals:
     Min       1Q   Median       3Q      Max
 -1.9650  -0.6254  -0.1288   0.5803   2.2700

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -1.5500     0.0958  -16.18  < 2e-16
poly(x, 2)1    6.1888     0.9580    6.46 4.18e-09
poly(x, 2)2  -23.9483     0.9580  -25.00  < 2e-16

Residual standard error: 0.958 on 97 degrees of freedom
Multiple R-squared: 0.873, Adjusted R-squared: 0.8704
F-statistic: 333.3 on 2 and 97 DF, p-value: < 2.2e-16
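In case it helps diagnose the issue, here is a sketch of one comparison I can think of: fitting the same model twice, once with the default `poly(x, 2)` and once with `poly(x, 2, raw = TRUE)` (`raw` is a documented argument of `stats::poly()` that uses the plain x and x^2 columns instead of the default basis). I'm assuming the two fits describe the same curve even if their coefficients are reported on different bases; the coefficient and fitted-value comparison below would confirm or refute that.

```r
# Sketch: compare the default poly() fit against a raw-polynomial fit
# of the same degree on the same simulated data.
set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)  # true model: intercept 0, slopes 1 and -2

fit_default <- lm(y ~ poly(x, 2))              # default poly() basis
fit_raw     <- lm(y ~ poly(x, 2, raw = TRUE))  # raw basis: x and x^2 directly

coef(fit_default)  # coefficients on the default basis
coef(fit_raw)      # coefficients directly comparable to 0, 1, -2

# Check whether the two parameterizations give the same fitted curve:
max(abs(fitted(fit_default) - fitted(fit_raw)))
```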