A post, "Fitting Polynomial Regression in R", used two ways to model the polynomial regression: (a) poly(..., ...); (b) I(...). Below is the example:
set.seed(20)
q <- seq(from=0, to=20, by=0.1)
y <- 500 + 0.4 * (q-10)^3
noise <- rnorm(length(q), mean=10, sd=80)
noisy.y <- y + noise
# fitting polynomials
# two methods
model_a <- lm(noisy.y ~ poly(q,3))
model_b <- lm(noisy.y ~ q + I(q^2) + I(q^3))
# their summaries are the same except for the coefficients
summary(model_a)
summary(model_b)
The post said that:

q, I(q^2) and I(q^3) will be correlated, and correlated variables can cause problems. The use of poly() lets you avoid this by producing orthogonal polynomials, therefore I'm going to use the first option (i.e., poly()).
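To see what the post means, I ran a small check of my own (not from the post): the raw powers of q are highly correlated with each other, while the columns returned by poly(q, 3) are orthonormal.

```r
set.seed(20)
q <- seq(from = 0, to = 20, by = 0.1)

# raw powers of q are strongly correlated with each other
cor(q, q^2)    # close to 1
cor(q^2, q^3)  # close to 1

# columns of poly(q, 3) are orthonormal: their crossproduct is the identity
round(crossprod(poly(q, 3)), 10)
```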
I am confused about:
(1) Why do q, I(q^2) and I(q^3) cause problems?
(2) According to summary(), these two models are identical except for the Coefficients. Why are the coefficients different while everything else is the same? Shouldn't they be all different, or all the same?