I am trying to understand Ben Bolker's answer to this question.
First, we create a data frame:
set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)
Then Mr. Bolker says:
x as ordered factor
coef(lm(y~ordered(x),d))
## (Intercept) ordered(x).L ordered(x).Q ordered(x).C
## 5.998121421 4.472505514 0.006109021 -0.003125958
Now the intercept specifies the value of y at the mean factor level (halfway between 2 and 3); the L (linear) parameter gives a measure of the linear trend (not quite sure I can explain the particular value ...), Q and C specify quadratic and cubic terms (which are close to zero in this case because the pattern is linear); if there were more levels the higher-order contrasts would be numbered 5, 6, ...
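To make the quoted description concrete, here is a minimal sketch (not part of the original answer) that prints the contrast matrix R uses by default for a four-level ordered factor, contr.poly(4), and then fits a noise-free version of the data, y = 1 + 2x with one observation per level. The noise-free data and the names P, x0, y0 and fit are assumptions made only for illustration. Since the .L scores equal (x - 2.5)/sqrt(5), the linear coefficient here should be 2*sqrt(5), roughly 4.4721, which is close to the 4.4725 reported above.

```r
# Polynomial contrasts R uses by default for a 4-level ordered factor.
# Columns .L, .Q, .C hold the linear, quadratic and cubic scores.
P <- contr.poly(4)
print(round(P, 4))

# The columns are orthonormal, so crossprod(P) is the 3x3 identity.
print(round(crossprod(P), 10))

# Hypothetical noise-free data: one observation per level, y = 1 + 2*x.
x0 <- 1:4
y0 <- 1 + 2 * x0

# Same model as in the quoted answer, fitted to the noise-free data.
fit <- lm(y0 ~ ordered(x0))
print(coef(fit))

# The linear coefficient can be compared with 2 * sqrt(5).
print(2 * sqrt(5))
```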
My question is, what does the regression formula look like explicitly?
I thought lm() makes a model like this:
y = 5.9981 + 4.4725 (x_1) + 0.0061 (x_2) - 0.0031 (x_3)
where, since the x_i are categories, they can only be either 0 or 1.
I do not understand what quadratic and cubic terms have to do with a linear model. Even if they were involved, squaring or cubing any of the variables would not make a difference, since 0^3 = 0 and 1^3 = 1.
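One way to check my 0/1 assumption is to print the design matrix that model.matrix() builds for the same four levels, once as a plain factor and once as an ordered factor. This is only a minimal sketch; the names x0, mm_factor and mm_ordered are made up for illustration, and it assumes R's default contrast options (contr.treatment for unordered factors, contr.poly for ordered ones). Under those defaults the plain factor gives 0/1 dummy columns, while the ordered factor gives columns filled with the contr.poly(4) scores, so the x_i in my formula above would not be restricted to 0 and 1.

```r
# One observation per level, for illustration only.
x0 <- 1:4

# Unordered factor: treatment (dummy) coding, every entry is 0 or 1.
mm_factor <- model.matrix(~ factor(x0))
print(mm_factor)

# Ordered factor: polynomial coding, columns .L, .Q, .C take
# the values of contr.poly(4) rather than 0/1 indicators.
mm_ordered <- model.matrix(~ ordered(x0))
print(round(mm_ordered, 4))
```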