Results of lm() function with a dependent ordered categorical variable?

Question

I am trying to understand Ben Bolker's answer to this question.

First, we create a data frame:

set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)

Then Mr. Bolker says:

x as ordered factor

coef(lm(y~ordered(x),d))
##  (Intercept) ordered(x).L ordered(x).Q ordered(x).C 
##  5.998121421  4.472505514  0.006109021 -0.003125958

Now the intercept specifies the value of y at the mean factor level (halfway between 2 and 3); the L (linear) parameter gives a measure of the linear trend (not quite sure I can explain the particular value ...), Q and C specify quadratic and cubic terms (which are close to zero in this case because the pattern is linear); if there were more levels the higher-order contrasts would be numbered 5, 6, ...

My question is, what does the regression formula look like explicitly?

I thought lm() makes a model like this:

y = 5.9981 + 4.4725 (x_1) + 0.0061 (x_2) - 0.00312 (x_3)

where, since the x_i are categories, they can only be either 0 or 1.

I do not understand what quadratic and cubic terms have to do with a linear model. Even so, squaring/cubing any of the variables would not make a difference, since 0 ^ 3= 0 and 1^3 = 1.

No answers yet, and I don't have time to write a full one, but as a starting place, see `contr.poly(4)` for the numerical definition. — Aaron left Stack Overflow, Apr 09 '18 at 00:16
and compare to the default `contr.treatment(4)`; you'll see that while the default is to code with just 0/1, the polynomial contrasts are not simply 0/1. — Aaron left Stack Overflow, Apr 09 '18 at 01:39

score 13 · Answer 1 · answered Nov 20 '18 at 05:01

In this case, what lm() is doing is converting your "categorical" variable into a numeric sequence in order.

To make this clearer, I'll adapt Bolker's code a bit to make the X variable more obviously categorical:

set.seed(101)
d <- data.frame(x=sample(1:4,size=30,replace=TRUE))
d$y <- rnorm(30,1+2*d$x,sd=0.01)
d$x = factor(d$x, labels=c("none", "some", "more", "a lot"))
coef(lm(y~x, d))
#  (Intercept)       xsome       xmore      xa lot 
#     3.001627    1.991260    3.995619    5.999098

So here, the mean of x=None is in the intercept, and the deviation from that is indicated for each category.

coef(lm(y~ordered(x), d))
#  (Intercept) ordered(x).L ordered(x).Q ordered(x).C 
#  5.998121421  4.472505514  0.006109021 -0.003125958

Conceptually, what's happened here is that the ordered() function converted x into newx using (something similar to):

if (x=="None") newx=-.67
if (x=="some") newx=-.22
if (x=="more") newx=.22
if (x=="a lot") newx=.67

and then it fitted (something like) the model: $$y = a + b_0 \times newx + b_1 \times newx^2 + b_2 \times newx^3$$

where you have linear $newx$, quadratic $newx^2$, and cubic $newx^3$ components.

Note, I said that it's something like that, because the problem with the model described there is that $newx$, $newx^2$, and $newx^3$ are not at all independent. What lm() does instead is uses a set of contrasts generated by contr.poly(4). These contrasts ensure orthogonality, so that the linear, quadratic and cubic components are independent. But the principle is similar - when fitting ordered factors, lm() fits a linear, quadratic, cubic, etc... component.

You can see this by comparing:

coef(lm(y~ordered(x), d))
#  (Intercept) ordered(x).L ordered(x).Q ordered(x).C 
#  5.998121421  4.472505514  0.006109021 -0.003125958

with

contrasts(d$x) <- contr.poly(4)
coef(lm(y~x, d))
#  (Intercept) ordered(x).L ordered(x).Q ordered(x).C 
#  5.998121421  4.472505514  0.006109021 -0.003125958

Exactly identical. So if you want a fuller understanding of what happened, take a closer look at contr.poly() and orthogonal polynomial contrasts in general.

One thing to note is that there is an implicit assumption hidden in here: the difference between each two levels is assumed to be equal. So "None" is as far from "some" as "some" is from "more", and "more" is from "a lot".

Thanks. Where did the -.67, -.22, .22, .67 values for newx come from? And why is taking the higher-order polynomials the right thing for studying order? — dfrankow, Jun 09 '20 at 21:16
Great answer, and good follow up question, had the same one. The -.67, -.22, .22, .67 values seem to come from the .L part of contr.poly(4) > contr.poly(4)[,".L"] [1] -0.6708204 -0.2236068 0.2236068 0.6708204 — Magnus, Dec 01 '20 at 09:39

Results of lm() function with a dependent ordered categorical variable?

x as ordered factor

1 Answers1

Linked