
Suppose we have fitted a standard linear model with a categorical variable which has 3 levels A, B and C.

x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)
fit <- lm(y ~ x)

How can we use this model to test whether the coefficient for level $B$ equals the coefficient for level $C$?

Shana

You can fit two regressions: one with separate indicators for $B$ and $C$, and another with a single indicator for $x$ being either $B$ or $C$ (which imposes the restriction $B = C$).

# individual variables for B as well as C
reg1 = lm(y ~ I(x == "B") + I(x == "C")) 
# either B or C
reg2 = lm(y ~ I(x %in% c("B", "C"))) 
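As a sketch, the same two models can also be written with factors. `reg1` is equivalent to the original `lm(y ~ x)` under R's default treatment coding, and `reg2` is the model in which levels B and C are merged into a single level. The names `x_bc`, `fit_full` and `fit_restricted` are illustrative and not part of the answer; the check only compares fitted values, so it does not depend on the particular random draw.

```r
# Simulated data as in the question (the seed is only for reproducibility)
set.seed(1)
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)

# Indicator versions from the answer
reg1 <- lm(y ~ I(x == "B") + I(x == "C"))  # separate effects for B and C
reg2 <- lm(y ~ I(x %in% c("B", "C")))      # one common effect for B and C

# Factor versions: the full model uses the default treatment coding of x,
# the restricted model merges levels B and C into one level "BC"
x_bc <- x
levels(x_bc) <- c("A", "BC", "BC")  # levels(x) is c("A", "B", "C")
fit_full <- lm(y ~ x)
fit_restricted <- lm(y ~ x_bc)

# Each pair of models produces the same fitted values
all.equal(fitted(reg1), fitted(fit_full))
all.equal(fitted(reg2), fitted(fit_restricted))
```

Merging levels through `levels<-` keeps the restricted model readable when the factor has more levels than just A, B and C.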

Then you can compare the two nested models with an ANOVA (an F-test or a chi-square test) and check whether the p-value suggests keeping $B$ and $C$ separate:

# F-Test
anova(reg1, reg2, test = "F")
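To see what `anova()` computes, the F statistic can be rebuilt from the residual sums of squares of the two models. This is a sketch using only base R (`deviance()`, `df.residual()`, `pf()`); the variable names are illustrative. Listing the restricted model first, as in `anova(reg2, reg1)`, gives positive `Df` and `Sum of Sq` columns; the F statistic and p-value are the same in either order.

```r
# Simulated data as in the question (the seed is only for reproducibility)
set.seed(1)
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)

reg1 <- lm(y ~ I(x == "B") + I(x == "C"))  # full model
reg2 <- lm(y ~ I(x %in% c("B", "C")))      # restricted model (B = C)

rss_full <- deviance(reg1)             # residual sum of squares, full model
rss_restricted <- deviance(reg2)       # residual sum of squares, restricted model
df_full <- df.residual(reg1)           # n - 3 residual degrees of freedom
q <- df.residual(reg2) - df_full       # number of restrictions (here 1)

# Nested-model F statistic and its upper-tail p-value
f_stat <- ((rss_restricted - rss_full) / q) / (rss_full / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

# Compare with the table produced by anova()
tab <- anova(reg2, reg1, test = "F")
c(by_hand = f_stat, anova = tab$F[2])
c(by_hand = p_value, anova = tab[["Pr(>F)"]][2])
```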

In your example, the results look as follows

set.seed(1)
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)

# individual variables for B as well as C
reg1 = lm(y ~ I(x == "B") + I(x == "C")) 
# either B or C
reg2 = lm(y ~ I(x %in% c("B", "C"))) 

# F-Test
anova(reg1, reg2, test = "F")

# Results 
Analysis of Variance Table

Model 1: y ~ I(x == "B") + I(x == "C")
Model 2: y ~ I(x %in% c("B", "C"))
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    197 200.82                           
2    198 200.87 -1 -0.051686 0.0507 0.8221
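As a check on the table, the F value follows from the printed numbers. Model 1 is the full model with $\mathrm{df}_1 = 197$ residual degrees of freedom, and Model 2 imposes $q = 1$ restriction, so (with the RSS values rounded as printed):

$$F = \frac{(\mathrm{RSS}_2 - \mathrm{RSS}_1)/q}{\mathrm{RSS}_1/\mathrm{df}_1} = \frac{0.051686/1}{200.82/197} \approx \frac{0.051686}{1.0194} \approx 0.0507$$

The reported p-value, 0.8221, is the upper-tail probability of an $F_{1,197}$ distribution at this value.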

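The p-value is far above any conventional significance level, so there is no evidence that the coefficients for $B$ and $C$ differ and you can treat them as equal, $B = C$ (but note that a large p-value does not prove equality, and conclusions based on p-values are controversial).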
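An alternative to the approach above is to test the hypothesis directly on the original model `fit`, without fitting a second regression. Under the default treatment coding, `xB` and `xC` are differences from level A, so the hypothesis is $\beta_B - \beta_C = 0$. The sketch below computes a Wald t statistic from `coef()` and `vcov()`, and shows the equivalent shortcut of making B the reference level with `relevel()`, after which the coefficient on C is directly $\beta_C - \beta_B$. For a single restriction, the squared t statistic equals the nested-model F statistic. Variable names such as `x_b` and `fit_b` are illustrative. If the car package is installed, `car::linearHypothesis(fit, "xB = xC")` should report the same F test.

```r
# Simulated data as in the question (the seed is only for reproducibility)
set.seed(1)
x <- factor(sample(c("A", "B", "C"), 200, replace = TRUE))
y <- rnorm(200)
fit <- lm(y ~ x)  # coefficients "(Intercept)", "xB", "xC"; A is the reference

# Wald t-test of H0: beta_B - beta_C = 0 using the estimated covariance matrix
b <- coef(fit)
V <- vcov(fit)
est <- b[["xB"]] - b[["xC"]]
se <- sqrt(V["xB", "xB"] + V["xC", "xC"] - 2 * V["xB", "xC"])
t_stat <- est / se
p_wald <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# Shortcut: with B as the reference level, "x_bC" estimates beta_C - beta_B
x_b <- relevel(x, ref = "B")
fit_b <- lm(y ~ x_b)
coef(summary(fit_b))["x_bC", ]

# For one restriction, t^2 equals the F statistic of the nested-model comparison
reg2 <- lm(y ~ I(x %in% c("B", "C")))
f_anova <- anova(reg2, fit)$F[2]
c(t_squared = t_stat^2, F = f_anova, p_wald = p_wald)
```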

Arne Jonas Warnke