EDIT2: Possibly the short version of the question is just: how would I test cat2 = cat3 in the Stata example linked below?
I would like to test the hypothesis that $\beta_{typeprof} = -\beta_{typewc}$. At first glance this seems a pretty straightforward thing to do in R using the linearHypothesis function in the car package. However, the presence of intercepts and multiple (categorical) variables makes the interpretation a bit trickier for me.
Here's my reproducible example:
library(car)
# no intercept present
mod.duncan0 <- lm(prestige ~ 0 + income + education + type,
data=Duncan)
linearHypothesis(mod.duncan0, "typeprof=-typewc")
will return
Hypothesis: typeprof + typewc = 0
Model 1: restricted model
Model 2: prestige ~ 0 + income + education + type
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     41 3798.8
2     40 3798.0  1   0.86657 0.0091 0.9244
so we clearly cannot reject H0. However, as far as I understood the linearHypothesis method after running debug on linearHypothesis.default, the hypothesis matrix/formula needs to be adjusted in order to test the same thing with an intercept (see also this Stata-related discussion here):
mod.duncan1 <- lm(prestige ~ income + education + type, data=Duncan)
linearHypothesis(mod.duncan1,
"typeprof +(Intercept) = -typewc-(Intercept) ")
which returns exactly the same result. Now assume that, still using reference coding, I add another categorical variable to the mix. The Duncan dataset doesn't come with one, so I made up a variable called bs.
set.seed(123)
Duncan$bs <- as.factor(rbinom(45,4,.4))
mod.duncanM0 <- lm(prestige ~ 0+income + education + bs + type,
data=Duncan)
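To spell out why the first two tests agree, here is a short derivation. The symbols are my own notation, not car output: $\alpha$ is the intercept and $\gamma_{typeprof}$, $\gamma_{typewc}$ are the type coefficients of the model with an intercept (mod.duncan1), while the $\beta$'s are the type coefficients of the model without one (mod.duncan0). Since both models span the same column space, the estimates should map onto each other as

$$
\begin{aligned}
\beta_{typebc} &= \alpha, \\
\beta_{typeprof} &= \alpha + \gamma_{typeprof}, \\
\beta_{typewc} &= \alpha + \gamma_{typewc},
\end{aligned}
$$

so that

$$
\beta_{typeprof} + \beta_{typewc} = 0 \iff 2\alpha + \gamma_{typeprof} + \gamma_{typewc} = 0,
$$

which is the restriction written as "typeprof +(Intercept) = -typewc-(Intercept)" above. In mod.duncanM0 there is no single $\alpha$ any more, which is what prompts the question below.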
Now the question is:
Can I, just by leaving out a general intercept, meaningfully test $\beta_{typeprof} = -\beta_{typewc}$, even though there are BS-specific intercepts?
Note that this is a made-up example, and I understand that testing this hypothesis doesn't make much sense with this dataset, but in the original data type is a variable that clearly has a neutral reference category plus a positive and a negative category. Btw: here's a summary of the model itself, just for the sake of completeness and to see the estimates themselves:
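A minimal sketch for regenerating that summary, assuming the car package (which loads the Duncan data) is installed and the same seed as above is used. The output is not reproduced here, and the exact bs levels depend on the random draw:

```r
library(car)  # loads carData, which provides the Duncan data set

# Recreate the made-up factor and the no-intercept model from above
set.seed(123)
Duncan$bs <- as.factor(rbinom(45, 4, 0.4))
mod.duncanM0 <- lm(prestige ~ 0 + income + education + bs + type,
                   data = Duncan)

# Coefficient table: estimates, standard errors, t values, p values
summary(mod.duncanM0)

# Only the coefficient names, to see which dummy variables were created
names(coef(mod.duncanM0))
```

The last call is just a quick way to check which levels of bs and type received their own coefficient.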
As you can see, using reference coding, all of the BS categories are included while type leaves one category (typebc) out. Does this hamper the desired interpretation in any way?
EDIT: This question, and particularly @gung's answer, are related.