The question I have is related to this one, but slightly more specific.
Suppose I use data from a large-scale educational assessment to investigate how students' proficiency in math depends on their background characteristics, such as gender and socioeconomic status. The data come from a study in which a low-stakes test was used to capture students' math proficiency. Performance on a low-stakes test has been shown to depend on both cognitive and non-cognitive factors, an example of the latter being conscientiousness. Hence, I want to include a proxy for non-cognitive skills in my analysis. Specifically, I want to estimate two models:
Model 1: math ~ gender + ses
and
Model 2: math ~ gender + ses + conscientiousness
Let's say I am interested in how the coefficient for gender differs across these two models. That is, if b1 is the estimate of the coefficient in Model 1 and b2 is the estimate of the coefficient in Model 2, I am interested in whether the difference b1 - b2 is statistically significantly different from 0. To answer this question, I need a standard error for the difference. This amounts to calculating the variance of b1 - b2, which is equal to Var(b1) + Var(b2) - 2 * Cov(b1, b2). The tricky part here is Cov(b1, b2), as the variances are obtained easily.
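Written out, the standard error and the resulting test statistic would be as follows. Treating z as approximately standard normal is an assumption made here for illustration; with replicate weights and plausible values, a t reference distribution may be more appropriate.

$$
\operatorname{SE}(b_1 - b_2) = \sqrt{\operatorname{Var}(b_1) + \operatorname{Var}(b_2) - 2\,\operatorname{Cov}(b_1, b_2)},
\qquad
z = \frac{b_1 - b_2}{\operatorname{SE}(b_1 - b_2)}
$$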
A simple method of estimating the covariance was proposed by Clogg et al. (1995). Mize et al. (2019) suggested that the covariance term can be estimated using seemingly unrelated regression. I am not sure how to apply these methods to my problem, given that:
- the data I use come from a complex, multi-stage clustered sample. One has to apply survey weights and a set of replicate weights to obtain correct point estimates and standard errors. Specifically, to fit the regression models, I use survey::svyglm() with a survey::svrepdesign() survey design object.
- proficiency in math is not represented by a single score but by a set of 10 plausible values. After the models for each plausible value are estimated separately, I aggregate the results using Rubin's aggregation rules.
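For concreteness, the aggregation step in the second point can be sketched for a single coefficient as follows. This is a minimal base-R illustration of Rubin's rules, not the exact code used in the analysis: `pool_rubin`, `est`, and `se` are hypothetical names, and the numbers are made up.

```r
# Pool results across plausible values for one coefficient using Rubin's rules.
# est: one point estimate per plausible value (hypothetical input)
# se:  the matching standard errors, e.g. from the replicate weights (hypothetical input)
pool_rubin <- function(est, se) {
  m <- length(est)
  stopifnot(m > 1, length(se) == m)
  q_bar <- mean(est)                 # pooled point estimate
  u_bar <- mean(se^2)                # average sampling (within) variance
  b     <- var(est)                  # between-plausible-value variance
  t_var <- u_bar + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

# Made-up values for 10 plausible values, for illustration only
est <- c(-4.1, -3.8, -4.4, -3.9, -4.2, -4.0, -3.7, -4.3, -4.1, -3.9)
se  <- c(1.20, 1.18, 1.22, 1.19, 1.21, 1.20, 1.17, 1.23, 1.20, 1.19)
pool_rubin(est, se)
```

The same function would apply to the difference b1 - b2 if an estimate and a standard error of that difference were available for each plausible value; obtaining that per-plausible-value standard error is exactly the open part of the question.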
In light of these complexities, how can I test the difference b1 - b2 for significance?