I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm()
or svyVGAM::svy_vglm
(for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer
accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long)
with weights [please correct me if I'm wrong].
However, for categorical variables, I don't know how to do the analyses. WeMix::mix()
has a parameter weights, but I'm not sure if it treats them as sampling weights. Still, this function can't support multinomial family.
So, to resume: can you enlighten me on how to do a pre-post test analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer
and glmer
with variables used to calculate the weights as predictors. It happens that glmer
returns a lot of problems (convergence, high eigenvalues...), so I give another look at @ThomasLumley answer at this same question in stackoverflow(https://stackoverflow.com/questions/68333084/longitudinal-analysis-using-sampling-weigths-in-r) and other posts (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | Fitting multilevel models to complex survey data in R).
So, my question is now if a can use participants id as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.875e+01 1.000e+00 18.746 0.0339 *
groupWoman -1.903e+01 1.536e+00 -12.394 0.0513 .
timePre 5.443e-09 5.443e-09 1.000 0.5000
groupWoman:timePre 2.877e-01 1.143e+00 0.252 0.8431
and still interpret groupWoman:timePre
as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!