Longitudinal analysis using sampling weights in R

Question

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm() (for the multinomial family) to include sampling weights, but these functions don't account for random effects. On the other hand, lme4::lmer() accounts for the repeated measures but not for the sampling weights.

For continuous outcomes, I understand that I can do

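library(survey)
# data_wide (built below) has one row per id; weight_Pre holds each person's weight
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = ~weight_Pre)

svyglm(I(outcome3_Post - outcome3_Pre) ~ group_Pre, design = w_data_wide)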

and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
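For a continuous outcome, the change-score regression above is weighted least squares, and with a single binary predictor its group coefficient is simply the difference between the two groups' weighted mean changes. A minimal base-R sketch on hypothetical simulated data (the names `wide`, `pre` and `post` are illustrative, not from the question); `lm(..., weights = )` is used here because it returns the same point estimates as `svyglm()` with `ids = ~1`, though not its design-based standard errors. It shows what the change-score coefficient estimates; it does not by itself settle the comparison with a weighted `lmer()` fit.

```r
# Hypothetical simulated wide data: one row per person
set.seed(42)
n <- 200
wide <- data.frame(
  group  = factor(sample(c("Man", "Woman"), n, replace = TRUE)),
  weight = runif(n, 0.5, 2),
  pre    = rnorm(n)
)
wide$post <- wide$pre + 0.5 * (wide$group == "Woman") + rnorm(n)

# Weighted least squares of the change score on group
# (same point estimates as svyglm() with ids = ~1 and these weights)
fit <- lm(I(post - pre) ~ group, data = wide, weights = weight)

# Difference between the groups' weighted mean changes
wmean  <- function(x, w) sum(w * x) / sum(w)
change <- wide$post - wide$pre
woman  <- wide$group == "Woman"
diff_w <- wmean(change[woman], wide$weight[woman]) -
          wmean(change[!woman], wide$weight[!woman])

all.equal(unname(coef(fit)["groupWoman"]), diff_w)
```

In the same way, the intercept is the weighted mean change in the reference group (Man).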

However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a weights argument, but I'm not sure whether it treats them as sampling weights. In any case, this function does not support a multinomial family.

To summarize: can you enlighten me on how to do a pre-post analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.

I give below some data sets with binomial and multinomial outcomes:

library(data.table)
set.seed(1)

data_long <- data.table(
  id       = rep(1:5, 2),
  time     = c(rep("Pre", 5), rep("Post", 5)),
  outcome1 = sample(c("Yes", "No"), 10, replace = TRUE),              # binary outcome
  outcome2 = sample(c("Low", "Medium", "High"), 10, replace = TRUE),  # 3-level outcome
  outcome3 = rnorm(10),                                               # continuous outcome
  group    = rep(sample(c("Man", "Woman"), 5, replace = TRUE), 2),
  weight   = rep(c(1, 0.5, 1.5, 0.75, 1.25), 2)
)

# One row per id; group and weight do not change over time, so keep only the _Pre copies
data_wide <- dcast(data_long, id ~ time,
                   value.var = c("outcome1", "outcome2", "outcome3", "group", "weight"))[
  , `:=`(weight_Post = NULL, group_Post = NULL)]

EDIT

As I said below in the comments, I've been using lmer and glmer with the variables used to calculate the weights as predictors. However, glmer reports many problems (convergence failures, very large eigenvalues...), so I took another look at @ThomasLumley's answer to this same question on Stack Overflow (https://stackoverflow.com/questions/68333084/longitudinal-analysis-using-sampling-weigths-in-r) and at other posts (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | Fitting multilevel models to complex survey data in R).

So my question now is whether I can use the participant id as the cluster variable in svydesign:

library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = ~weight)
summary(svyglm(factor(outcome1) ~ group * time, design = w_data_long_cluster, family = quasibinomial()))

                     Estimate Std. Error t value Pr(>|t|)  
(Intercept)         1.875e+01  1.000e+00  18.746   0.0339 *
groupWoman         -1.903e+01  1.536e+00 -12.394   0.0513 .
timePre             5.443e-09  5.443e-09   1.000   0.5000  
groupWoman:timePre  2.877e-01  1.143e+00   0.252   0.8431

and still interpret groupWoman:timePre as the difference between sex groups in the average rate of change/improvement in the outcome over time, as if I were using a mixed model with participants as random effects.
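A minimal base-R sketch of what the `groupWoman:timePre` coefficient is computed from, using hypothetical simulated data rather than the toy data above (the large intercept in the output above suggests separation, so those estimates are unstable). Clustering on `id` changes only the standard errors, so weighted `glm()` is used here to get the same point estimates as `svyglm()`. In this saturated model the interaction equals the difference between groups in the Pre-minus-Post change of the weighted log-odds of "Yes". Because it is built from weighted marginal proportions, it is a population-averaged contrast; the corresponding coefficient of a logistic mixed model is subject-specific, and on the log-odds scale the two generally differ whenever the random-intercept variance is not zero.

```r
# Hypothetical simulated data: 400 people, each observed at Pre and Post
set.seed(7)
n   <- 400
grp <- sample(c("Man", "Woman"), n, replace = TRUE)
w   <- runif(n, 0.5, 2)
long <- data.frame(
  id     = rep(seq_len(n), 2),
  time   = rep(c("Pre", "Post"), each = n),
  group  = rep(grp, 2),
  weight = rep(w, 2)
)
prob <- with(long, plogis(-0.5 + 0.8 * (group == "Woman") + 0.4 * (time == "Pre") +
                            0.6 * (group == "Woman") * (time == "Pre")))
long$outcome1 <- factor(ifelse(runif(2 * n) < prob, "Yes", "No"))

# Same formula as in the question; a tight convergence tolerance so the
# comparison below is not limited by the IRLS stopping rule
fit <- glm(outcome1 ~ group * time, data = long, weights = weight,
           family = quasibinomial(),
           control = glm.control(epsilon = 1e-12, maxit = 50))

# Weighted log-odds of "Yes" in one group x time cell
wlogit <- function(g, t) {
  s <- long$group == g & long$time == t
  qlogis(sum(long$weight[s] * (long$outcome1[s] == "Yes")) / sum(long$weight[s]))
}

# Difference between groups in the Pre-minus-Post change of the weighted log-odds
did <- (wlogit("Woman", "Pre") - wlogit("Woman", "Post")) -
       (wlogit("Man", "Pre")   - wlogit("Man", "Post"))

all.equal(unname(coef(fit)["groupWoman:timePre"]), did, tolerance = 1e-6)
```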

Thank you once again!

The use of sampling weights implies that you wish to average over unlikes in a population, as opposed to conditioning on the weighting factor(s) using covariates. What about the problem makes marginalization appropriate? — Frank Harrell, Jul 12 '21 at 11:24
@FrankHarrell Sorry, I don't understand what you mean. Could you please explain? Thank you. — MDSF, Jul 12 '21 at 13:05
What is the goal of weighting? Why do you want to downweight some of the observations? What are you trying to estimate? Why not condition on covariates to estimate covariate-specific quantities? — Frank Harrell, Jul 12 '21 at 14:29
The use of weights makes the distribution of the sample closer to the population of interest. I've tried fitting one linear model with weights and another without them but with the variables used to calculate the weights as covariates. The estimates are quite different, which I assume is caused by the presence of some high weights (maximum of 8, after truncation). Given that, I would assume that the weighted (unbiased) estimates are preferable. — MDSF, Jul 12 '21 at 15:01
Why make the sample similar to the population, which increases variances of estimates instead of exposing real differences within the population? E.g., if you are trying to get a weighted average over males and females to target the sex ratio in the population rather than the sex ratio in the sample, why not just provide separate estimates for males and for females? — Frank Harrell, Jul 12 '21 at 16:33
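The sex-ratio example in the last comment can be made concrete with a minimal base-R sketch on hypothetical numbers (the 70/30 sample split and the 50/50 population share are assumptions for illustration). A weighted mean whose weights are population share divided by sample share equals the population-share-weighted average of the per-sex means; the separate per-sex means (`by_sex`) are the estimates suggested as the alternative.

```r
# Hypothetical sample: 70 males and 30 females
set.seed(3)
sample_df <- data.frame(
  sex = rep(c("Male", "Female"), times = c(70, 30)),
  y   = c(rnorm(70, mean = 10), rnorm(30, mean = 14))
)
pop_share <- c(Female = 0.5, Male = 0.5)  # assumed population sex ratio

# Separate estimates, one per sex
by_sex <- tapply(sample_df$y, sample_df$sex, mean)

# Average of the per-sex means, weighted by the population sex ratio
pop_avg <- sum(pop_share[names(by_sex)] * by_sex)

# Same quantity via person-level weights: population share / sample share
samp_share <- table(sample_df$sex) / nrow(sample_df)
w <- pop_share[sample_df$sex] / as.numeric(samp_share[sample_df$sex])
wtd_mean <- weighted.mean(sample_df$y, w)

all.equal(pop_avg, wtd_mean)
```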
