People are randomly assigned to participate in multicultural training (treatment) or a control activity. Weeks later, they are given a list of behaviors that could benefit multiple groups: Black Americans, Asian Americans, gays and lesbians, immigrants, and Muslims. They are asked to select all activities in which they would be interested in participating.
What is the best way to analyze the effect of condition on this dependent variable? I can think of a few ways:
1. Run five logistic regression models. Using this approach, I would treat each behavior on its own as a binary variable (1 = yes, want to participate; 0 = no, don't want to participate). This is straightforward, but there are two immediate problems. First, the five outcome variables are related to one another, yet this analysis treats them as completely separate. Second, the Type I error rate is inflated by multiple comparisons. I could adjust the p-values, but I generally find these corrections unsatisfying, and it is difficult to choose among the many possible methods. This approach also doesn't allow the models to "share information" with one another when they should, because participation in one behavior is likely to depend on participation in the others.
2. Run a Poisson or negative binomial model. This involves creating a count of how many behaviors the participant would like to partake in, ranging from 0 (selected none) to 5 (selected all of them). I do not wish to use this approach, because I want results at the granularity of each specific behavior, not just how many were selected overall. I also do not wish to equate selecting behaviors benefitting Black and Asian Americans (count = 2) with selecting behaviors benefitting Muslims and immigrants (also count = 2).
3. Fit a multilevel model. This involves nesting all five responses within an individual. I define a dummy-coded variable at Level 1 (within-person) denoting the target group (Black, Asian, Muslim, etc.) and another dummy-coded variable at Level 2 (between-person) denoting which condition the person was in. I define a random intercept and a random slope for target group within person. The model looks like:
$\text{logit}\,P(Y_{ij} = 1) = \beta_{0j} + \beta_{1j}X_{ij}$
$\beta_{0j} = \gamma_{00} + \gamma_{01}Z_j + u_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11}Z_j + u_{1j}$
where $X$ would actually be four dummy-coded variables (I have collapsed them into a single term for brevity) and $Z$ represents assignment to condition. As an lme4 formula, this is:
glmer(participate ~ group * condition + (1 + group | id), data, family = binomial)
However, I am having trouble getting this model to converge. With about 300 participants and 5 observations per person, I do not believe sample size is the issue. I'm not sure whether something particular to my data is causing the convergence problems, or whether there is a general issue with this model that I am overlooking. My impression is that the model has a hard time estimating person-specific effects from a $\{0, 1\}$ outcome (which might be less of an issue with a continuous outcome modeled with a Gaussian family and identity link).
4. Some type of extended chi-square table approach? There is an R package called MRCV, but it seems to focus on examining relationships between multiple "select all that apply" variables, not on the experimental effect of one variable on a multiple response categorical variable.
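To make the data layouts behind approaches 1 to 3 concrete, here is a minimal base R sketch on simulated data. Everything in it is an assumption for illustration: the column names (`black`, `asian`, `gay_lesbian`, `immigrant`, `muslim`), the selection probabilities, and the choice of Holm as the p-value adjustment. It is not my actual data or a recommended analysis.

```r
# Simulated stand-in data; all names and numbers here are hypothetical.
set.seed(1)
n <- 300
targets <- c("black", "asian", "gay_lesbian", "immigrant", "muslim")
wide <- data.frame(id = seq_len(n), condition = rep(c(0, 1), each = n / 2))
for (g in targets) {
  # Assumed selection probabilities: 0.40 in control, 0.55 in treatment
  wide[[g]] <- rbinom(n, size = 1, prob = 0.40 + 0.15 * wide$condition)
}

# Approach 1: one logistic regression per behavior, then adjust the p-values
p_raw <- sapply(targets, function(g) {
  fit <- glm(reformulate("condition", response = g),
             data = wide, family = binomial)
  coef(summary(fit))["condition", "Pr(>|z|)"]
})
p_holm <- p.adjust(p_raw, method = "holm")  # Holm is one of several options

# Approach 2: collapse the five indicators into a 0-5 count per person.
# Different selection patterns with the same total become indistinguishable.
wide$count <- rowSums(wide[targets])

# Approach 3: stack the five indicators into long format (one row per
# person x target group), the shape the glmer() call above expects.
long <- data.frame(
  id          = rep(wide$id, times = length(targets)),
  condition   = rep(wide$condition, times = length(targets)),
  group       = factor(rep(targets, each = n), levels = targets),
  participate = unlist(wide[targets], use.names = FALSE)
)

# The single X in the equations expands to four dummy columns plus an
# intercept, with the first level of `group` as the reference category.
X <- model.matrix(~ group, data = long)
colnames(X)
```

In the long data frame each person contributes exactly five rows, one per target group, so `(1 + group | id)` asks for a random intercept and four random slopes per person.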
What is the best (with some justification) way to analyze the effect of an experimental condition on a "select all that apply" multiple response categorical variable?
I have seen similar questions asked on CrossValidated, but I have not found the answers to be very helpful (see How to analyse a "Check all that Apply" question, How to test for group differences in a 'select all that apply' question).