Analysing multiple-choice question with multiple answers

Question

I recently ran an experiment with 4 conditions where participants got a multiple choice task to select words best describing a subjective sensation. Now I'm struggling to figure out which analytical method I should use. There was no limit to how many of the 40 words they could select for each trial.

Essentially it looks like this:

Condition	Word 1 score	Word 2 score	Word 3 score [...]
A	174	41	77
B	193	67	112
C	171	69	78
D	157	54	98

This is across 30 participants and 1200 total trials. I mainly work in SPSS. The goal is to figure out whether any condition shows a stronger preference for one or several words, and whether that preference is stronger than in the other conditions for the same word.

Any help is greatly appreciated.

Is every participant in every condition, or just one? – gung - Reinstate Monica, Feb 23 '22 at 15:19

score 0 · Answer 1 · answered Feb 23 '22 at 23:38

At first glance, I don't see an ideal solution, but I can write how I would handle the situation myself.

First of all, we need a matrix of raw (not aggregated) data with 42 columns. With one row per participant per condition, it has 120 rows (4 for each of the 30 participants); if you keep every trial as its own row, as your 1200 trials suggest, it has one row per trial instead, and the rest of the procedure stays the same. The first column is the participant (values $1,1,1,1,2,2,2,2,3,3,3,3...$), the second is the condition (values $A,B,C,D,A,B,C,D...$). The remaining 40 columns correspond to the individual words and contain 1 if the participant selected the word in that row and 0 otherwise.
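As a rough illustration of that layout, here is a minimal sketch in Python (not SPSS). The record format and the names `trials`, `WORDS` and `to_indicator_rows` are made up for the example, the data are invented, and three words stand in for the 40.

```python
# Hypothetical raw records: one per trial, with the set of words ticked on it.
WORDS = ["word1", "word2", "word3"]  # stand-in for the 40 words

trials = [
    {"participant": 1, "condition": "A", "selected": {"word1", "word3"}},
    {"participant": 1, "condition": "B", "selected": {"word1"}},
    {"participant": 2, "condition": "A", "selected": set()},
    {"participant": 2, "condition": "B", "selected": {"word2", "word3"}},
]


def to_indicator_rows(trials, words):
    """Return one row per trial: participant, condition, then a 0/1 flag per word."""
    rows = []
    for trial in trials:
        flags = [1 if word in trial["selected"] else 0 for word in words]
        rows.append([trial["participant"], trial["condition"], *flags])
    return rows


header = ["participant", "condition", *WORDS]
rows = to_indicator_rows(trials, WORDS)

# Print as a tab-separated table that could be saved and imported elsewhere.
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

Each word column can then serve as the binary outcome of its own model.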

I would then fit a mixed-effects logistic regression separately for each of the 40 words, with that word's 0/1 column as the outcome. The regressor (fixed effect) in each model is the condition and the random factor is the participant. The random factor is needed because some people tend to choose many words and others few, and because different people can be expected to prefer different words. For each of these 40 models, test the statistical significance of the regressor condition.

Since we run 40 tests, we have to adjust the significance level to keep the family-wise error rate from inflating. We can use the Bonferroni correction to adjust the alpha to $0.05/40 = 0.00125$. If the regressor condition is still statistically significant after this correction in any of the 40 models, you can examine under which condition the word was selected more frequently.
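The correction step itself is simple arithmetic. Here is a minimal sketch in Python, assuming you have already collected one p-value for the condition effect per word. The p-values below are invented, only three of the 40 are shown, and `bonferroni_flags` is just an illustrative name.

```python
N_WORDS = 40   # number of tests, one model per word
ALPHA = 0.05   # nominal family-wise significance level

# Hypothetical p-values for the condition effect (only three of 40 shown).
p_condition = {"word1": 0.0004, "word2": 0.03, "word3": 0.2}


def bonferroni_flags(p_values, n_tests, alpha=ALPHA):
    """Return the Bonferroni threshold and, per word, whether p passes it."""
    threshold = alpha / n_tests
    flags = {word: p <= threshold for word, p in p_values.items()}
    return threshold, flags


threshold, flags = bonferroni_flags(p_condition, N_WORDS)

print(f"per-test threshold: {threshold:.5f}")
for word, significant in flags.items():
    print(word, "significant" if significant else "not significant")
```

Note that the divisor is the total number of tests (40), even if you only look closely at a few of the words afterwards.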

Two disadvantages of this procedure:

It is very tedious in SPSS (in R it would take one for loop).
It is highly conservative. If the true differences are small, you probably won't detect them.
