At first glance, I don't see an ideal solution, but I can describe how I would handle the situation myself.
First of all, we need a matrix of raw (not aggregated) data. It has 120 rows (4 for each of the 30 participants) and 42 columns. The first column is the participant (values $1,1,1,1,2,2,2,2,3,3,3,3...$), the second is the condition (values $A,B,C,D,A,B,C,D...$). The remaining 40 columns correspond to the individual words and contain zeros and ones, depending on whether the participant selected that word under that condition.
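The layout described above can be sketched in a few lines of Python. This is only an illustration of the shape of the matrix: the names `build_raw_matrix`, `selected`, and `word_1` ... `word_40` are hypothetical, and the selections are random placeholders rather than real data.

```python
import random

N_PARTICIPANTS = 30               # 120 rows / 4 conditions
CONDITIONS = ["A", "B", "C", "D"]
N_WORDS = 40                      # 42 columns minus participant and condition


def build_raw_matrix(selected):
    """Build the raw 0/1 matrix, one row per participant-condition pair.

    `selected` maps (participant, condition) to the set of word indices
    (0-based) that the participant chose under that condition.
    """
    header = ["participant", "condition"] + [f"word_{i + 1}" for i in range(N_WORDS)]
    rows = []
    for participant in range(1, N_PARTICIPANTS + 1):
        for condition in CONDITIONS:
            chosen = selected.get((participant, condition), set())
            indicators = [1 if w in chosen else 0 for w in range(N_WORDS)]
            rows.append([participant, condition] + indicators)
    return header, rows


# Placeholder selections (random, for illustration only; not real data).
rng = random.Random(0)
selected = {
    (p, c): {w for w in range(N_WORDS) if rng.random() < 0.2}
    for p in range(1, N_PARTICIPANTS + 1)
    for c in CONDITIONS
}

header, rows = build_raw_matrix(selected)
print(len(rows), len(header))
```

Each row is one participant under one condition, which is the unit a per-word model needs.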
I would then fit a mixed-effects logistic regression separately for each of the 40 words. The predictor in each model is the condition and the random factor is the participant. The random factor is needed because some people tend to choose many words and others few, and also because different people might be expected to prefer different words. For each of these 40 models, test the statistical significance of the condition effect.
Since we ran 40 tests, we have to adjust the significance level to keep the family-wise error rate from inflating. We can use the Bonferroni correction, which lowers alpha to $0.05/40 = 0.00125$. If the condition effect is still statistically significant after this correction in any of the 40 models, we can examine under which condition that word was chosen more frequently.
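As a rough sketch of the correction step, assuming the 40 per-word p-values have already been collected into a dictionary: the function name `bonferroni_significant` and the p-values below are made up for illustration and do not come from any real analysis.

```python
ALPHA = 0.05
N_TESTS = 40


def bonferroni_significant(p_values, alpha=ALPHA):
    """Return the corrected threshold and the tests that fall below it.

    `p_values` maps a word label to the p-value of its condition effect.
    The threshold is alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    significant = {word: p for word, p in p_values.items() if p < threshold}
    return threshold, significant


# Hypothetical p-values, one per word (illustration only).
p_values = {f"word_{i + 1}": 0.5 for i in range(N_TESTS)}
p_values["word_3"] = 0.0004   # below 0.00125, survives the correction
p_values["word_17"] = 0.01    # below 0.05, but not below 0.00125

threshold, significant = bonferroni_significant(p_values)
print(threshold, significant)
```

The second made-up word shows the cost of the correction: a result that would pass at the uncorrected 0.05 level is no longer counted.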
Two disadvantages of this procedure:
- It is very tedious in SPSS (in R it would take one for loop).
- It is highly conservative. If you assume only small differences exist, you probably won't find them.