Test that a group of probabilities is different from chance - in either direction

Question

Suppose I have subjects sort 10 images into the categories "Group A" or "Group B". I want the null hypothesis to be that subjects are randomly assigning the images, and the alternative hypothesis to be that certain images tend to be assigned to certain categories. Importantly, I do not have an a priori hypothesis about the category of a given image.

How would you test whether the probability of assigning an image to Group A is different from 50%? I.e., for one image there could be a 30% chance of assignment to Group A, for another a 70% chance of assignment to Group A, and I would want to treat those as equally strong pieces of evidence for the alternative hypothesis.

My initial thought was to do a chi squared test of homogeneity, but such a test would be punished as the number of images increases, whereas it seems intuitively that my chosen test should become more powerful the more images I use.

score 2 · Accepted Answer · edited Apr 13 '17 at 12:44

Conducting a $\chi^2$ test is totally appropriate. Your last sentence:

My initial thought was to do a chi squared test of homogeneity, but such a test would be punished as the number of images increases, whereas it seems intuitively that my chosen test should become more powerful the more images I use.

can be interpreted a few different ways. One way to do the test would be to have, say, 30 subjects complete your experiment (30 subjects × 10 images = 300 classifications) and then simply bin every classification into either A or B, which would result in a table such as:

 A   B
-------
100 200

The expected count for each bin would be $n \cdot 0.5 = 300 \cdot 0.5 = 150$, so in this example one would reject the null hypothesis that each bin has equal probability (in R code):

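# Pooled counts of A and B classifications across all subjects and images
dat <- c(A = 100, B = 200)
# Goodness-of-fit test; by default the null assigns equal probability to each bin
chisq.test(dat)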
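
As a check on the arithmetic, the statistic reported by `chisq.test` can be reproduced by hand. This is only a sketch of the standard Pearson formula applied to the example counts above; the variable names are illustrative.

```r
# Observed pooled counts from the example table
observed <- c(A = 100, B = 200)

# Expected counts under the null of equal probability: 300 * 0.5 = 150 per bin
expected <- sum(observed) * c(0.5, 0.5)

# Pearson chi-squared statistic, summed term by term
stat <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = (number of bins - 1) = 1
p_value <- pchisq(stat, df = length(observed) - 1, lower.tail = FALSE)

stat  # 33.33 (= 2 * 50^2 / 150)

# The built-in test should agree with the hand computation
fit <- chisq.test(observed)
all.equal(unname(fit$statistic), stat)
```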

This pooled test becomes more powerful if you give the same number of subjects more images. Another way to conduct the test, though, would be to create a 10 by 2 table in which each row corresponds to a different image, e.g.:

 Image   A   B
 --------------
 1       6   4
 2       etc...
 3
 4
 5
 6
 7
 8
 9
10

This approach has the advantage that you can examine the residuals from the table and see whether any particular image is more likely to be classified into the A or B category. Since you fix the number of images shown, all you need to do to satisfy the conservative rule of thumb that the expected value for every cell should be at least 5 is to run your experiment on at least ten people (each row then contains 10 observations, so about 5 are expected per cell when A and B are roughly equally common overall). I'm not sure whether this approach gains power to reject the null with more images, as you are adding rows to the table; it would take more investigation. (I would guess no for a very low number of people, but after, say, 20 people I would guess that more images do increase the power.) You may also consider Fisher's exact test on such a table (although I presume the p-value would need to be estimated via simulation).
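
A minimal sketch of the image-by-category table in R follows. The counts are hypothetical (10 images, each sorted by an assumed 20 subjects) and were chosen so that the pooled A and B totals are identical, which means only the per-image table can reveal any structure.

```r
# Hypothetical counts, for illustration only: 10 images, 20 subjects each
a_counts <- c(14, 6, 15, 5, 13, 7, 16, 4, 12, 8)
tab <- cbind(A = a_counts, B = 20 - a_counts)
rownames(tab) <- paste("Image", 1:10)

colSums(tab)             # pooled counts: A = 100, B = 100

# Chi-squared test on the 10 x 2 table, df = (10 - 1) * (2 - 1) = 9
fit <- chisq.test(tab)
fit$statistic            # 36
round(fit$residuals, 2)  # Pearson residuals: which images lean toward A or B

# Fisher's exact test with a Monte Carlo p-value
set.seed(1)
ft <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)
ft$p.value
```

Positive residuals in the A column flag images sorted into A more often than the table margins predict, and negative residuals flag the opposite.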

You can build the same type of "x by 2" table for people as well, in which case each row is a person. This has the same exploratory advantage: you can see whether any person is more likely to classify images into the A or B category. This approach increases in power the more images you show to each person. Finally, you may consider a logistic regression model predicting the category with individual or image random effects. This last suggestion requires the largest sample size, but gains power both when increasing persons and when increasing images.
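
One possible way to fit such a random-effects logistic regression is with `glmer` from the lme4 package. That package choice is an assumption here, not something the answer specifies, and the data below are simulated purely for illustration (30 subjects, 10 images, made-up image and subject effects).

```r
# Hypothetical simulated data: 30 subjects x 10 images, one binary choice each
set.seed(1)
n_subj <- 30
n_img  <- 10
d <- expand.grid(subject = factor(seq_len(n_subj)),
                 image   = factor(seq_len(n_img)))

# Assumed per-image probabilities of choosing A, plus a small subject-level shift
p_img  <- rep(c(0.7, 0.3), length.out = n_img)
u_subj <- rnorm(n_subj, sd = 0.5)
eta <- qlogis(p_img[as.integer(d$image)]) + u_subj[as.integer(d$subject)]
d$choseA <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Fit only if lme4 is installed (assumed package, not named in the answer)
if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercepts for image and for subject; the fixed intercept is the
  # overall log-odds of choosing A
  fit <- lme4::glmer(choseA ~ 1 + (1 | image) + (1 | subject),
                     data = d, family = binomial)
  print(lme4::VarCorr(fit))  # estimated image and subject variances
}
```

A clearly non-zero image variance would point to image-specific tendencies, and a non-zero subject variance to people who favor one category overall.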

The problem with doing the chi squared test the first way is that I don't necessarily expect the total counts to be different. In your example, the total counts of A and B were 100 and 200, respectively. For an extreme example, imagine showing 4 images, and every single subject categorizes images 1 and 2 into A, and images 3 and 4 into B. The overall counts of A and B would be the same, but subjects are clearly categorizing non-randomly. — Nicholas Root, May 25 '15 at 17:54
That is the reason I mention the second table @NicholasRoot (and subsequently the same table with people on the margins - as there may be some people who classify everything into a single category). — Andy W, May 25 '15 at 18:34
Oh, I figured out what was wrong with my thinking - I didn't realize that, in your second table, with a given effect size the chi squared test value would rise faster than the chi squared critical value as you increase rows. Got it, thanks! — Nicholas Root, May 25 '15 at 18:42
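
The point in the last comment can be checked numerically. This sketch uses an idealized table, not real data: an assumed 20 subjects per image, with images alternating between 70% and 30% assignment to A. It compares the chi-squared statistic with the 5% critical value as rows are added.

```r
# Idealized table (assumption): k images (k even), n subjects each;
# images alternate between 70% and 30% of subjects choosing A
stat_and_crit <- function(k, n = 20) {
  a <- rep(round(c(0.7, 0.3) * n), length.out = k)
  tab <- cbind(A = a, B = n - a)
  c(k = k,
    # correct = FALSE: no continuity correction, so the 2 x 2 case is comparable
    statistic = unname(chisq.test(tab, correct = FALSE)$statistic),
    critical  = qchisq(0.95, df = k - 1))
}

res <- t(sapply(c(2, 4, 6, 8, 10, 20), stat_and_crit))
res
# The statistic grows linearly in k (3.2 per image here), while the
# critical value grows more slowly, so the gap widens as images are added
res[, "statistic"] - res[, "critical"]
```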
