Over the course of 30 days, I asked 47 people (24 in group A and 23 in group B) which of four foods they prefer, for a total of 1410 observations (30 answers per person: 720 from group A and 690 from group B):
```
     choice
group apple orange pizza beer
    A   340     63   216  101
    B   424     65   125   76
```
Because I asked each person multiple times, the observations within each group are not independent, so I cannot use a chi-squared test to compare the two distributions.
What I want to know is: which foods are chosen significantly more often by one group than by the other? My hypothesis is that group A prefers pizza and beer, while group B prefers fruit. I assume that preferences do not change over such a short period, and I am not interested in the longitudinal aspect of the survey.
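The hypothesis can be checked descriptively against the table by comparing each group's share of each food. The R sketch below rebuilds the counts as a matrix (the names `counts` and `shares` are illustrative) and uses `prop.table` to turn each row into proportions. These shares are descriptive only; because of the repeated measures, they say nothing by themselves about significance.

```r
# Counts from the table above: rows are groups, columns are foods
counts <- matrix(c(340, 63, 216, 101,
                   424, 65, 125,  76),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 choice = c("apple", "orange", "pizza", "beer")))

# Row totals: 24 people x 30 answers = 720 (A), 23 x 30 = 690 (B)
rowSums(counts)

# Share of each food within each group; each row sums to 1
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Rounded to two decimals, group A's shares are 0.47 apple, 0.09 orange, 0.30 pizza and 0.14 beer, while group B's are 0.61, 0.09, 0.18 and 0.11. The raw counts therefore lean the way the hypothesis predicts for apple, pizza and beer, with orange about equal in both groups.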
What test can I use?
Attempt at a solution:
Basically, the repeated measures of the same person are something like repeatedly measuring the length of a stick to obtain a more accurate measurement and average out measurement error. I therefore thought that I might calculate, for each person, the percentage of answers in each category. For example, one person's answers might split into 40% apple, 30% orange, 20% pizza, and 10% beer. Expressed as proportions (which sum to 1 for each person), the data would then look like this:
```
person group apple orange pizza beer
     1     A   0.4    0.3   0.2  0.1
     2     B ...
```
In this way I would have "deleted" the within-person dependence, and I would then perform, for each food, a t-test comparing the per-person proportions of group A with those of group B.
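A minimal sketch of this per-person procedure in R, under clearly stated assumptions. The aggregated sample data does not identify which person gave each answer, so the sketch simulates a hypothetical long-format data frame `answers` with columns `person`, `group` and `choice` (24 people in A and 23 in B, 30 answers each). Every simulated person draws their answers from their group's overall shares; this is an assumption for illustration only, since real people differ from one another, which is exactly the dependence in question. Each person's answers are turned into proportions, and then, for each food, a Welch t-test (the default of R's `t.test`) compares group A's per-person proportions with group B's. The names `group_shares`, `people`, `props` and `p_values` are illustrative.

```r
set.seed(1)  # make the simulation reproducible
food <- c("apple", "orange", "pizza", "beer")

# HYPOTHETICAL long-format data with a person ID (not the real survey):
# each person gives 30 answers drawn from their group's overall shares.
group_shares <- list(A = c(340, 63, 216, 101) / 720,
                     B = c(424, 65, 125, 76) / 690)
people <- data.frame(person = 1:47, group = rep(c("A", "B"), c(24, 23)))
answers <- do.call(rbind, lapply(seq_len(nrow(people)), function(i) {
  g <- people$group[i]
  data.frame(person = people$person[i], group = g,
             choice = sample(food, 30, replace = TRUE, prob = group_shares[[g]]))
}))
answers$choice <- factor(answers$choice, levels = food)

# Step 1: per-person proportions (one row per person; each row sums to 1)
person_counts <- table(answers$person, answers$choice)
props <- as.data.frame.matrix(prop.table(person_counts, margin = 1))
props$group <- people$group[match(as.integer(rownames(props)), people$person)]

# Step 2: for each food, a Welch t-test of group A's vs group B's proportions
p_values <- sapply(food, function(f) {
  t.test(props[props$group == "A", f], props[props$group == "B", f])$p.value
})
p_values
```

Because the simulated data is made up, the resulting p-values say nothing about the real survey. Note also that each person's four proportions sum to 1, so the four per-food tests are not independent of one another.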
However, I cannot judge whether this is a valid procedure for the kind of data I have. Also, I would prefer to use a published, peer-reviewed test if one exists.
Sample data:
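```r
# Foods in the order used in the table above
food <- c("apple", "orange", "pizza", "beer")

# One row per observation (720 from group A, 690 from group B);
# this aggregated data does not record which person gave each answer
dat <- data.frame(
  group = rep(c("A", "B"), c(720, 690)),
  choice = factor(
    c(rep(food, c(340, 63, 216, 101)),
      rep(food, c(424, 65, 125, 76))),
    levels = food  # keep this order instead of sorting alphabetically
  )
)
tab <- table(dat)
```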