Which statistical test should be used for comparing two groups on discrete/count data?

Question

Suppose an experiment has two groups, Group A and Group B, each consisting of 30 participants.

The participants play a game for ten rounds. In each round, a participant may or may not exhibit a special behavior. At the end of the experiment, we count the number of rounds, out of ten, in which each participant exhibited this behavior. The variable is therefore discrete and can only take integer values from 0 to 10.

Now, I would like to test whether Group A and Group B differ in this variable. What statistical method should I use in this case?

I initially wanted to use ANOVA but it seems only applicable to continuous data. I tried to Google search "statistical analysis on discrete data" but many answers are ambiguous on this issue.

So you have a distribution from 0 to 10 for Group A and similarly for Group B. You would like to determine whether this distribution is similar between A and B. If so, a Chi-square test seems appropriate. — user2974951, Jul 29 '21 at 09:40
Hi, thanks for answering! I just briefly googled the Chi-square test and it seems to be used mostly for testing independence rather than the difference in means between two groups. — Neno, Jul 29 '21 at 09:48
To compare the means you could use a t-test or equivalent; however, your count variable is not really continuous, so this may not be the most appropriate choice. — user2974951, Jul 29 '21 at 10:06
True. As answered in this thread (https://stats.stackexchange.com/questions/218548/performing-a-t-test-with-discrete-currency-data), t-test is indeed not a good option for discrete data. — Neno, Jul 29 '21 at 10:32
Have you tried looking up Poisson regression? Since you seem to be dealing with counts, you might find it an appropriate way to model your data. If the conditions for Poisson regression are not met, you might want to look up the negative binomial distribution. Alternatively, you may be able to treat the problem as a mixed logistic regression, with a 0-1 outcome for each round and a random effect on round. — Kirsten, Jul 29 '21 at 10:48
Hi, Kirsten! Thanks for your insightful comment. I didn't think of these before and will check them! — Neno, Jul 29 '21 at 11:07
If you think that the probability of the behaviour being exhibited by a given individual should not change across the ten rounds, and that the outcomes are independent, the obvious model for the number of occurrences out of 10 would be binomial. If there are no covariates, and you think the rate at which it occurs should be the same within each group, a straight binomial proportions test would do. If there are covariates, this would suggest a binomial GLM such as logistic regression. If you expect variation in the rate of occurrence of the behaviour within a group, you may want a GLMM. — Glen_b, Jul 30 '21 at 05:43
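
The "straight binomial proportions test" from the last comment could be sketched roughly as follows, in plain Python. This is only an illustration under that comment's assumptions (independent rounds and one constant probability per group, so each group's 30 × 10 = 300 rounds can be pooled). The function name `two_proportion_z_test` and the totals 90 and 120 are hypothetical, not data from the question, and the normal approximation is one common choice among several.

```python
import math


def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided z-test for equality of two binomial proportions (pooled SE)."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical totals: rounds with the behaviour, summed over 30 participants
# who each played 10 rounds (300 rounds per group).
z, p_value = two_proportion_z_test(90, 300, 120, 300)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```

If participants differ in how often they show the behaviour, the pooled rounds are not independent and this test will tend to overstate the evidence, which is the situation where the comment points to a GLMM instead.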

score 1 · Answer 1 · answered Jul 30 '21 at 06:10

The model you would use depends on whether you are willing to assume that the probability of a particular outcome for a group is fixed each round, and if it is not fixed each round, whether there is any auto-regression or other dependency structure in outcomes over the rounds. Since this is a repetitive game, it is possible that players will learn and adapt their play as the rounds go on, so I would suggest starting with a simple model that allows different probabilities over rounds but does not use a complicated dependency structure (at least in the first instance).

If you want to proceed in this way (at least as a starting point), you have a regression model with a binary outcome, so you could use something like logistic regression, with the regression equation:

outcome ~ factor(group) + factor(round)

This method will require you to have a reasonable amount of data for each group in each round, to ensure reasonable estimation of the parameters. You can then examine residual plots and auto-correlation plots for the residuals to see if there is any evidence of a more complicated dependency structure that would require a more complex model.
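
As a rough illustration of the regression above, here is a minimal sketch in plain Python (standard library only) that fits a logistic regression with a group indicator and round indicators by Newton-Raphson. The data are simulated: the probabilities 0.3 and 0.5 and the small drift over rounds are made-up assumptions, and the helper names `solve_linear`, `fit_logistic` and `design_row` are hypothetical. In practice a statistics package would do the fit. Note also that the sketch treats rounds from the same participant as independent.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic(rows, y, n_iter=25):
    """Maximum-likelihood logistic regression; returns (coefficients, std errors)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    # Standard errors: square roots of the diagonal of the inverse information
    # matrix, using the matrix from the last Newton step.
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(solve_linear(hess, unit)[i]))
    return beta, se


def design_row(group, rnd, n_rounds=10):
    """Intercept, indicator for group B, indicators for rounds 2..n_rounds."""
    round_dummies = [1.0 if rnd == r else 0.0 for r in range(2, n_rounds + 1)]
    return [1.0, 1.0 if group == "B" else 0.0] + round_dummies


# Simulated binary data: one row per participant per round (assumed values).
rng = random.Random(0)
rows, y = [], []
for group, base in (("A", 0.3), ("B", 0.5)):
    for _participant in range(30):
        for rnd in range(1, 11):
            prob = base + 0.01 * rnd  # hypothetical drift across rounds
            rows.append(design_row(group, rnd))
            y.append(1 if rng.random() < prob else 0)

beta, se = fit_logistic(rows, y)
z = beta[1] / se[1]  # Wald statistic for the group coefficient
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"group B log-odds coefficient = {beta[1]:.3f}, z = {z:.2f}, p = {p_value:.4g}")
```

The coefficient at index 1 is the log-odds difference between Group B and Group A after allowing each round its own baseline. Its Wald test corresponds to asking whether `group` is related to `outcome`.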

Hi, Ben. I am trying to work my way through your answer and have two follow-up questions. (1) What do you mean by "model" in the first sentence of your reply? Are you suggesting that I should use a mathematical/probabilistic/statistical model to describe the data of each group, and then compare the models to determine whether there is a difference between the two groups? (2) Now, forgetting about the context, let's say we have two integer lists, [2,4,5,...] and [9,4,2,...]. Is there a statistical test that examines the mean difference between the two lists? — Neno, Jul 30 '21 at 13:33
You said in your post that participants play a game for 10 rounds, so presumably you have data on the outcome of each round for each participant, and not just the total over the ten rounds. So if it were me, I would go back to that binary data, where each data point is an `outcome` (0 or 1) for a player in group `group` in round `round`. You could then use a logistic regression model on that data to see if `group` is statistically related to `outcome` (i.e., whether the outcome probabilities are different for the two groups). — Ben, Jul 30 '21 at 22:00
