I have categorical count data (for categories C1 to C3, but potentially several more categories) for two datasets:

        | --- Dataset 1 --- | --- Dataset 2 --- |
        |  C1    C2    C3   |  C1    C2    C3   |
Item 1  |  0     200   300  |  0     2      3   |
Item 2  |  0     200   300  |  5     0      0   |

The total number of data-points in each dataset is different (500 and 5 in this example).

What statistical test should I use to determine if the distribution of counts for each item across the categories is the same between the two datasets?

For example, the distribution of Item 1 is the same across the two datasets, but the distribution of Item 2 is not. I will test each item separately.

SabreWolfy

1 Answer

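If you reshape your data so that, for each item, the rows are the datasets and the columns are the categories (a dataset × category table), then you can use a chi-square test or similar. E.g., for Item 1: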

            C1       C2     C3
1            0       200    300
2            0         2     3

Here you would need an exact test, because some of the cell counts are too small for the chi-square approximation to be reliable.
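
As a rough illustration of what an exact test on such a table could look like, here is a minimal sketch in plain Python. It assumes a Freeman-Halton style test (the usual generalisation of Fisher's exact test to 2 x k tables), which may not be the specific exact test intended above. The function name `exact_test_2xk` is hypothetical. The enumeration is only practical when one dataset has a small total, as in this example.

```python
from itertools import product
from math import comb


def exact_test_2xk(row_a, row_b):
    """P-value of an exact (Freeman-Halton style) test for a 2 x k table.

    Hypothetical helper: conditions on all row and column totals and sums
    the probabilities of every table no more likely than the observed one.
    """
    # Enumerate over the row with the smaller total to keep the search small.
    small = min(row_a, row_b, key=sum)
    n_small = sum(small)
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    denom = comb(sum(col_totals), n_small)

    def prob(row):
        # Multivariate hypergeometric probability of this row given the margins.
        ways = 1
        for total, count in zip(col_totals, row):
            ways *= comb(total, count)
        return ways / denom

    p_obs = prob(small)
    p_value = 0.0
    # Choose counts for all but the last column; the last one is then fixed.
    ranges = [range(min(t, n_small) + 1) for t in col_totals[:-1]]
    for head in product(*ranges):
        last = n_small - sum(head)
        if 0 <= last <= col_totals[-1]:
            p = prob(tuple(head) + (last,))
            if p <= p_obs * (1 + 1e-9):  # tolerance for floating-point ties
                p_value += p
    return min(p_value, 1.0)


# Counts from the question: Dataset 1 vs Dataset 2, tested per item.
print("Item 1:", exact_test_2xk([0, 200, 300], [0, 2, 3]))
print("Item 2:", exact_test_2xk([0, 200, 300], [5, 0, 0]))
```

With the counts from the question, Item 1 should give a p-value of essentially 1 (the observed split is the most likely one given the margins), while Item 2 should give an extremely small p-value, matching the intuition in the question.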

Peter Flom