I need to compare the means of two data sets that are binary. For example:

a = [1,1,0,0,0,0,0,0,0,1]
b = [1,0,1,1,1,0,0,1,1,0]

All I need to know is whether the means of the two datasets are statistically significantly different; in other words, the order in which the 1s are arranged does not matter. I know that all values are either 0 or 1. Also, in my case the sizes of a and b are fairly large (greater than 10,000), and the number of 1s is about 10 to 100.

What is the best test to use in this case?

I know that I cannot use a t-test because my data is not normally distributed.

dimitriy
Akavall
  • Means of 0-1 variables are the same thing as "the proportion of 1's" (both in sample and population senses). That is, you're in the situation of comparing proportions. *If* the variables satisfy the conditions of Bernoulli trials (independence, homogeneity of probability of 1's), then a two-sample proportions test (or a chi-square, or several other tests suitable for such tables) would be the obvious analyses. – Glen_b Feb 12 '14 at 22:10

2 Answers

You can express your data as a 2×2 contingency table (dataset a or b versus value 0 or 1). For small N, you can use Fisher's exact test to check whether the proportion of 1s depends on which dataset a value comes from.

For larger N, you can use the chi-squared test.
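
As a rough illustration of the contingency-table approach, here is a minimal Python sketch using only the standard library. The helper names `contingency_table` and `fisher_exact_two_sided` are made up for this example and are not from any library. The two-sided p-value is assumed to follow the common convention of summing the probabilities of all tables that are no more likely than the observed one, with the margins held fixed.

```python
from math import comb

def contingency_table(a, b):
    """Build [[ones_a, zeros_a], [ones_b, zeros_b]] from two 0/1 sequences."""
    ones_a, ones_b = sum(a), sum(b)
    return [[ones_a, len(a) - ones_a], [ones_b, len(b) - ones_b]]

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table (illustrative helper)."""
    (ones_a, zeros_a), (ones_b, zeros_b) = table
    n_a, n_b = ones_a + zeros_a, ones_b + zeros_b
    k = ones_a + ones_b  # total number of 1s, held fixed under the null
    lo, hi = max(0, k - n_b), min(k, n_a)
    # Unnormalised hypergeometric weights, kept as exact integers.
    weights = {i: comb(n_a, i) * comb(n_b, k - i) for i in range(lo, hi + 1)}
    observed = weights[ones_a]
    # Sum the weights of every table that is no more likely than the observed one.
    extreme = sum(wt for wt in weights.values() if wt <= observed)
    return extreme / comb(n_a + n_b, k)

a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
b = [1, 0, 1, 1, 1, 0, 0, 1, 1, 0]

table = contingency_table(a, b)
p_value = fisher_exact_two_sided(table)
print(table, p_value)
```

Only the counts of 1s and 0s in each dataset enter the calculation, so the order of the values does not matter, and a and b do not need to have the same length.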

dylan2106
  • Thank You. I think Fisher exact test is what I was looking for. – Akavall Feb 12 '14 at 19:25
  • Is there a similar (non-parametric) test of binary datasets where each dataset has a different number of samples? – A. Bollans Mar 02 '22 at 10:20
  • I've answered my question here: https://stats.stackexchange.com/questions/25299/comparing-two-binary-variables-of-unequal-sizes Though, I can't find a more general test for discrete variables – A. Bollans Mar 02 '22 at 10:38

Since in your case means and proportions are the same, you can use a two-sample proportions test to test the null hypothesis that the proportions (probabilities of success) in the two groups are equal.

As a reference, I can suggest Hollander and Wolfe's book Nonparametric Statistical Methods.
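
One common form of the proportions test is the pooled two-sample z-test, sketched below in plain Python. This is an assumption about which variant is meant, and the function name `two_proportion_z_test` is made up for this example. It relies on a normal approximation, which may be rough when the number of 1s is small.

```python
from math import erfc, sqrt

def two_proportion_z_test(ones_a, n_a, ones_b, n_b):
    """Pooled two-sample z-test for equal proportions (illustrative helper).

    Returns (z, two-sided p-value) under the normal approximation.
    """
    p_a, p_b = ones_a / n_a, ones_b / n_b
    # Under the null both groups share one proportion, estimated by pooling.
    pooled = (ones_a + ones_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        raise ValueError("all values are identical; the test is undefined")
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
b = [1, 0, 1, 1, 1, 0, 0, 1, 1, 0]

z, p_value = two_proportion_z_test(sum(a), len(a), sum(b), len(b))
print(z, p_value)
```

The mean of each 0/1 list is its proportion of 1s, so the test needs only the count of 1s and the sample size for each group.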

abbat_VL