Testing if the difference between two count variables is different from zero

Question

I have two count variables for several hundred thousand comparisons, one expected and one observed, and I would like to test if the counts are significantly different.

One possible approach I have looked into is simply subtracting the expected from the observed count for each row, then building a 95% CI and checking whether 0 falls within it. However, I am unsure whether this is the best method, and I am wondering if there is something more appropriate for such an analysis.

I have also looked into using a GLM for count data to estimate a slope and test whether it is equal to 1. However, I have not seen any examples of this being used with count predictor variables, save someone else asking about it here: Does using count data as independent variable violate any of GLM assumptions? From this it appears that it would be okay, if certain things are taken into account. But does this overcomplicate something as simple as "is the difference between observed and expected different from zero?"

Looks difficult; the difference is Skellam distributed but there is no standard GLM for such distributions. http://en.wikipedia.org/wiki/Skellam_distribution You may try t-test which is reasonably robust in large samples. — tomka, Apr 27 '15 at 15:36
@tomka I think there is a misunderstanding stemming from my wording in the last sentence of GLM part of my question. I meant to use the expected count variable as the predictor for the observed counts using an appropriate model for count data (such as zero-inflated poisson), not the difference between the two, which as you mention would be Skellam distributed. Does this change anything? — cifrus dumbledore, Apr 27 '15 at 22:03
Are you in fact saying you have just one observed count & want to compare its distribution to one expected from theory? — Scortchi - Reinstate Monica, Dec 15 '15 at 17:47
@Scortchi I have thousands of counts from two parents and one offspring. We expect the offspring to have the additive value of the parents. This is what I mean for the expectation. We expect that the value of the counts for the offspring should fall on a one-to-one line if we plot the sum of the parents against the offspring and I wanted to see if there was a test for deviation from the line. — cifrus dumbledore, Dec 16 '15 at 22:13

score 3 · Accepted Answer · edited Dec 15 '15 at 17:38

First of all, I assume that the expected and observed count variables both follow some discrete distribution, such as the Poisson, though these distributions do not necessarily need to be known. I also assume you have paired data (dependent samples), because you mention observed counts and their expectations.

Since the individual differences are Skellam distributed if the counts are Poisson distributed, a simple paired-samples t-test may be biased. The most straightforward approach is therefore to estimate the mean of the individual difference scores and to bootstrap the standard error of this mean difference.
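
For reference, here is the standard Skellam result, assuming the two counts in a pair are independent Poisson variables (an assumption; paired counts may well be correlated, in which case this holds only approximately). The symbols $X$, $Y$, $D$, $\mu_1$ and $\mu_2$ are introduced here for illustration:

$$
X \sim \mathrm{Poisson}(\mu_1),\quad Y \sim \mathrm{Poisson}(\mu_2)
\;\Rightarrow\;
D = X - Y \sim \mathrm{Skellam}(\mu_1, \mu_2),
\qquad
\mathrm{E}[D] = \mu_1 - \mu_2,\quad \mathrm{Var}(D) = \mu_1 + \mu_2 .
$$

Under this assumption the mean difference is zero exactly when $\mu_1 = \mu_2$.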

You can check Wikipedia for an introduction to bootstrapping: http://en.wikipedia.org/wiki/Bootstrapping_%28statistics%29. A straightforward way of obtaining the bootstrapped standard error and confidence interval is to use R.

You sample rows repeatedly with replacement from your data and estimate the mean difference in each resample. The distribution of a large number of these re-estimated mean differences approximates the sampling distribution of the mean difference. Its standard deviation is the bootstrapped standard error, and its 2.5th and 97.5th percentiles give you a bootstrapped 95% confidence interval. To conclude that the mean difference is different from zero, this 95% CI should not enclose zero.
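
The resampling steps described above can be sketched as follows. The answer points to R; this sketch uses plain Python (standard library only) purely to illustrate the procedure. The function name `bootstrap_mean_diff`, its parameters, and the toy counts are all hypothetical and not taken from the question's data.

```python
import random
import statistics


def bootstrap_mean_diff(observed, expected, n_boot=2000, seed=0):
    """Percentile bootstrap for the mean of paired differences.

    Returns (mean_difference, bootstrap_se, (ci_low, ci_high)).
    Hypothetical helper for illustration only.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    # One difference score per row (observed minus expected).
    diffs = [o - e for o, e in zip(observed, expected)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Resample rows with replacement and re-estimate the mean each time.
    boot_means = []
    for _ in range(n_boot):
        resample = rng.choices(diffs, k=n)
        boot_means.append(statistics.fmean(resample))

    # The standard deviation of the bootstrap means is the bootstrap SE.
    se = statistics.stdev(boot_means)

    # Cut points at 2.5%, 5%, ..., 97.5%; the first and last bound a 95% CI.
    cuts = statistics.quantiles(boot_means, n=40)
    return statistics.fmean(diffs), se, (cuts[0], cuts[-1])


# Toy paired counts (made up for illustration).
observed = [12, 9, 15, 7, 11, 14, 10, 13]
expected = [10, 8, 12, 6, 10, 11, 9, 11]

mean_diff, se, (ci_low, ci_high) = bootstrap_mean_diff(observed, expected)
differs_from_zero = not (ci_low <= 0 <= ci_high)
print(mean_diff, se, (ci_low, ci_high), differs_from_zero)
```

With several hundred thousand rows a pure-Python loop like this will be slow, so a vectorised implementation (or R, as suggested above) is likely the more practical choice; the logic stays the same.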

If you do not use R, the CIs can also be obtained in SPSS. You use the menu 'paired sample t-test' and check the 'bootstrap' option. Then SPSS will give you a bootstrap table with CIs for the mean of individual differences.
