How can I test the likelihood that two Poisson data sets are drawn from the same distribution in R?

Question

Suppose I want to test whether or not code patches created on weekends have a greater bug rate than those created on weekdays. (We might guess that this is so because people who are at work on weekends are more hurried etc.)

If we follow standard practice and model bug creation as a Poisson process, this means (I think) that we want to tell the probability that both data sets came from the same underlying distribution, i.e. $\lambda_1 = \lambda_2$.

How can I do this? One way I thought of is to use MLE to find the parameter that best fits one data set, and then test the likelihood of the second data set under that parameter. Alternatively, I could fit two regressions and then use a likelihood ratio test. The problem with both methods is that they do not test whether one single $\lambda$ underlies both sets, but rather whether the best fit for one is also the best fit for the other.

Why run two regressions when you can run one with a dummy variable for weekends and just test its significance? — whuber, Mar 02 '13 at 23:17
@whuber: To be honest, I was never sure what it meant to say that a variable was "significant". You're saying that "`weekends` is significant at p = a" means "the likelihood that both distributions are drawn from a distribution unaffected by `weekends` is a"? — Xodarap, Mar 02 '13 at 23:23
A "significant" dummy for a binary independent variable means there is evidence that the mean values of the response vary with the level of the binary variable. That sounds exactly like the hypothesis you wish to test. A simple adjustment will turn the default two-sided test into a one-sided test if you like. — whuber, Mar 02 '13 at 23:26
(BTW, the purpose in suggesting a regression is that it makes it easy to adjust for any covariates. In this application there ought to be some: certainly time; and probably type of code, programmer identifier, and other relevant attributes as well.) — whuber, Mar 03 '13 at 18:30
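A minimal sketch of the single regression with a weekend dummy suggested in the comments above. The data frame `patches`, its columns `bugs` and `weekend`, and the simulated rates are hypothetical and only there to make the example runnable; real data would replace them, and further covariates (time, programmer, type of code) could be added as extra terms in the formula.

```r
# Hypothetical data: one row per patch, with a weekend flag and a bug count.
set.seed(1)
patches <- data.frame(weekend = rep(c(0, 1), times = c(200, 80)))

# Simulated counts with assumed rates of 0.5 (weekday) and 0.8 (weekend) bugs per patch.
patches$bugs <- rpois(nrow(patches),
                      lambda = ifelse(patches$weekend == 1, 0.8, 0.5))

# One Poisson regression with a dummy variable for weekends.
fit <- glm(bugs ~ weekend, family = poisson(link = "log"), data = patches)

# Wald test for the dummy; the reported p-value is two-sided.
coefs <- summary(fit)$coefficients
z <- coefs["weekend", "z value"]
p_two_sided <- coefs["weekend", "Pr(>|z|)"]

# One-sided version: is the weekend rate greater than the weekday rate?
p_one_sided <- pnorm(z, lower.tail = FALSE)

# exp(coefficient) estimates the rate ratio lambda_weekend / lambda_weekday.
rate_ratio <- exp(coef(fit)[["weekend"]])
```

With only the dummy in the model, `rate_ratio` is the ratio of the two group means, so the test of the `weekend` coefficient is a test of $\lambda_1 = \lambda_2$.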

Glen_b · Accepted Answer · 2013-03-03T02:45:22.087


For constant $\lambda$ under the null, if you condition on the sum of the two counts, it is simply a binomial test for a pre-specified proportion (the proportion being set by each group's share of the total exposure).
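As a rough illustration of that conditional test: the counts and exposures below are made up, and treating the number of patches as the exposure is an assumption. Under the null of equal rates per patch, and given the total number of bugs, the weekend count is binomial with success probability equal to the weekend share of exposure.

```r
# Hypothetical totals: bugs and patches (the assumed exposure) in each group.
bugs_weekend <- 30; patches_weekend <- 80
bugs_weekday <- 50; patches_weekday <- 200

total_bugs <- bugs_weekend + bugs_weekday

# Pre-specified proportion under H0: weekend share of the total exposure.
p0 <- patches_weekend / (patches_weekend + patches_weekday)

# One-sided binomial test: are there more weekend bugs than exposure alone predicts?
res <- binom.test(x = bugs_weekend, n = total_bugs, p = p0,
                  alternative = "greater")
res$p.value
```

Dropping `alternative = "greater"` gives the default two-sided test.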

edited Mar 03 '13 at 02:45

answered Mar 03 '13 at 02:39



This test is done using the `poisson.test()` function in R. – caracal Mar 03 '13 at 09:52
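A short sketch of the `poisson.test()` call for the two-sample comparison. The counts and exposures are hypothetical, and using the number of patches as the time base `T` is an assumption; `r = 1` is the hypothesized rate ratio under the null.

```r
# Hypothetical data: bug counts and exposures (number of patches),
# ordered as weekend first, weekday second.
bug_counts <- c(30, 50)
exposure   <- c(80, 200)

# Two-sample comparison of Poisson rates; H0 is a rate ratio of r = 1.
pt <- poisson.test(x = bug_counts, T = exposure, r = 1,
                   alternative = "greater")

pt$p.value    # one-sided p-value for weekend rate > weekday rate
pt$estimate   # estimated rate ratio (weekend rate / weekday rate)
pt$conf.int   # confidence interval for the rate ratio
```

The order of the two groups matters for the one-sided alternative: `"greater"` here refers to the first rate relative to the second.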
