Suppose I want to test whether or not code patches created on weekends have a greater bug rate than those created on weekdays. (We might guess that this is so because people who are at work on weekends are more hurried etc.)

If we follow the standard and model bug creation as a Poisson process, this means (I think) that we want to tell the probability that both data sets came from the same underlying distribution, i.e. $\lambda_1 = \lambda_2$.

How can I do this? One way I thought of is to use MLE to find the parameter that best fits one data set, and then test the likelihood that this parameter generated the second data set. Alternatively, I could do two regressions and then use a likelihood ratio test. The problem with both methods is that they do not test whether a single $\lambda$ underlies both sets, but rather whether the best fit for one is also the best fit for the other.

Xodarap
  • 4
    Why run two regressions when you can run one with a dummy variable for weekends and just test its significance? – whuber Mar 02 '13 at 23:17
  • @whuber: To be honest, I was never sure what it meant to say that a variable was "significant". You're saying that "`weekends` is significant at p = a" means "the likelihood that both distributions are drawn from a distribution unaffected by `weekends` is a"? – Xodarap Mar 02 '13 at 23:23
  • 3
    A "significant" dummy for a binary independent variable means there is evidence that the mean values of the response vary with the level of the binary variable. That sounds exactly like the hypothesis you wish to test. A simple adjustment will turn the default two-sided test into a one-sided test if you like. – whuber Mar 02 '13 at 23:26
  • 1
    (BTW, the purpose in suggesting a regression is that it makes it easy to adjust for any covariates. In this application there ought to be some: certainly time; and probably type of code, programmer identifier, and other relevant attributes as well.) – whuber Mar 03 '13 at 18:30
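
The dummy-variable suggestion in the comments can be sketched without a regression library when the weekend indicator is the only covariate. In that special case the Poisson regression with a log-exposure offset has closed-form fitted values, and testing the dummy amounts to a likelihood ratio test with one degree of freedom. The function name `poisson_dummy_lr_test`, the use of patch counts as exposure, and all numbers below are illustrative assumptions, not data from the question. With extra covariates (time, programmer, type of code) a real GLM fit would be needed instead.

```python
from math import erfc, log, sqrt


def poisson_dummy_lr_test(weekend_bugs, weekday_bugs,
                          weekend_exposure, weekday_exposure):
    """Likelihood ratio test of a weekend dummy in a two-group Poisson model.

    Exposure is assumed to be something like the number of patches;
    that choice is an assumption, not part of the original question.
    Returns (lr_statistic, two_sided_p_value).
    """
    def loglik(count, mean):
        # Poisson log-likelihood up to the log(count!) term,
        # which cancels in the likelihood ratio.
        return (count * log(mean) if count > 0 else 0.0) - mean

    # Full model (dummy included): each group's fitted mean is its own count.
    ll_full = loglik(weekend_bugs, weekend_bugs) + loglik(weekday_bugs, weekday_bugs)

    # Null model (no dummy): one pooled rate scaled by each group's exposure.
    pooled_rate = (weekend_bugs + weekday_bugs) / (weekend_exposure + weekday_exposure)
    ll_null = (loglik(weekend_bugs, pooled_rate * weekend_exposure)
               + loglik(weekday_bugs, pooled_rate * weekday_exposure))

    # Guard against tiny negative values from floating-point rounding.
    stat = max(2.0 * (ll_full - ll_null), 0.0)

    # Survival function of a chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts: 30 bugs in 200 weekend patches, 50 bugs in 500 weekday patches.
stat, p_value = poisson_dummy_lr_test(30, 50, 200, 500)
print(f"LR statistic: {stat:.3f}, two-sided p-value: {p_value:.4f}")
```

The p-value here is two-sided and relies on the large-sample chi-square approximation. For the one-sided question (weekends worse), a common adjustment is to halve it when the observed weekend rate is the higher one.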

1 Answer

Under the null hypothesis of a common rate $\lambda$, if you condition on the total count, this is simply a binomial test with a pre-specified proportion: given the total, the count in one group is binomial, with success probability equal to that group's share of the total exposure.
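
A minimal sketch of this conditional test, assuming exposure is measured by the number of patches in each group (the question does not say how exposure would be defined). The function name `poisson_rate_binomial_test` and all counts are hypothetical. It computes the exact upper tail of the binomial, which gives a one-sided test of $\lambda_1 = \lambda_2$ against the alternative that the weekend rate is higher.

```python
from math import comb


def poisson_rate_binomial_test(weekend_bugs, weekday_bugs,
                               weekend_exposure, weekday_exposure):
    """Exact one-sided test of equal Poisson rates, conditioning on the total.

    Under H0 (equal rates), given n = weekend_bugs + weekday_bugs, the
    weekend count is Binomial(n, p0), where p0 is the weekend share of
    total exposure. Returns P(X >= weekend_bugs) under H0.
    """
    n = weekend_bugs + weekday_bugs
    p0 = weekend_exposure / (weekend_exposure + weekday_exposure)

    # Sum the binomial probabilities from the observed count up to n.
    return sum(
        comb(n, k) * p0**k * (1.0 - p0) ** (n - k)
        for k in range(weekend_bugs, n + 1)
    )


# Made-up counts: 30 bugs in 200 weekend patches, 50 bugs in 500 weekday patches.
p_value = poisson_rate_binomial_test(30, 50, 200, 500)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value would be evidence that weekend patches have a higher bug rate than the exposure split alone explains. For a two-sided version, a library routine such as `scipy.stats.binomtest` could be used with the same `n` and `p0`.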

Glen_b