
I'm looking for a rare events model where the dependent variable is a discrete index, which rules out Gary King's rare events logit model, since that model requires a binary outcome. My dependent variable is an index of integers ranging from 0 to 15; the median is 0 and the mean is about 0.25, which implies that a value of 1 or more is fairly rare in this dataset.
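A quick bound backs up that rarity claim. Because the index takes only nonnegative integer values, every nonzero observation contributes at least 1 to the mean, so the mean caps the share of nonzero outcomes (Markov's inequality with threshold 1). Plugging in the sample mean of about 0.25 for $\mathbb{E}[Y]$ is an approximation; the median of 0 separately says at least half the observations are zero:

```latex
\Pr(Y \ge 1) = \mathbb{E}\!\left[\mathbf{1}\{Y \ge 1\}\right] \le \mathbb{E}[Y] \approx 0.25,
\qquad \Pr(Y = 0) \ge 0.5 \quad (\text{median} = 0)
```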

Any ideas on a model that would be better than simply running OLS on the index? And if you could recommend an R package that can do the job, all the better!

kjetil b halvorsen
Captain Murphy
  • Is your dependent variable continuous or discrete? You refer to it as continuous twice, but then describe it as an index of integers that range from 0 to 15, which would be discrete. – jbowman Jun 11 '12 at 20:19
  • Sorry about that, it's a discrete index. I was trying to make clear that it is not binary data, which the rare events logit model requires. – Captain Murphy Jun 11 '12 at 20:29
  • Multinomial logit/probit seems like it would work. I've done it in Stata, but haven't tried it in R. – John Jun 11 '12 at 20:38
  • Procedures like logit and probit underestimate the probability of rare events, which is why I'm looking for a model that is designed for such rare occurrences in the data. Essentially, I'm looking for a model like the rare events logit that does not require the dependent variable to be binary. – Captain Murphy Jun 11 '12 at 22:46
  • Is your index really just indexing an event, or is it a count of some sort? (Does 4 = 2*2?) – jbowman Jun 12 '12 at 01:31

2 Answers


No matter what model you use, very rare events are a problem: you may never see them in your data, and even if you do, you will not see many unless you look for events over a very long time period. I don't think it is ever a good idea to just conjure up a model that gives an answer that may be accurate when the model is correct but could be sensitive to departures from the model. For very rare events, the model assumptions may be difficult or impossible to check. It would help if there were a physical basis for choosing the form of the point process. Maybe a Poisson model would be appropriate, or maybe you have a priori reason to believe there is overdispersion, in which case a negative binomial might be more appropriate.
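The Poisson versus negative binomial choice described above can be made concrete with a rough marginal check. The sketch below is a hedged illustration in stdlib-only Python, not a modelling recipe: it compares the sample variance with the sample mean (a Poisson variable has variance equal to its mean), derives a method-of-moments size parameter for the NB2 form Var(Y) = mu + mu^2 / r, and compares the predicted share of zeros. The counts are made up to match the question's mean of 0.25, and the function names are hypothetical. It ignores covariates; with predictors, the check belongs on the residuals of a fitted Poisson regression. In R, those regressions are usually fitted with `glm(..., family = poisson)` and `MASS::glm.nb`.

```python
# Rough, covariate-free check of Poisson vs. negative binomial (NB2) for a
# count outcome. The data below are HYPOTHETICAL, chosen only to have a mean
# of 0.25 like the question's index; function names are illustrative.
from math import exp, lgamma, log
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance / sample mean; roughly 1 if the counts are Poisson."""
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    return variance(counts) / m


def nb_size_moments(counts):
    """Method-of-moments NB2 size r from Var(Y) = mu + mu**2 / r.

    Returns None when the sample shows no overdispersion (variance <= mean),
    in which case a Poisson model is the natural starting point.
    """
    m = mean(counts)
    v = variance(counts)
    if v <= m:
        return None
    return m * m / (v - m)


def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson variable with mean mu > 0."""
    return exp(k * log(mu) - mu - lgamma(k + 1))


def nb_pmf(k, mu, r):
    """P(Y = k) for an NB2 variable with mean mu > 0 and size r > 0."""
    p = r / (r + mu)
    return exp(lgamma(k + r) - lgamma(r) - lgamma(k + 1)
               + r * log(p) + k * log(1 - p))


if __name__ == "__main__":
    # Hypothetical sample: 35 zeros and 5 nonzero values (mean 10 / 40 = 0.25).
    counts = [0] * 35 + [1] * 3 + [3, 4]
    mu = mean(counts)
    r = nb_size_moments(counts)
    print(f"mean = {mu:.3f}, variance/mean = {dispersion_ratio(counts):.2f}")
    print(f"observed share of zeros = {counts.count(0) / len(counts):.3f}")
    print(f"Poisson P(Y=0) = {poisson_pmf(0, mu):.3f}")
    if r is not None:
        print(f"NB2 size r = {r:.3f}, NB2 P(Y=0) = {nb_pmf(0, mu, r):.3f}")
```

With these made-up counts the variance is about 2.6 times the mean, and the moment-matched NB2 predicts more zeros than a Poisson with the same mean, which is the pattern that usually motivates a negative binomial. On real data with predictors, compare the two fitted regressions rather than relying on this marginal ratio.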

Michael R. Chernick
  • I ended up going with a negative binomial, which I was leaning toward at the beginning. I appreciate your post, and I'll give you the check since I ended up using a negative binomial. – Captain Murphy Jun 13 '12 at 16:44

You could try ordinal logistic regression; a treatment of it along the lines of Gary King's rare events logistic regression should be possible!
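The ordinal route suggested above rests on the standard proportional-odds form of ordinal logistic regression, where P(Y <= j | x) = logistic(theta_j - x * beta). The sketch below is a hedged Python illustration of how category probabilities follow from cumulative logits; the cutpoints and slope are hypothetical, and it is the ordinary model, not a rare-events corrected version. In R, the ordinary model is available as `MASS::polr`.

```python
# Proportional-odds (cumulative logit) model, the standard form of ordinal
# logistic regression:  P(Y <= j | x) = logistic(theta_j - x * beta).
# The cutpoints and slope below are HYPOTHETICAL values for illustration;
# this does NOT implement King's rare-events bias correction.
from math import exp


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def category_probs(eta, cutpoints):
    """Probabilities of categories 0..J given linear predictor eta.

    cutpoints (theta_0 < ... < theta_{J-1}) split the latent scale into
    J + 1 ordered categories; returns a list of J + 1 probabilities.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= j), with P(Y <= J) = 1 appended.
    cdf = [logistic(theta - eta) for theta in cutpoints] + [1.0]
    # Differences of consecutive cumulative probabilities give P(Y = j).
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


if __name__ == "__main__":
    cutpoints = [2.5, 4.0]  # hypothetical: three ordered categories 0, 1, 2
    beta = 0.8              # hypothetical slope on a single predictor x
    for x in (0.0, 1.0, 2.0):
        probs = category_probs(beta * x, cutpoints)
        print(x, [round(p, 3) for p in probs])
```

Raising x shifts probability toward the higher categories, because each cumulative probability P(Y <= j) falls as the linear predictor grows; the rare top category corresponds to the tail beyond the last cutpoint.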

Some other similar posts with relevant answers are:
  • Rare event ordinal logit with discrete response $Y_t \in \{1,2,3\}$
  • Strategy to deal with rare events logistic regression

kjetil b halvorsen