Questions tagged [rare-events]

Situations where a level of a categorical variable occurs very rarely (e.g., a rare disease). This can be a problem, especially when the variable is the response variable in a model.

In statistics, "rare events" refers to situations where a level of a categorical variable occurs very rarely. Such cases can create special problems. The prototypical situation is predicting a rare disease with a logistic regression model. However, a particular level of a multilevel categorical variable can also be rare, and there can be adverse consequences even when the variable is a predictor rather than the response.

156 questions
67
votes
9 answers

Taleb and the Black Swan

Taleb's book "The Black Swan" was a New York Times best seller when it came out several years ago. The book is now in its second edition. After meeting with statisticians at a JSM (an annual statistical conference), Taleb toned down his criticism…
Michael R. Chernick
  • 39,640
  • 28
  • 74
  • 143
33
votes
5 answers

Strategy to deal with rare events logistic regression

I would like to study rare events in a finite population. Since I am unsure about which strategy is best suited, I would appreciate tips and references related to this matter, although I am well aware it has been largely covered. I just don't really…
Damien
  • 503
  • 2
  • 5
  • 9
19
votes
3 answers

Rare event logistic regression bias: how to simulate the underestimated p's with a minimal example?

CrossValidated has several questions on when and how to apply the rare event bias correction by King and Zeng (2001). I am looking for something different: a minimal simulation-based demonstration that the bias exists. In particular, King and Zeng…
zkurtz
  • 2,052
  • 16
  • 31
18
votes
1 answer

Is gradient boosting appropriate for data with low event rates like 1%?

I am trying gradient boosting on a dataset with an event rate of about 1% using Enterprise Miner, but it is failing to produce any output. My question is, since it is a decision-tree-based approach, is it even right to use gradient boosting with such low…
user2542275
  • 717
  • 2
  • 6
  • 17
12
votes
2 answers

How do you explain the difference between relative risk and absolute risk?

The other day I had a consultation with an epidemiologist. She is an MD with a public health degree in epidemiology and has a lot of statistical savvy. She mentors her research fellows and residents and helps them with statistical issues. She…
Michael R. Chernick
  • 39,640
  • 28
  • 74
  • 143
10
votes
1 answer

How to deal with factors with rare levels in cross-validation?

Suppose in a regression analysis in R, I have a factor-type independent variable with 3 levels in my training dataset. But in the test dataset that same factor variable has 5 levels. Therefore I cannot predict the response values for the test dataset.…
JRK
  • 603
  • 5
  • 15
9
votes
1 answer

What are the consequences of rare events in logistic regression?

I know that sample size affects power in any statistical method. There are rules of thumb for how many samples a regression needs for each predictor. I also often hear that the number of samples in each category in the dependent variable of a…
Michael Webb
  • 1,936
  • 10
  • 21
9
votes
2 answers

Best use of LSTM for within sequence event prediction

Assume the following one-dimensional sequence: A, B, C, Z, B, B, #, C, C, C, V, $, W, A, % ... Letters A, B, C, ... here represent 'ordinary' events. Symbols #, $, %, ... here represent 'special' events. The temporal spacing between all events is…
8
votes
2 answers

How to make the rare events corrections described in King and Zeng (2001)?

I have a dataset with a binary (survival) response variable and 3 explanatory variables (A = 3 levels, B = 3 levels, C = 6 levels). In this dataset, the data is well balanced, with 100 individuals per ABC category. I already studied the effect of…
8
votes
1 answer

Poisson vs Binomial for rare events

From Poisson's postulates, we know Poisson works for rare events. However, we also know binomial is an approximation of Poisson when the probability of an event is small. So can we use the binomial and Poisson interchangeably for rare events? What is the…
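A small numerical sketch may help with this question. The usual statement of the limit is that a Poisson distribution with mean lambda = n * p approximates a Binomial(n, p) when n is large and p is small. The helper names and the particular values of n and p below are illustrative assumptions, not taken from the question.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_pmf_gap(n, p, k_max=20):
    """Largest absolute pmf difference over k = 0..k_max, using lam = n * p."""
    lam = n * p
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )

# Both settings have the same mean (lam = 2), but only the first is "rare".
rare = max_pmf_gap(n=1000, p=0.002)
common = max_pmf_gap(n=10, p=0.2)
print(f"rare event gap:   {rare:.6f}")
print(f"common event gap: {common:.6f}")
```

With the mean held fixed, the gap between the two distributions is much smaller in the rare-event setting, which is why the two are often treated as interchangeable there and not elsewhere.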
8
votes
2 answers

The intuition behind the different scoring rules

Consider the three scoring rules in the case of a binary prediction: Log: sum(log(ifelse(outcome, probability, 1-probability))) / n Brier: sum((outcome-probability)**2) / n Sphere: sum(ifelse(outcome, probability,…
sds
  • 2,016
  • 1
  • 22
  • 31
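The log and Brier rules quoted in the excerpt above can be sketched in Python as follows. The spherical rule is omitted because the excerpt is truncated before its formula is complete. The function names and the toy data (1 event in 100 cases) are assumptions chosen to illustrate a rare-event setting.

```python
import math

def log_score(outcomes, probs):
    """Mean log-likelihood of the forecasts; higher is better."""
    return sum(
        math.log(p if y else 1 - p) for y, p in zip(outcomes, probs)
    ) / len(outcomes)

def brier_score(outcomes, probs):
    """Mean squared error of the forecasts; lower is better."""
    return sum((y - p) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

# Toy rare-event data: 1 event among 100 cases (illustrative only).
outcomes = [1] + [0] * 99

# Two constant forecasts: the base rate, and an uninformative 0.5.
base_rate = [0.01] * 100
coin_flip = [0.5] * 100

log_base, log_half = log_score(outcomes, base_rate), log_score(outcomes, coin_flip)
brier_base, brier_half = brier_score(outcomes, base_rate), brier_score(outcomes, coin_flip)
print(f"log:   base rate {log_base:.4f}, coin flip {log_half:.4f}")
print(f"Brier: base rate {brier_base:.4f}, coin flip {brier_half:.4f}")
```

Note the opposite orientations: as written in the excerpt, the log score is larger for better forecasts while the Brier score is smaller.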
8
votes
1 answer

Rare event logistic regression bias correction

In King and Zeng's paper: http://gking.harvard.edu/files/gking/files/0s.pdf They mention $\tau$ and $\bar{y}$. I already have data with 90000 0's and 450 1's. I have already fitted a logistic regression with the whole data and want to make a…
user1971988
  • 223
  • 2
  • 5
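For reference, the prior correction in King and Zeng (2001) leaves the slope coefficients unchanged and adjusts only the intercept, subtracting ln[((1 - tau) / tau) * (ybar / (1 - ybar))], where tau is the population fraction of events and ybar is the sample fraction. A minimal sketch follows. The fitted intercept of -5.0 and the value tau = 0.001 are hypothetical placeholders, since the excerpt gives neither; only the counts 90000 and 450 come from the question.

```python
import math

def prior_corrected_intercept(beta0_hat, tau, ybar):
    """Prior-corrected intercept: beta0_hat - ln[((1-tau)/tau) * (ybar/(1-ybar))].

    beta0_hat: intercept estimated on the (possibly outcome-selected) sample
    tau:       assumed population fraction of events
    ybar:      fraction of events in the sample
    """
    return beta0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Sample fraction from the counts in the question: 450 ones, 90000 zeros.
ybar = 450 / (450 + 90000)

beta0_hat = -5.0   # hypothetical fitted intercept, for illustration only
tau = 0.001        # hypothetical population event rate, for illustration only

corrected = prior_corrected_intercept(beta0_hat, tau, ybar)
print(f"ybar = {ybar:.5f}, corrected intercept = {corrected:.4f}")
```

If the data are a random sample from the population, tau is approximately ybar and the correction is approximately zero; the adjustment matters when events were over- or under-sampled relative to the population.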
7
votes
0 answers

Frequency weights, rare events and logistic regression

I'm working on a model that requires me to look for predictors for a rare event (less than 0.5% of the total of my observations). My total sample is a significant part of the total population (50,000 cases). My final objective is to obtain…
Edu
  • 521
  • 5
  • 12
6
votes
1 answer

During oversampling of rare events, why are the beta coefficients of the independent variables not affected, but only the intercept?

I have followed the King and Zeng paper and understand the consistency of the prior correction after oversampling in logistic regression. But I am trying to understand why the beta coefficients of the independent variables are not affected by the…
6
votes
1 answer

Bias Correction for Large Scale Logistic Regression with Rare Events

I have a large dataset consisting of many ad impressions. My binary dependent variable, clicked, describes whether or not the ad was clicked on. As you can expect, the number of clicks is about 1000x smaller than the number of non-clicks in my…