Questions tagged [rare-events]

Situations where a level of a categorical variable occurs very rarely (e.g., a rare disease). This is especially problematic when the variable is the response variable in a model.

In statistics, "rare events" refers to situations where a level of a categorical variable occurs very rarely. Such cases can create special problems. The prototypical situation is predicting a rare disease with a logistic regression model. However, a particular level of a multilevel categorical variable can also be rare, and there can be adverse consequences even when the variable is a predictor rather than the response.

156 questions

9 answers

Taleb and the Black Swan

Taleb's book "The Black Swan" was a New York Times best seller when it came out several years ago. The book is now in its second edition. After meeting with statisticians at a JSM (an annual statistical conference), Taleb toned down his criticism…

extreme-value rare-events

asked Sep 09 '12 at 12:54

Michael R. Chernick

5 answers

Strategy to deal with rare events logistic regression

I would like to study rare events in a finite population. Since I am unsure which strategy is best suited, I would appreciate tips and references on this matter, although I am well aware it has been covered extensively. I just don't really…

logistic rare-events

asked Jul 11 '14 at 08:54

Damien

3 answers

Rare event logistic regression bias: how to simulate the underestimated p's with a minimal example?

CrossValidated has several questions on when and how to apply the rare event bias correction by King and Zeng (2001). I am looking for something different: a minimal simulation-based demonstration that the bias exists. In particular, King and Zeng…

r logistic simulation bias rare-events

asked Aug 28 '15 at 14:32

zkurtz

1 answer

Is gradient boosting appropriate for data with low event rates like 1%?

I am trying gradient boosting on a dataset with an event rate of about 1% using Enterprise Miner, but it is failing to produce any output. My question is, since it is a decision-tree-based approach, is it even right to use gradient boosting with such low…

boosting unbalanced-classes rare-events gradient

asked Feb 29 '16 at 14:03

user2542275

2 answers

How do you explain the difference between relative risk and absolute risk?

The other day I had a consultation with an epidemiologist. She is an MD with a public health degree in epidemiology and has a lot of statistical savvy. She mentors her research fellows and residents and helps them with statistical issues. She…

relative-risk absolute-risk rare-events

asked Jun 01 '12 at 09:19

Michael R. Chernick

1 answer

How to deal with factors with rare levels in cross-validation?

Suppose that in a regression analysis in R, I have a factor-type independent variable with 3 levels in my training dataset, but in the test dataset that same factor variable has 5 levels. Therefore I cannot predict the response values for the test dataset.…

r regression categorical-data cross-validation rare-events

asked Mar 27 '15 at 16:22

JRK

1 answer

What are the consequences of rare events in logistic regression?

I know that sample size affects power in any statistical method. There are rules of thumb for how many samples a regression needs for each predictor. I also often hear that the number of samples in each category in the dependent variable of a…

logistic assumptions rare-events

asked Oct 12 '17 at 18:00

Michael Webb

2 answers

Best use of LSTM for within sequence event prediction

Assume the following one-dimensional sequence: A, B, C, Z, B, B, #, C, C, C, V, $, W, A, % ... Letters A, B, C, ... here represent 'ordinary' events. Symbols #, $, %, ... here represent 'special' events. The temporal spacing between all events is…

time-series deep-learning rare-events lstm sequential-pattern-mining

asked Dec 16 '15 at 15:41

dgorissen

2 answers

How to make the rare events corrections described in King and Zeng (2001)?

I have a dataset with a binary (survival) response variable and 3 explanatory variables (A = 3 levels, B = 3 levels, C = 6 levels). In this dataset, the data is well balanced, with 100 individuals per ABC category. I already studied the effect of…

logistic unbalanced-classes weighted-regression rare-events case-control-study

asked May 15 '14 at 19:08

Aurelie

1 answer

Poisson vs Binomial for rare events

From Poisson's postulates, we know Poisson works for rare events. However, we also know binomial is an approximation of Poisson when the probability of an event is small. So can we use binomial and Poisson interchangeably for rare events? What is the…

binomial-distribution poisson-distribution rare-events

asked Mar 18 '14 at 11:49

user3119750
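
The relationship this question asks about can be checked numerically. Below is a minimal sketch, read in the usual direction: Poisson(np) as the limiting approximation to Binomial(n, p) when n is large and p is small. The values of `n` and `p` are hypothetical and chosen only for illustration; they do not come from the question.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical rare-event setting: many trials, small event probability.
n, p = 1000, 0.002
lam = n * p  # matching Poisson mean

# Largest pointwise gap between the two probability mass functions
# over the counts that carry essentially all of the probability.
max_abs_diff = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21)
)
print(f"max |binomial - poisson| over k = 0..20: {max_abs_diff:.2e}")
```

With a small p the two mass functions nearly coincide; increasing `p` while holding `n * p` fixed (so `n` shrinks) makes the gap grow, which is one way to see when the two stop being interchangeable.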

2 answers

The intuition behind the different scoring rules

Consider the three scoring rules in the case of a binary prediction: Log: sum(log(ifelse(outcome, probability, 1-probability))) / n Brier: sum((outcome-probability)**2) / n Sphere: sum(ifelse(outcome, probability,…

logistic intuition rare-events scoring-rules

asked Apr 23 '15 at 20:03

sds
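
A minimal sketch of the two scoring rules whose formulas appear in full in the excerpt above (log and Brier), translated to Python. The third rule is truncated in the excerpt and is left out. The data and the two constant forecasts are hypothetical, chosen to mimic a 1% event rate.

```python
import math

def log_score(outcomes, probs):
    """Mean log score: log of the probability assigned to what happened (higher is better)."""
    return sum(math.log(p if y else 1 - p) for y, p in zip(outcomes, probs)) / len(outcomes)

def brier_score(outcomes, probs):
    """Mean squared difference between outcome and forecast (lower is better)."""
    return sum((y - p) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

# Hypothetical rare-event data: 1 event in 100 observations.
outcomes = [0] * 99 + [1]

# Two constant forecasts: the observed base rate, and a tenfold underestimate.
base_rate = [0.01] * 100
too_low = [0.001] * 100

for name, probs in [("base rate", base_rate), ("too low", too_low)]:
    print(name, "log:", log_score(outcomes, probs), "brier:", brier_score(outcomes, probs))
```

Both rules prefer the base-rate forecast here, but the log score penalizes the underestimate far more heavily on the single event than the Brier score does, which is part of the intuition the question is after.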

1 answer

Rare event logistic regression bias correction

In King and Zeng's paper: http://gking.harvard.edu/files/gking/files/0s.pdf they mention $\tau$ and $\bar{y}$. I already have data with 90000 0's and 450 1's. I have already fitted a logistic regression with the whole data and want to make a…

regression logistic rare-events bias-correction

asked Jun 03 '14 at 09:07

user1971988
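
For orientation, the prior correction from King and Zeng (2001), as I understand it, adjusts only the intercept of the fitted logistic regression and leaves the slope estimates unchanged. Here $\tau$ is the fraction of events in the population and $\bar{y}$ is the fraction of events in the sample; $\hat{\beta}_0$ for the fitted intercept and $\tilde{\beta}_0$ for the corrected one are notation assumed here, and the formula is worth checking against the paper before use:

$$
\tilde{\beta}_0 = \hat{\beta}_0 - \ln\left[\left(\frac{1-\tau}{\tau}\right)\left(\frac{\bar{y}}{1-\bar{y}}\right)\right]
$$

With the counts quoted in the question, $\bar{y} = 450/90450 = 1/201 \approx 0.005$. The value of $\tau$ is not given in the excerpt; if the sample is a random draw from the population, then $\tau \approx \bar{y}$ and the correction term is close to zero.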

0 answers

Frequency weights, rare events and logistic regression

I'm working on a model that requires me to look for predictors for a rare event (less than 0.5% of the total of my observations). My total sample is a significant part of the total population (50,000 cases). My final objective is to obtain…

r logistic maximum-likelihood weighted-sampling rare-events

asked May 09 '14 at 02:16

Edu

1 answer

During oversampling of rare events, why are the beta coefficients of the independent variables not affected, but only the intercept?

I have followed the King and Zeng paper and understand the consistency of the prior correction after oversampling in logistic regression. But I am trying to understand why the beta coefficients of the independent variables are not affected by the…

logistic multiple-regression rare-events case-control-study oversampling

asked Aug 05 '17 at 05:30

Kingstat

1 answer

Bias Correction for Large Scale Logistic Regression with Rare Events

I have a large dataset consisting of many ad impressions. My binary dependent variable, clicked, describes whether or not the ad was clicked on. As you can expect, the number of clicks is about 1000x smaller than the number of non-clicks in my…

logistic unbalanced-classes online-algorithms rare-events bias-correction

asked Mar 17 '15 at 18:29

Aymen