Questions tagged [separation]

Separation occurs when some classes of a categorical outcome can be perfectly distinguished by a linear combination of other variables.

Separation (also called "perfect separation" or "complete separation", with "partial separation" or "quasi-complete separation" for the weaker form, and closely related to the Hauck-Donner effect) occurs when every observation with one level of a categorical outcome has a value of some linear combination of the predictor variables greater than a constant C, and every observation with the other level has a value less than C. In the quasi-complete case the inequalities are not strict: some observations lie exactly at C.
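As a rough illustration of the definition, the sketch below checks the simplest special case only: a binary outcome and a single numeric predictor, so the "linear combination" is just the predictor itself. The function name `separation_type` and the toy data are hypothetical, chosen for this example; detecting separation with several predictors needs more than this.

```python
def separation_type(x, y):
    """Classify how one numeric predictor x separates a binary outcome y (0/1).

    Returns "complete", "quasi", or "none". Single-predictor case only.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome levels must be present")

    # Try both orientations: class 0 below class 1, and class 1 below class 0.
    orientations = ((x0, x1), (x1, x0))

    # Complete separation: a strict gap between the two groups.
    for low, high in orientations:
        if max(low) < min(high):
            return "complete"

    # Quasi-complete separation: the groups touch at a single value C,
    # and at least one observation lies strictly on one side of C.
    for low, high in orientations:
        if max(low) == min(high) and min(low) < max(high):
            return "quasi"

    return "none"


print(separation_type([1, 2, 3, 4], [0, 0, 1, 1]))  # strict gap between groups
print(separation_type([1, 2, 2, 3], [0, 0, 1, 1]))  # groups touch at x = 2
print(separation_type([1, 3, 2, 4], [0, 0, 1, 1]))  # groups overlap
```

The three calls cover a strict gap, groups that touch at one value, and overlapping groups.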

This phenomenon causes the maximum likelihood estimate (MLE) of the coefficients in, e.g., logistic regression (and related models) to diverge. Suppose we regress a completely separated dichotomous outcome on a single variable using logistic regression: the maximum likelihood estimate of the coefficient for that variable does not exist. This is because the likelihood keeps increasing as the magnitude of the coefficient grows, so it approaches its supremum only in the limit and no finite value maximizes it. Separation also causes problems for Wald tests of those parameters, because the estimated standard errors become extremely large.
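The divergence can be seen numerically. The following is a minimal sketch, assuming a no-intercept logistic model logit(p_i) = beta * x_i and a made-up four-point data set in which y = 1 exactly when x > 0; the helper names `log_sigmoid` and `log_likelihood` are hypothetical. It evaluates the log-likelihood at increasing values of beta.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(beta, x, y):
    """Bernoulli log-likelihood for the no-intercept model logit(p_i) = beta * x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = beta * xi
        # log(p_i) when y_i == 1; log(1 - p_i) = log_sigmoid(-z) when y_i == 0
        total += log_sigmoid(z) if yi == 1 else log_sigmoid(-z)
    return total


# Toy data (an assumption for illustration): completely separated at x = 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

for beta in [0.5, 1, 2, 5, 10, 20]:
    print(f"beta = {beta:>4}: log-likelihood = {log_likelihood(beta, x, y):.3e}")
```

On this data the log-likelihood is negative for every finite beta and rises toward 0 as beta grows, so an iterative fitter keeps pushing the coefficient outward until it hits an iteration limit or a numerical tolerance. Penalized approaches such as Firth logistic regression, which several questions under this tag discuss, are one common way to obtain finite estimates.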

171 questions

193

votes

10 answers

How to deal with perfect separation in logistic regression?

If you have a variable which perfectly separates zeroes and ones in target variable, R will yield the following "perfect or quasi perfect separation" warning message: Warning message: glm.fit: fitted probabilities numerically 0 or 1 occurred We…

r regression logistic separation

asked May 22 '11 at 10:37

user333

votes

1 answer

Logistic regression in R resulted in perfect separation (Hauck-Donner phenomenon). Now what?

I'm trying to predict a binary outcome using 50 continuous explanatory variables (the range of most of the variables is $-\infty$ to $\infty$). My data set has almost 24,000 rows. When I run glm in R, I get: Warning messages: 1: glm.fit: algorithm…

r regression logistic separation

asked Dec 12 '12 at 23:59

Dcook

votes

2 answers

Logistic regression model does not converge

I've got some data about airline flights (in a data frame called flights) and I would like to see if the flight time has any effect on the probability of a significantly delayed arrival (meaning 10 or more minutes). I figured I'd use logistic…

r logistic separation

asked Dec 10 '10 at 16:28

Daniel Standage

votes

4 answers

Why does logistic regression become unstable when classes are well-separated?

Why is it that logistic regression becomes unstable when classes are well-separated? What does well-separated classes mean? I would really appreciate if someone can explain with an example.

r regression logistic separation

asked Jan 02 '17 at 08:44

Jane Dow

votes

1 answer

Is there any intuitive explanation of why logistic regression will not work for perfect separation case? And why adding regularization will fix it?

We have many good discussions about perfect separation in logistic regression. Such as, Logistic regression in R resulted in perfect separation (Hauck-Donner phenomenon). Now what? and Logistic regression model does not converge . I personally…

logistic generalized-linear-model optimization intuition separation

asked Oct 13 '16 at 03:30

Haitao Du

votes

1 answer

Understanding complete separation for logistic regression

Why does logistic regression not converge for a linearly separable data set? For linearly separable data sets the model parameters go to infinity when minimizing the error function (according to Bishop2006, Pattern recognition and machine learning,…

logistic separation

asked Jul 21 '16 at 04:17

Matthias

votes

2 answers

What is the probability that $n$ random points in $d$ dimensions are linearly separable?

Given $n$ data points, each with $d$ features, $n/2$ are labeled as $0$, the other $n/2$ are labeled as $1$. Each feature takes a value from $[0,1]$ randomly (uniform distribution). What's the probability that there exists a hyperplane that can…

probability classification mathematical-statistics separation

asked Apr 09 '15 at 22:15

Xing Shi

votes

1 answer

Model selection with Firth logistic regression

In a small data set ($n\sim100$ ) that I am working with, several variables give me perfect prediction/separation. I thus use Firth logistic regression to deal with the issue. If I select the best model by AIC or BIC, should I include the Firth…

logistic model-selection aic separation

asked Mar 01 '14 at 18:24

StasK

votes

3 answers

Analysis of Danish mask study data by Nassim Nicholas Taleb (binomial GLM with complete separation)

Recently, Nassim Nicholas Taleb made this post about the recent Danish mask study, a randomized controlled trial which concluded that the proportion of newly diagnosed coronavirus infections was not significantly different among the group with…

logistic binomial-distribution contingency-tables fishers-exact-test separation

asked Dec 04 '20 at 09:41

Tom Wenseleers

votes

1 answer

Seeking a Theoretical Understanding of Firth Logistic Regression

I am trying to understand Firth logistic regression (method of handling perfect/complete or quasi-complete separation in logistic regression) so I can explain it to others in simplified terms. Does anyone have a dummied-down explanation of what…

logistic maximum-likelihood separation

asked Mar 04 '14 at 16:40

ESmith5988

votes

3 answers

Intuition for Support Vector Machines and the hyperplane

In my project I want to create a logistic regression model for predicting binary classification (1 or 0). I have 15 variables, 2 of which are categorical, while the rest are a mixture of continuous and discrete variables. In order to fit a logistic…

machine-learning logistic classification svm separation

asked Mar 29 '17 at 20:00

TheGoat

votes

1 answer

Issue with complete separation in logistic regression (in R)

I am trying to fit a logistic regression model for business defaults. Apart from the dichotomous variable default, the data set includes some performance ratios. When estimating the model in R, the following warning…

r logistic separation

asked Mar 24 '18 at 09:52

Marti

votes

2 answers

Is R's glm function useless in a big data / machine learning setting?

I am surprised that R’s glm will “break” (not converge with default setting) for the following “toy” example (binary classification with ~50k data, ~10 features), but glmnet returns results in seconds. Am I using glm incorrectly (for example, should…

r logistic generalized-linear-model glmnet separation

asked Oct 11 '16 at 04:32

Haitao Du

votes

1 answer

Binomial glmm with a categorical variable with full successes

I am running a glmm with a binomial response variable and a categorical predictor. The random effect is given by the nested design used for the data collection. The data looks like this: m.gen1$treatment [1] sucrose control protein …

r generalized-linear-model lme4-nlme separation

asked Jan 07 '15 at 13:29

AtiQP

votes

1 answer

Enormous coefficients in logistic regression - what does it mean and what to do?

I get enormous coefficients during logistic regression, see coefficients with krajULKV: > summary(m5) Call: glm(formula = cbind(ml, ad) ~ rok + obdobi + kraj + resid_usili2 + rok:obdobi + rok:kraj + obdobi:kraj + kraj:resid_usili2 + …

regression logistic generalized-linear-model separation

asked Jan 29 '13 at 00:58

Tomas
