Questions tagged [logistic]

Refers generally to statistical procedures that use the logistic function, most commonly various forms of logistic regression.

The logistic function is $$ f(x) = \frac{1}{1+e^{-x}}, $$ which maps real numbers to $(0,1)$. One common use of the logistic function is logistic regression, which is a standard method of quantifying the effect of a set of predictors $\{X_1, ..., X_p\}$ on a binary outcome, $Y$. The model can be written as

$$ P(Y=1|X) = f(\beta_0 + \beta_1X_1 + ... + \beta_p X_p) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + ... + \beta_p X_p)}}$$
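As a minimal sketch of the model above, the following Python evaluates $f$ and $P(Y=1|X)$ for one observation. The function names (`logistic`, `predict_proba`) and the coefficient values are hypothetical, chosen only for illustration; nothing here is fitted to data.

```python
import math


def logistic(x: float) -> float:
    """Logistic function f(x) = 1 / (1 + exp(-x)).

    The two branches are algebraically identical; splitting on the sign
    of x avoids overflow in exp() for large negative inputs.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(beta0: float, betas: list, xs: list) -> float:
    """P(Y=1 | X) = f(beta0 + beta_1*x_1 + ... + beta_p*x_p)."""
    linear_predictor = beta0 + sum(b * x for b, x in zip(betas, xs))
    return logistic(linear_predictor)


# Hypothetical coefficients and predictor values (assumptions, not estimates).
beta0 = -1.0
betas = [0.5, -0.25]
xs = [2.0, 4.0]

# Linear predictor: -1.0 + 0.5*2.0 - 0.25*4.0 = -1.0
p = predict_proba(beta0, betas, xs)
print(p)
```

In practice the coefficients would be estimated by maximum likelihood, for example with `glm(..., family = binomial)` in R.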

The logistic regression model has the nice property that the exponentiated regression coefficients can be interpreted as odds ratios associated with a one unit increase in the predictor.

Often we consider the odds in favor of $Y=1$ given $X$:

$$\text{odds} = \frac{P(Y=1|X)}{P(Y=0|X)} = \frac{P(Y=1|X)}{1 - P(Y=1|X)} = e^{\beta_0 + \beta_1X_1 + ... + \beta_p X_p}$$

The odds ratio associated with a one unit increase in some predictor, $X_i$, is therefore written as:

$$\frac{\text{odds}(X_i+1)}{\text{odds}(X_i)} = \frac{e^{\beta_0 + \beta_1X_1 + ...+ \beta_i(X_i+1) + ... + \beta_p X_p}}{e^{\beta_0 + \beta_1X_1 + ...+ \beta_iX_i + ... + \beta_p X_p}} = e^{\beta_i}$$
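The cancellation above can be checked numerically. This sketch uses hypothetical coefficients and predictor values (assumptions for illustration only), raises one predictor by one unit, and compares the resulting odds ratio with $e^{\beta_i}$.

```python
import math


def odds(beta0: float, betas: list, xs: list) -> float:
    """Odds in favor of Y=1: exp(beta0 + beta_1*x_1 + ... + beta_p*x_p)."""
    return math.exp(beta0 + sum(b * x for b, x in zip(betas, xs)))


# Hypothetical values, not estimates from any data set.
beta0 = -1.0
betas = [0.5, -0.25, 0.8]
xs = [2.0, 4.0, 1.5]

i = 1  # zero-based index of the predictor to increase by one unit
xs_plus = list(xs)
xs_plus[i] += 1.0

# All terms except beta_i cancel in the ratio.
odds_ratio = odds(beta0, betas, xs_plus) / odds(beta0, betas, xs)
print(odds_ratio, math.exp(betas[i]))
```

The two printed numbers agree up to floating-point error whatever values the other predictors take, which is why the odds ratio can be read from a single coefficient.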

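A second use of the logistic function (separate from logistic regression) is the logistic distribution: $f(x)$ is the cumulative distribution function of the standard logistic distribution, and its inverse, the logit $\log\frac{p}{1-p}$, is the quantile function.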
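For the standard logistic distribution (location 0, scale 1), $f$ gives cumulative probabilities, $P(Z \le z) = f(z)$, and the logit $\log\frac{p}{1-p}$ inverts it to give quantiles. The sketch below checks that round trip; the function names are hypothetical.

```python
import math


def logistic_cdf(z: float) -> float:
    """CDF of the standard logistic distribution: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_quantile(p: float) -> float:
    """Quantile function (logit), defined for 0 < p < 1: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Applying the CDF to a quantile should recover the original probability.
for p in (0.1, 0.5, 0.9):
    z = logistic_quantile(p)
    print(p, z, logistic_cdf(z))
```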

7015 questions
354
votes
12 answers

Difference between logit and probit models

What is the difference between Logit and Probit model? I'm more interested here in knowing when to use logistic regression, and when to use Probit. If there is any literature which defines it using R, that would be helpful as well.
Beta
  • 5,784
  • 9
  • 33
  • 44
193
votes
10 answers

How to deal with perfect separation in logistic regression?

If you have a variable which perfectly separates zeroes and ones in target variable, R will yield the following "perfect or quasi perfect separation" warning message: Warning message: glm.fit: fitted probabilities numerically 0 or 1 occurred We…
user333
  • 6,621
  • 17
  • 44
  • 54
136
votes
3 answers

What is the difference between linear regression and logistic regression?

What is the difference between linear regression and logistic regression? When would you use each?
B Seven
  • 2,873
  • 4
  • 24
  • 29
109
votes
4 answers

Softmax vs Sigmoid function in Logistic classifier?

What decides the choice of function (Softmax vs Sigmoid) in a Logistic classifier? Suppose there are 4 output classes. Each of the above function gives the probabilities of each class being the correct output. So which one to take for a…
mach
  • 1,545
  • 3
  • 10
  • 12
107
votes
3 answers

Does an unbalanced sample matter when doing logistic regression?

Okay, so I think I have a decent enough sample, taking into account the 20:1 rule of thumb: a fairly large sample (N=374) for a total of 7 candidate predictor variables. My problem is the following: whatever set of predictor variables I use, the…
Michiel
  • 1,173
  • 3
  • 8
  • 5
107
votes
4 answers

What is rank deficiency, and how to deal with it?

Fitting a logistic regression using lme4 ends with Error in mer_finalize(ans) : Downdated X'X is not positive definite. A likely cause of this error is apparently rank deficiency. What is rank deficiency, and how should I address it?
Jack Tanner
  • 4,552
  • 3
  • 27
  • 39
102
votes
4 answers

Why isn't Logistic Regression called Logistic Classification?

Since Logistic Regression is a statistical classification model dealing with categorical dependent variables, why isn't it called Logistic Classification? Shouldn't the "Regression" name be reserved to models dealing with continuous dependent…
96
votes
2 answers

Solving for regression parameters in closed-form vs gradient descent

In Andrew Ng's machine learning course, he introduces linear regression and logistic regression, and shows how to fit the model parameters using gradient descent and Newton's method. I know gradient descent can be useful in some applications of…
Jeff
  • 3,525
  • 5
  • 27
  • 38
95
votes
5 answers

How to calculate Area Under the Curve (AUC), or the c-statistic, by hand

I am interested in calculating area under the curve (AUC), or the c-statistic, by hand for a binary logistic regression model. For example, in the validation dataset, I have the true value for the dependent variable, retention (1 = retained; 0 = not…
Matt Reichenbach
  • 3,404
  • 6
  • 25
  • 43
87
votes
4 answers

What is the difference between a "link function" and a "canonical link function" for GLM

What's the difference between terms 'link function' and 'canonical link function'? Also, are there any (theoretical) advantages of using one over the other? For example, a binary response variable can be modeled using many link functions such as…
steadyfish
  • 1,772
  • 2
  • 15
  • 30
86
votes
5 answers

What do the residuals in a logistic regression mean?

In answering this question John Christie suggested that the fit of logistic regression models should be assessed by evaluating the residuals. I'm familiar with how to interpret residuals in OLS, they are in the same scale as the DV and very clearly…
russellpierce
  • 17,079
  • 16
  • 67
  • 98
78
votes
3 answers

Diagnostics for logistic regression?

For linear regression, we can check the diagnostic plots (residuals plots, Normal QQ plots, etc) to check if the assumptions of linear regression are violated. For logistic regression, I am having trouble finding resources that explain how to…
ialm
  • 1,707
  • 2
  • 19
  • 19
78
votes
1 answer

How does a simple logistic regression model achieve a 92% classification accuracy on MNIST?

Even though all the images in the MNIST dataset are centered, with a similar scale, and face up with no rotations, they have a significant handwriting variation that puzzles me how a linear model achieves such a high classification accuracy. As far…
Nitish Agarwal
  • 813
  • 4
  • 6
68
votes
8 answers

Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?

I have SPSS output for a logistic regression model. The output reports two measures for the model fit, Cox & Snell and Nagelkerke. So as a rule of thumb, which of these $R^2$ measures would you report as the model fit? Or, which of these fit indices…
Henrik
  • 13,314
  • 9
  • 63
  • 123
67
votes
3 answers

Is standardization needed before fitting logistic regression?

My question is do we need to standardize the data set to make sure all variables have the same scale, between [0,1], before fitting logistic regression. The formula is: $$\frac{x_i-\min(x_i)}{\max(x_i)-\min(x_i)}$$ My data set has 2 variables,…
user1946504
  • 1,247
  • 3
  • 14
  • 17