102

Since Logistic Regression is a statistical classification model dealing with categorical dependent variables, why isn't it called Logistic Classification? Shouldn't the "Regression" name be reserved for models dealing with continuous dependent variables?

Sycorax
Ismael Ghalimi
  • Logistic regression belongs to the GLM family of models. – Stéphane Laurent Dec 07 '14 at 19:04
  • You can use it to regress probabilities. – Emre Dec 07 '14 at 22:38
  • While logistic regression can certainly be used for classification by introducing a threshold on the probabilities it returns, that's hardly its only use - or even its primary use. It was developed for - and continues to be used for - regression purposes that have nothing to do with classification. I'd argue that this is still easily what it's mostly used for, but I suppose it depends on what you look at. – Glen_b Dec 08 '14 at 01:07
  • You might find [this paper](http://papers.tinbergen.nl/02119.pdf) on the development of logistic regression interesting, particularly since it does give some sense of the kinds of problems that it is used for as a regression technique. – Glen_b Dec 08 '14 at 01:14

4 Answers

128

Logistic regression is emphatically not a classification algorithm on its own. It is only a classification algorithm in combination with a decision rule that dichotomizes the predicted probabilities of the outcome. Logistic regression is a regression model because it estimates the probability of class membership as a (transformation of a) linear combination of the features.
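To make the distinction concrete, here is a minimal sketch, assuming Python with numpy and scikit-learn and using synthetic data: the fitted model only returns probabilities, and a classifier appears only once a decision rule (here an arbitrary 0.5 threshold) is bolted on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                                    # two synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

probs = model.predict_proba(X)[:, 1]   # the regression output: estimated P(y = 1 | x)
labels = (probs >= 0.5).astype(int)    # the classifier: the model plus an arbitrary decision rule
```

Moving the threshold away from 0.5 (say, to reflect asymmetric costs) changes the classifier without refitting the model, which is one reason to keep the regression and the decision rule conceptually separate.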

Frank Harrell has posted a number of answers on this website enumerating the pitfalls of regarding logistic regression as a classification algorithm.

If I recall correctly, he once pointed me to his book on regression strategies for more elaboration on these (and more!) points, but I can't seem to find that particular post.

Sycorax
  • If that's the case, all (or most) classifiers predict the probability of belonging to a class first (as far as I know) and then transform that probability into classes. Don't they? – Outlier Dec 10 '14 at 06:51
  • @Outlier Counterexample: SVM doesn't compute class probabilities at all, it just measures the distance between an observation and a hyperplane (see the sketch after these comments). – Sycorax Dec 10 '14 at 12:59
  • @Outlier In ML these are called probabilistic classifiers; trees and random forests are not, xgboost is (at least with logloss). – seanv507 May 25 '19 at 06:46
  • @SycoraxsaysReinstateMonica So is there **any** classification algorithm in the world? SVMs compute distance from the class boundary, neural networks compute some other continuous function... – Igor F. Dec 19 '19 at 11:56
  • @IgorF. Any algorithm with a decision rule discretizing the output as classes is a classifier. Logistic regression is properly named because it's a regression of class probabilities. – Sycorax Dec 19 '19 at 13:09
  • @SycoraxsaysReinstateMonica Let me rephrase the question: can you name classification algorithms which, in the training phase, use discretized outputs for adjusting the parameters? For all others, which use continuous values (e.g. SVM or neural networks), I don't see a justification for calling them "classification" algorithms any more than I see it for logistic regression. – Igor F. Dec 19 '19 at 14:09
  • @IgorF. A decision tree adjusts its parameters (chooses feature splits) on the basis of discretized outputs (class labels) by choosing the largest information gain. – Sycorax Dec 19 '19 at 14:11
  • @SycoraxsaysReinstateMonica OK. And, I guess, a (k-)nearest-neighbors algorithm can also be called a "classifier", right? But you agree that SVM and neural networks are actually regression algorithms(?) – Igor F. Dec 19 '19 at 14:55
  • @IgorF. KNN doesn't seem to answer your question because it doesn't have parameters optimized during training. If we read your question as "are there any models that maximize the number of correct classifications directly?" then I think nothing qualifies, because the number of correct classifications is not differentiable and is a combinatorial problem: each datum could belong to any class, and an exhaustive search is expensive. Not sure about SVM and NNs. I'd be surprised if all algorithms had to be exactly one of "classifier" and "regressor." – Sycorax Dec 19 '19 at 15:37
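As a rough sketch of the SVM counterexample mentioned above (again Python with scikit-learn and synthetic data), a linear SVM exposes only a signed score proportional to an observation's distance from the separating hyperplane, whereas logistic regression exposes estimated probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

svm = SVC(kernel="linear").fit(X, y)
lr = LogisticRegression().fit(X, y)

print(svm.decision_function(X[:3]))   # signed scores, proportional to distance from the hyperplane
print(lr.predict_proba(X[:3])[:, 1])  # estimated P(y = 1 | x) for the same three observations
```
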
15

Abstractly, regression is the problem of calculating a conditional expectation $E[Y|X=x]$. The form this expectation takes depends on the assumptions about how the data were generated:

  • Assuming $(Y|X=x)$ to be normally distributed yields classical linear regression.
  • Assuming a Poisson distribution yields Poisson regression.
  • Assuming a Bernoulli distribution yields logistic regression.

The term "regression" has also been used more generally than this, including approaches like quantile regression, which estimates a given quantile of $(Y|X=x)$.
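A minimal sketch of that correspondence, assuming Python with numpy and statsmodels (the coefficients and data are made up): the same GLM machinery estimates $E[Y|X=x]$ under each of the three distributional assumptions listed above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = sm.add_constant(x)                          # design matrix with an intercept

# Normal response -> classical linear regression
y_gauss = 1.0 + 2.0 * x + rng.normal(size=100)
gauss_fit = sm.GLM(y_gauss, X, family=sm.families.Gaussian()).fit()

# Bernoulli (0/1) response -> logistic regression
p = 1 / (1 + np.exp(-(1.0 + 2.0 * x)))
y_bern = rng.binomial(1, p)
logit_fit = sm.GLM(y_bern, X, family=sm.families.Binomial()).fit()

# Count response -> Poisson regression
y_count = rng.poisson(np.exp(0.5 + 0.8 * x))
pois_fit = sm.GLM(y_count, X, family=sm.families.Poisson()).fit()

# Each fit estimates E[Y | X = x]; only the assumed distribution (and its link) differs.
print(logit_fit.params)
```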

Chad Scherrer
  • 271
  • 1
  • 2
2

> The U.S. Weather Service has always phrased rain forecasts as probabilities. I do not want a classification of “it will rain today.” There is a slight loss/disutility of carrying an umbrella, and I want to be the one to make the tradeoff.

– Dr. Frank Harrell, https://www.fharrell.com/post/classification/

Classification is when you make a concrete determination of what category something is a part of. Binary classification involves two categories, and by the law of the excluded middle, that means binary classification is for determining whether something “is” or “is not” part of a single category. There either are children playing in the park today (1), or there are not (0).

Although the variable you are targeting in logistic regression is categorical, logistic regression does not actually classify individual cases for you: it just gives you probabilities (or log-odds, on the logit scale). The only way logistic regression can actually classify anything is if you apply a rule to the probability output. For example, you may round probabilities greater than or equal to 50% to 1, and probabilities less than 50% to 0, and that’s your classification.
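As a small numpy illustration with made-up logit values (not taken from any fitted model), that 50% rule on probabilities is exactly a threshold of zero on the log-odds scale:

```python
import numpy as np

log_odds = np.array([-2.0, -0.1, 0.0, 0.3, 1.5])   # hypothetical outputs on the logit scale
probs = 1 / (1 + np.exp(-log_odds))                # inverse logit: probabilities in (0, 1)

labels_from_probs = (probs >= 0.5).astype(int)     # the 50% rounding rule
labels_from_logits = (log_odds >= 0).astype(int)   # the equivalent rule on the logit scale
assert (labels_from_probs == labels_from_logits).all()
```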

If you want to read more, this post goes into more detail: https://ryxcommar.com/2020/06/27/why-do-so-many-practicing-data-scientists-not-understand-logistic-regression/

Devious
-1

Apart from the good answers already provided, another view is that logistic regression predicts probabilities, which are continuous values ranging from 0 to 1.
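A tiny sketch of that point with hypothetical coefficients (assuming numpy): however large or small the linear predictor gets, the logistic transform maps it to a value strictly between 0 and 1.

```python
import numpy as np

beta0, beta1 = -1.0, 2.5                          # hypothetical coefficients
x = np.linspace(-3, 3, 7)                         # the linear predictor itself is unbounded
linear_predictor = beta0 + beta1 * x
probabilities = 1 / (1 + np.exp(-linear_predictor))
print(probabilities)                              # every value lies strictly inside (0, 1)
```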


krish___na
  • What happens if you insert the outcome y of your linear model into another linear model (you can think of it as an extreme case of a sigmoid with a very gentle gradient)? Will that work for classification? If not, why? – Zzy1130 Feb 05 '20 at 08:04