
Can someone give me some intuition about when to choose SVM versus LR? I want to understand the difference between the optimization criteria the two methods use for learning the separating hyperplane, where the respective aims are as follows:

  • SVM: Try to maximize the margin between the closest support vectors
  • LR: Maximize the posterior class probability

Let's consider the linear feature space (i.e., no kernel) for both SVM and LR.

Some differences I know of already:

  1. SVM is deterministic (though we can use Platt scaling to get a probability score; see the sketch after this list), while LR is probabilistic.
  2. In the kernel setting, the SVM is faster, since it stores just the support vectors.
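
To make point 1 concrete, here is a minimal sketch (assuming scikit-learn is available; the toy dataset is hypothetical) of getting probability scores out of both models:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical toy data, purely for illustration
X, y = make_classification(n_samples=300, random_state=0)

# probability=True makes SVC fit Platt scaling (a sigmoid on the SVM's
# decision values, calibrated via internal cross-validation)
svm = SVC(kernel="linear", probability=True).fit(X, y)
lr = LogisticRegression().fit(X, y)  # probabilistic by construction

print(svm.predict_proba(X[:3]))  # probabilities via Platt scaling
print(lr.predict_proba(X[:3]))   # probabilities straight from the model
```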
  • This statement is wrong: "_LR: Maximize the posterior class probability_". Logistic regression maximises the likelihood, not some posterior density. _Bayesian logistic regression_ is a different story, but you need to be specific about it if that's what you're referring to. – Digio Feb 14 '18 at 08:25

4 Answers


Linear SVMs and logistic regression generally perform comparably in practice. Use an SVM with a nonlinear kernel if you have reason to believe your data won't be linearly separable (or if you need to be more robust to outliers than LR will normally tolerate). Otherwise, just try logistic regression first and see how you do with that simpler model. If logistic regression fails you, try an SVM with a non-linear kernel like the RBF.
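
For instance, here is a minimal sketch of that workflow (assuming scikit-learn; the moons dataset is just a hypothetical stand-in for data that isn't linearly separable):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy data with a non-linear class boundary
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)

# Start with the simpler linear model; escalate to an RBF-kernel SVM
# only if logistic regression isn't good enough.
lr_acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
svm_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"logistic regression: {lr_acc:.3f}")
print(f"RBF-kernel SVM:      {svm_acc:.3f}")
```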

EDIT:

Ok, let's talk about where the objective functions come from.

Logistic regression comes from the framework of generalized linear models. A good discussion of the logistic regression objective function in this context can be found here: https://stats.stackexchange.com/a/29326/8451

The support vector machine is much more geometrically motivated. Instead of assuming a probabilistic model, we're trying to find a particular optimal separating hyperplane, where we define "optimality" in the context of the support vectors. We don't have anything resembling the statistical model we use in logistic regression here, even though the linear case will give us similar results: in practice this just means that logistic regression does a pretty good job of producing "wide margin" classifiers, since that's all an SVM is trying to do (specifically, an SVM tries to maximize the margin between the classes).
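
To make the contrast concrete, here is one standard way of writing the two objectives side by side (a sketch for the linear case, with labels $y_i \in \{-1,+1\}$ and decision function $f(x_i) = w^\top x_i + b$; the regularization term on the LR side is optional but commonly added):

$$\text{SVM:}\qquad \min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i f(x_i)\bigr)$$

$$\text{LR:}\qquad \min_{w,b}\ \lambda\lVert w\rVert^2 + \sum_{i=1}^{n}\log\bigl(1 + e^{-y_i f(x_i)}\bigr)$$

Both penalize each point through its signed margin $y_i f(x_i)$; the only real difference is the per-point loss. The hinge loss is exactly zero once a point is confidently on the correct side ($y_i f(x_i) \ge 1$), so only the support vectors shape the solution, while the logistic loss decays smoothly toward zero but never reaches it, so every point always pulls on the fit a little.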

I'll try to come back to this later and get a bit deeper into the weeds, I'm just sort of in the middle of something :p

David Marx
  • But that still doesn't answer my question about the intuitive difference in the objective functions of SVM vs. LR, which are as follows: (a) SVM: try to maximize the margin between the closest support vectors; (b) LR: maximize the posterior class probability – user41799 Apr 27 '14 at 02:46
  • I mean, that's a completely different question. Are you asking about when to use the models, or what motivates the form of their objective functions? – David Marx Apr 27 '14 at 03:01
  • 1
    I am more interested in what motivates the form of their objective functions – user41799 Apr 27 '14 at 03:14
  • 15
    _I'll try to come back to this later and get a bit deeper into the weeds, I'm just sort of in the middle of something_ Four years later... – user1717828 Apr 08 '18 at 17:19

Logistic Regression vs. SVM

[Image: a chart from the lecture summarizing when to use each method. Roughly: with many features n relative to the number of training examples m, use logistic regression or a linear-kernel SVM; with small n and intermediate m, use an SVM with a Gaussian (RBF) kernel; with small n and large m, create more features, then use logistic regression or a linear-kernel SVM.]

This picture comes from the Coursera course "Machine Learning" by Andrew Ng. It can be found in week 7, at the end of "Support Vector Machines - Using an SVM".

JSONParser
  • LR gives calibrated probabilities that can be interpreted as confidence in a decision.
  • LR gives us an unconstrained, smooth objective.
  • LR can be (straightforwardly) used within Bayesian models.
  • SVMs don’t penalize examples for which the correct decision is made with sufficient confidence (see the numeric check below). This may be good for generalization.
  • SVMs have a nice dual form, giving sparse solutions when using the kernel trick (better scalability).

Check out Support Vector Machines vs Logistic Regression, University of Toronto CSC2515 by Kevin Swersky.
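
As a quick numeric check of the fourth point, here is a small sketch comparing the standard hinge and logistic losses as a function of the signed margin $y \cdot f(x)$:

```python
import math

# Hinge loss (SVM) vs. logistic loss (LR): the hinge loss is exactly
# zero once the margin exceeds 1, while the logistic loss keeps
# shrinking but never reaches zero.
for margin in (-1.0, 0.0, 0.5, 1.0, 2.0, 5.0):
    hinge = max(0.0, 1.0 - margin)
    logistic = math.log(1.0 + math.exp(-margin))
    print(f"margin {margin:+.1f}: hinge = {hinge:.3f}, logistic = {logistic:.4f}")
```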

Chankey Pathak

I think another advantage of LR is that it's actually optimising the weights of an interpretable function (e.g. Y = B0 + B1X1 + B2X2, where X1 and X2 are your predictor variables/features). This means that you could use the model with pen, paper, and a basic scientific calculator and get a probability output if you wanted to.

All you have to do is calculate Y with the optimised function above, then plug Y into the sigmoid function to get a class probability between 0 and 1.
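
For example, a minimal sketch with made-up coefficients (the weights below are hypothetical, not from any real fit):

```python
import math

# Hypothetical fitted weights, purely for illustration
b0, b1, b2 = -1.5, 0.8, 0.3
x1, x2 = 2.0, 1.0  # a new observation

y = b0 + b1 * x1 + b2 * x2      # linear predictor: -1.5 + 1.6 + 0.3 = 0.4
p = 1.0 / (1.0 + math.exp(-y))  # sigmoid: ~0.599
print(f"P(class = 1) = {p:.3f}")
```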

This might be useful in some fields/applications, although less and less as we move forward and can just plug numbers into an app and get a result from the model.