
Can someone give me some intuition about when to choose SVM versus LR? I want to understand the difference between the optimization criteria the two methods use for learning the separating hyperplane, where the respective aims are as follows:

  • SVM: Try to maximize the margin between the closest support vectors
  • LR: Maximize the posterior class probability

Let's consider the linear feature space (i.e., no kernel) for both SVM and LR.

Some differences I know of already:

  1. SVM is deterministic (though we can use Platt scaling to get a probability score; see the sketch after this list), while LR is probabilistic.
  2. In the kernel setting, the SVM is faster, since it stores just the support vectors.
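
To make point 1 concrete, here is a minimal sketch (assuming scikit-learn is available; the toy dataset is hypothetical) of getting probability scores out of both models:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical toy data, purely for illustration
X, y = make_classification(n_samples=300, random_state=0)

# probability=True makes SVC fit Platt scaling (a sigmoid on the SVM's
# decision values, calibrated via internal cross-validation)
svm = SVC(kernel="linear", probability=True).fit(X, y)
lr = LogisticRegression().fit(X, y)  # probabilistic by construction

print(svm.predict_proba(X[:3]))  # probabilities via Platt scaling
print(lr.predict_proba(X[:3]))   # probabilities straight from the model
```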
  • This statement is wrong: "_LR: Maximize the posterior class probability_". Logistic regression maximises the likelihood, not some posterior density. _Bayesian logistic regression_ is a different story, but you need to be specific about it if that's what you're referring to. – Digio Feb 14 '18 at 08:25

4 Answers


Linear SVMs and logistic regression generally perform comparably in practice. Use an SVM with a nonlinear kernel if you have reason to believe your data won't be linearly separable (or if you need to be more robust to outliers than LR will normally tolerate). Otherwise, just try logistic regression first and see how you do with that simpler model. If logistic regression fails you, try an SVM with a non-linear kernel like the RBF.
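
For instance, here is a minimal sketch of that workflow (assuming scikit-learn; the moons dataset is just a hypothetical stand-in for data that isn't linearly separable):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy data with a non-linear class boundary
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)

# Start with the simpler linear model; escalate to an RBF-kernel SVM
# only if logistic regression isn't good enough.
lr_acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
svm_acc = cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean()
print(f"logistic regression: {lr_acc:.3f}")
print(f"RBF-kernel SVM:      {svm_acc:.3f}")
```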

EDIT:

Ok, let's talk about where the objective functions come from.

Logistic regression comes from the framework of generalized linear models. A good discussion of the logistic regression objective function in this context can be found here: https://stats.stackexchange.com/a/29326/8451

The support vector machine is much more geometrically motivated. Instead of assuming a probabilistic model, we're trying to find a particular optimal separating hyperplane, where we define "optimality" in the context of the support vectors. We don't have anything resembling the statistical model we use in logistic regression here, even though the linear case will give us similar results: in practice this just means that logistic regression does a pretty good job of producing "wide margin" classifiers, since that's all an SVM is trying to do (specifically, an SVM tries to maximize the margin between the classes).
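
To make the contrast concrete, here is one standard way of writing the two objectives side by side (a sketch for the linear case, with labels $y_i \in \{-1,+1\}$ and decision function $f(x_i) = w^\top x_i + b$; the regularization term on the LR side is optional but commonly added):

$$\text{SVM:}\qquad \min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\max\bigl(0,\ 1 - y_i f(x_i)\bigr)$$

$$\text{LR:}\qquad \min_{w,b}\ \lambda\lVert w\rVert^2 + \sum_{i=1}^{n}\log\bigl(1 + e^{-y_i f(x_i)}\bigr)$$

Both penalize each point through its signed margin $y_i f(x_i)$; the only real difference is the per-point loss. The hinge loss is exactly zero once a point is confidently on the correct side ($y_i f(x_i) \ge 1$), so only the support vectors shape the solution, while the logistic loss decays smoothly toward zero but never reaches it, so every point always pulls on the fit a little.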

I'll try to come back to this later and get a bit deeper into the weeds, I'm just sort of in the middle of something :p

David Marx
  • But that still doesn't answer my question about the intuitive difference in the objective functions of SVM vs. LR, which are as follows: (a) SVM: try to maximize the margin between the closest support vectors; (b) LR: maximize the posterior class probability – user41799 Apr 27 '14 at 02:46
  • I mean, that's a completely different question. Are you asking about when to use the models, or what motivates the form of their objective functions? – David Marx Apr 27 '14 at 03:01
  • 1
    I am more interested in what motivates the form of their objective functions – user41799 Apr 27 '14 at 03:14
  • 15
    _I'll try to come back to this later and get a bit deeper into the weeds, I'm just sort of in the middle of something_ Four years later... – user1717828 Apr 08 '18 at 17:19

Logistic Regression vs. SVM

[Image: a chart from the lecture summarizing when to use each method. Roughly: with many features n relative to the number of training examples m, use logistic regression or a linear-kernel SVM; with small n and intermediate m, use an SVM with a Gaussian (RBF) kernel; with small n and large m, create more features, then use logistic regression or a linear-kernel SVM.]

This picture comes from the Coursera course "Machine Learning" by Andrew Ng. It can be found in week 7, at the end of "Support Vector Machines - Using an SVM".

JSONParser
  • LR gives calibrated probabilities that can be interpreted as confidence in a decision.
  • LR gives us an unconstrained, smooth objective.
  • LR can be (straightforwardly) used within Bayesian models.
  • SVMs don’t penalize examples for which the correct decision is made with sufficient confidence (see the numeric check below). This may be good for generalization.
  • SVMs have a nice dual form, giving sparse solutions when using the kernel trick (better scalability).

Check out Support Vector Machines vs Logistic Regression, University of Toronto CSC2515 by Kevin Swersky.
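
As a quick numeric check of the fourth point, here is a small sketch comparing the standard hinge and logistic losses as a function of the signed margin $y \cdot f(x)$:

```python
import math

# Hinge loss (SVM) vs. logistic loss (LR): the hinge loss is exactly
# zero once the margin exceeds 1, while the logistic loss keeps
# shrinking but never reaches zero.
for margin in (-1.0, 0.0, 0.5, 1.0, 2.0, 5.0):
    hinge = max(0.0, 1.0 - margin)
    logistic = math.log(1.0 + math.exp(-margin))
    print(f"margin {margin:+.1f}: hinge = {hinge:.3f}, logistic = {logistic:.4f}")
```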

Chankey Pathak

I think another advantage of LR is that it's actually optimising the weights of an interpretable function (e.g. Y = B0 + B1X1 + B2X2, where X1 and X2 are your predictor variables/features). This means that you could use the model with pen, paper, and a basic scientific calculator and get a probability output if you wanted to.

All you have to do is calculate Y with the optimised function above, then plug Y into the sigmoid function to get a class probability between 0 and 1.
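
For example, a minimal sketch with made-up coefficients (the weights below are hypothetical, not from any real fit):

```python
import math

# Hypothetical fitted weights, purely for illustration
b0, b1, b2 = -1.5, 0.8, 0.3
x1, x2 = 2.0, 1.0  # a new observation

y = b0 + b1 * x1 + b2 * x2      # linear predictor: -1.5 + 1.6 + 0.3 = 0.4
p = 1.0 / (1.0 + math.exp(-y))  # sigmoid: ~0.599
print(f"P(class = 1) = {p:.3f}")
```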

This might be useful in some fields/applications, although less and less as we move forward and can just plug numbers into an app and get a result from the model.