I'm new to Support Vector Machines and I've been trying to get into the underlying math (instead of just using scikit-learn or something like that).
I understand the math behind it up to the point where we derive the dual Lagrangian:
$$L(\alpha) = \sum_{i} \alpha_{i} - \frac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}\, y_{i}y_{j}\, x_{i}\cdot x_{j}$$
which we maximize subject to $\alpha_{i} \geq 0$ and $\sum_{i}\alpha_{i} y_{i} = 0$.
I know how it's derived and the intuition behind it, but I don't know what comes next. I've gone through a lot of documents and read that this is solved using quadratic programming, but I could never find an actual worked example.
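To make this concrete, here's the kind of thing I'm picturing for the QP route (a minimal sketch I put together with CVXOPT and made-up toy data; `cvxopt.solvers.qp` minimizes $\frac{1}{2}x^{T}Px + q^{T}x$ subject to $Gx \leq h$ and $Ax = b$). Is this the right idea?

```python
# A minimal sketch of what I imagine the QP route looks like, using CVXOPT.
# The toy data, tolerance, and variable names here are my own guesses.
import numpy as np
from cvxopt import matrix, solvers

# Toy, linearly separable data: two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Rewrite the dual as a minimization: min (1/2) a^T P a - 1^T a
# with P_ij = y_i y_j (x_i . x_j)
K = X @ X.T                       # Gram matrix of dot products
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))            # -a_i <= 0, i.e. a_i >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))      # equality constraint: sum_i a_i y_i = 0
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

# Recover the hyperplane from the support vectors (alpha_i > 0)
w = (alpha * y) @ X
sv = alpha > 1e-6
b_offset = np.mean(y[sv] - X[sv] @ w)
print("alpha:", alpha, "w:", w, "b:", b_offset)
```

If I understand correctly, the $\alpha_{i}$ that come out should be nonzero only for the support vectors, which is how $w$ and $b$ get recovered at the end.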
I've also seen some Python code where QP isn't brought up at all: they simply use a hinge loss with a regularizer and then minimize it using stochastic gradient descent, roughly like the sketch below. Does this mean that there's more than one way to find the optimal hyperplane when building an SVM?
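For reference, this is roughly the pattern I mean (my own hand-rolled reconstruction, not from any particular library; the step size, regularization strength, and data are made up): follow a subgradient of $\frac{\lambda}{2}\lVert w\rVert^{2} + \max(0,\, 1 - y_{i}(w \cdot x_{i} + b))$ one sample at a time.

```python
# My reconstruction of the hinge-loss-plus-SGD pattern I've seen
# (hand-rolled; the learning rate, lambda, and data are made up).
import numpy as np

def sgd_svm(X, y, lam=0.01, lr=0.01, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Per-sample objective: (lam/2)||w||^2 + max(0, 1 - y_i (w.x_i + b))
            if y[i] * (X[i] @ w + b) < 1:
                w -= lr * (lam * w - y[i] * X[i])   # hinge is active
                b += lr * y[i]
            else:
                w -= lr * lam * w                   # only the regularizer contributes
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = sgd_svm(X, y)
print("w:", w, "b:", b)
```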
I've read up on the theory but I'm missing the more practical side of it, so to summarize my questions:
1) How do we use that Lagrangian in a practical setting, when we actually have some input data and want to find the optimal hyperplane? (Some worked examples would be very much appreciated; is the QP sketch above on the right track?)
2) Is there more than one way to find the hyperplane that maximizes the margin, beyond quadratic programming? (i.e., is it valid to use gradient descent or other methods?)
Thanks a lot.