I'm new to Support Vector Machines and I've been trying to get into the underlying math (instead of just using scikit-learn or something like that).
I understand the math behind it up to the point where we derive the dual Lagrangian:
$$L(\alpha) = \sum_{i} \alpha_{i} - \frac{1}{2}\sum_{i}\sum_{j}\alpha_{i}\alpha_{j}\, y_{i}y_{j}\, x_{i}\cdot x_{j}$$
which we maximize subject to $\alpha_{i} \geq 0$ and $\sum_{i}\alpha_{i} y_{i} = 0$.
I know how it's derived and the intuition behind it, but I don't know what comes next. I've gone through a lot of documents and read that this is solved using quadratic programming, but I could never find an actual worked example.
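To make this concrete, here's the kind of thing I'm picturing for the QP route (a minimal sketch I put together with CVXOPT and made-up toy data; `cvxopt.solvers.qp` minimizes $\frac{1}{2}x^{T}Px + q^{T}x$ subject to $Gx \leq h$ and $Ax = b$). Is this the right idea?

```python
# A minimal sketch of what I imagine the QP route looks like, using CVXOPT.
# The toy data, tolerance, and variable names here are my own guesses.
import numpy as np
from cvxopt import matrix, solvers

# Toy, linearly separable data: two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Rewrite the dual as a minimization: min (1/2) a^T P a - 1^T a
# with P_ij = y_i y_j (x_i . x_j)
K = X @ X.T                       # Gram matrix of dot products
P = matrix(np.outer(y, y) * K)
q = matrix(-np.ones(n))
G = matrix(-np.eye(n))            # -a_i <= 0, i.e. a_i >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))      # equality constraint: sum_i a_i y_i = 0
b = matrix(0.0)

solvers.options['show_progress'] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

# Recover the hyperplane from the support vectors (alpha_i > 0)
w = (alpha * y) @ X
sv = alpha > 1e-6
b_offset = np.mean(y[sv] - X[sv] @ w)
print("alpha:", alpha, "w:", w, "b:", b_offset)
```

If I understand correctly, the $\alpha_{i}$ that come out should be nonzero only for the support vectors, which is how $w$ and $b$ get recovered at the end.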
I've also seen some Python code where QP isn't brought up at all: they simply use a hinge loss with a regularizer and then minimize it using stochastic gradient descent, roughly like the sketch below. Does this mean that there's more than one way to find the optimal hyperplane when building an SVM?
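For reference, this is roughly the pattern I mean (my own hand-rolled reconstruction, not from any particular library; the step size, regularization strength, and data are made up): follow a subgradient of $\frac{\lambda}{2}\lVert w\rVert^{2} + \max(0,\, 1 - y_{i}(w \cdot x_{i} + b))$ one sample at a time.

```python
# My reconstruction of the hinge-loss-plus-SGD pattern I've seen
# (hand-rolled; the learning rate, lambda, and data are made up).
import numpy as np

def sgd_svm(X, y, lam=0.01, lr=0.01, epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Per-sample objective: (lam/2)||w||^2 + max(0, 1 - y_i (w.x_i + b))
            if y[i] * (X[i] @ w + b) < 1:
                w -= lr * (lam * w - y[i] * X[i])   # hinge is active
                b += lr * y[i]
            else:
                w -= lr * lam * w                   # only the regularizer contributes
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = sgd_svm(X, y)
print("w:", w, "b:", b)
```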
I've read up on the theory but I'm missing the more practical side of it, so to summarize my questions:
1) How do we use that Lagrangian in a practical setting, when we actually have some input data and want to find the optimal hyperplane? (Some worked examples would be very much appreciated; is the QP sketch above on the right track?)
2) Is there more than one way to find the hyperplane that maximizes the margin, beyond quadratic programming? (i.e., is it valid to use gradient descent or other methods?)
Thanks a lot.