
I'm looking at the formulation for SVC as stated on sklearn's website (http://scikit-learn.org/stable/modules/svm.html#svc). The loss function here minimizes a "flatness" term plus an "error" term weighted by a constant C (I'm not sure if this is the terminology usually used, but this is just how I think about it).

However, it seems to have the ability to do both l1 and l2 regularization (http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html). As I'm used to it (e.g. in the context of logistic regression), regularization is usually a penalty term on the coefficients; in this case, I'd assume it's on the vector w. That being my assumption, I have two questions:

(1) why isn't there a term for this in the loss function as described (maybe it's just a documentation issue)?

(2) why isn't there a second regularization coefficient to specify?

I'm assuming they didn't screw up, so my real question is: how does regularization work in the SVM/SVC context, i.e. what's the loss function?
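For reference, the parameters I'm talking about are roughly these (just a minimal sketch of the LinearSVC call with made-up data, to show which knobs exist):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# made-up data, only to have something to fit
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# penalty can be 'l1' or 'l2', yet there is only the single constant C,
# which is what prompted question (2)
clf = LinearSVC(penalty='l2', C=1.0)
clf.fit(X, y)
```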

roundsquare
  • So, I have a guess: I think that if l1 regularization is used, the "flatness" term is made non-quadratic (i.e. the norm of the vector rather than its square). If this is correct, there is no need for a second regularization term. If this is right, please let me know :) – roundsquare Aug 24 '18 at 02:21

1 Answer


The documentation describes non-linear SVC, which uses kernels. The $\ell_1$ penalty is only available for linear SVC (which does not use kernels). Linear SVC is similar to penalised logistic regression, but uses the hinge loss:
$$
\arg\min_{\mathbf{w}} \frac{1}{2} \lVert \mathbf{w} \rVert_2^2 + C \sum_{i=1}^n \max(0, 1 - y_i \mathbf{w}^\top \mathbf{x}_i),
$$
where the first term is an $\ell_2$ penalty and the max term is the hinge loss. Sometimes you also see the squared hinge loss, which has the advantage of being differentiable. As in logistic regression, you can use an $\ell_1$ penalty instead to obtain sparse coefficients. In either case, the regularisation term and the loss function are convex, so standard solvers for convex problems can be employed, although more specialised and efficient solvers are available.
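To connect this back to the scikit-learn interface: as far as I can tell, LinearSVC with penalty='l1' swaps the quadratic term for $\lVert \mathbf{w} \rVert_1$ and requires the squared hinge loss with the primal solver (dual=False), i.e. roughly
$$
\arg\min_{\mathbf{w}} \lVert \mathbf{w} \rVert_1 + C \sum_{i=1}^n \max(0, 1 - y_i \mathbf{w}^\top \mathbf{x}_i)^2 .
$$
Either way only one penalty is used at a time, traded off against the data term by the single constant C, which is why no second regularisation coefficient is exposed. A minimal sketch (the synthetic data is only for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# l2 penalty with the plain hinge loss: the objective written above
clf_l2 = LinearSVC(penalty='l2', loss='hinge', C=1.0).fit(X, y)

# l1 penalty: scikit-learn only supports it together with the squared
# hinge loss and the primal formulation (dual=False); it tends to drive
# some coefficients exactly to zero
clf_l1 = LinearSVC(penalty='l1', loss='squared_hinge', dual=False, C=1.0).fit(X, y)

print("non-zero coefficients, l2:", np.count_nonzero(clf_l2.coef_))
print("non-zero coefficients, l1:", np.count_nonzero(clf_l1.coef_))
```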

sebp