
I understand that SVM amounts to solving the constrained optimization problem

$$\min_{\mathbf{w}} \dfrac{1}{2} \mathbf{w}^T\mathbf{w} \quad \text{subject to} \quad y_i(\mathbf{w}^T\mathbf{x_i}+b)\geq 1, \quad i=1, 2, \ldots, n$$


This is handled using the Karush–Kuhn–Tucker (KKT) approach from nonlinear optimization, where one step is based on the necessary complementary slackness condition:

$$\alpha_i\left(y_i(\mathbf{w}^T\mathbf{x_i}+b)-1\right)=0, \quad i=1, 2, \ldots, n.$$
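For completeness, this condition comes from the Lagrangian of the problem above (the standard derivation, written out here for reference):

$$\mathcal{L}(\mathbf{w}, b, \boldsymbol{\alpha}) = \dfrac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_{i=1}^{n}\alpha_i\left(y_i(\mathbf{w}^T\mathbf{x_i}+b)-1\right), \quad \alpha_i \geq 0,$$

whose stationarity conditions give $\mathbf{w} = \sum_{i=1}^{n}\alpha_i y_i \mathbf{x_i}$ and $\sum_{i=1}^{n}\alpha_i y_i = 0$.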


Because for non-gutter points (i.e., points not lying on the margin boundaries), we have

$$y_i(\mathbf{w}^T\mathbf{x_i}+b)-1 > 0$$

the corresponding multiplier $\alpha_i$ must then be $0$. My question concerns the gutter points: because

$$y_i(\mathbf{w}^T\mathbf{x_i}+b)-1 = 0$$

we know that the corresponding multiplier $\alpha_i$ must be non-negative, but is it necessarily positive? In other words, if I define a support vector as any $\mathbf{x_i}$ on the gutter, is that the same as defining a support vector as any $\mathbf{x_i}$ whose multiplier is positive?

Nicholas
    I think I know what you mean, but I don't think "gutter point" is standard terminology? (When I google for "gutter point" and "svm" I only get this page.) – Matthew Gunn Nov 05 '16 at 22:02

1 Answer


It's mathematically possible to have a case where the constraint $y_i(\mathbf{w}^T\mathbf{x_i}+b)\geq 1$ is satisfied with equality and the Lagrange multiplier $\alpha_i = 0$. This happens when the constraint isn't active, in the sense that removing it would leave the solution $\mathbf{w}$ unchanged. In practice, this is probably a knife-edge case.

Intuitively, multipliers are penalties for violating a constraint

The intuitive interpretation of Lagrange multipliers is that they are the price of violating a constraint.

Imagine that we have the problem:

\begin{equation} \begin{array}{rl} \mbox{minimize (over $\mathbf{x}$)} & f(\mathbf{x}) \\ \mbox{subject to} & g(\mathbf{x}) \leq 0 \end{array} \end{equation}

The Lagrangian is:

$$ \mathcal{L}\left(\mathbf{x}, \lambda \right) = f(\mathbf{x}) + \lambda g(\mathbf{x}) $$

$\lambda$ is a per-unit penalty for making $g$ positive, i.e., for violating the constraint. The intuitive idea is that there exists a price $\lambda^*$ for violating the constraint such that the optimizer is better off not violating it.
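This intuition can be made exact: maximizing the Lagrangian over $\lambda \geq 0$ recovers the constrained objective,

$$\max_{\lambda \geq 0} \mathcal{L}(\mathbf{x}, \lambda) = \begin{cases} f(\mathbf{x}) & \text{if } g(\mathbf{x}) \leq 0 \\ +\infty & \text{if } g(\mathbf{x}) > 0, \end{cases}$$

so $\min_{\mathbf{x}} \max_{\lambda \geq 0} \mathcal{L}(\mathbf{x}, \lambda)$ is exactly the original problem: at an infinite price, violating the constraint is never worthwhile.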

(Side note: it's worth the time to understand KKT sufficient vs. necessary conditions, the primal vs. dual problem, weak duality, strong duality, and the saddle point property of the Lagrangian.)

Simple example:

\begin{equation} \begin{array}{rl} \mbox{minimize (over $x$)} & x^2 \\ \mbox{subject to} & -x \leq 0 \end{array} \end{equation}

The Lagrangian is:

$$ \mathcal{L} = x^2 - \lambda x $$

The first-order condition implies:

$$ 2x = \lambda $$

Solution is $x=0$, $\lambda = 0$. The solution to the unconstrained problem $\min x^2$ is $x = 0$. Basically, the point is on the boundary, but the constraint isn't doing any work.
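As a quick numerical check, here is a minimal sketch using SciPy's `trust-constr` solver, which reports the constraint multipliers in `res.v` (the setup below is my own illustration, not part of the original question):

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

# minimize x^2 subject to -x <= 0, encoded as -inf <= -x <= 0
con = LinearConstraint([[-1.0]], -np.inf, 0.0)

res = minimize(lambda x: x[0] ** 2,              # objective f(x) = x^2
               x0=[1.0],                         # start away from the boundary
               jac=lambda x: np.array([2.0 * x[0]]),
               method="trust-constr",
               constraints=[con])

print(res.x)  # optimum: approximately [0.]
print(res.v)  # constraint multiplier: approximately zero
```

The point lands on the boundary, yet the reported multiplier is (numerically) zero: the constraint is satisfied with equality but isn't doing any work.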

If the unconstrained solution to the problem would have placed the point $x$ on the boundary of the constraint anyway, the penalty for violating the constraint may be zero. In practice, for many problems, this will be a knife-edge case.
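To see this in the SVM setting, here is a minimal sketch using scikit-learn's `SVC` (the four-point configuration is my own illustration). All four points lie exactly on the gutter of the separator $x_1 = 0$, but because each class contains a redundant margin point, the dual solution is not unique, and a valid optimum can assign $\alpha_i = 0$ to a gutter point. Which points the solver actually reports with $\alpha_i > 0$ is implementation-dependent:

```python
import numpy as np
from sklearn.svm import SVC

# Two points per class, all on the margin boundaries x1 = +/-1
# of the separating hyperplane x1 = 0.
X = np.array([[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]

margins = y * (X @ w + b)                     # equals 1 on the gutter
alphas = np.zeros(len(X))
alphas[clf.support_] = np.abs(clf.dual_coef_[0])  # dual_coef_ holds y_i * alpha_i

for xi, m, a in zip(X, margins, alphas):
    print(f"point {xi}: y(w^T x + b) = {m:.3f}, alpha = {a:.3f}")
```

Comparing the margin column (equal to $1$ on the gutter) with the alpha column shows whether the two definitions of "support vector" coincide for a given solver on a degenerate configuration like this one.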

Matthew Gunn
    $x = 0$ gives the solution to minimize $x^2$, and it satisfies the constraint $-x \leq 0$. In such cases, do we need to formulate the Lagrangian at all? I am wondering whether we only need to form the Lagrangian when the optimal point of the convex cost function does not satisfy the constraint (which I am struggling to understand now). – mon Jan 12 '17 at 09:23