
In the SVM optimization problem for computing the margin, we use Lagrange multipliers to incorporate the constraints:
$$L(w,b,\alpha)= \frac{\lambda}{2}\|w\|^2 - \sum_i \alpha_i \left(y_i(w^Tx_i+b) -1\right)$$
Now we want to compute the $\alpha_i$. It is stated that $\alpha_i = 0$ for all non-support vectors. How does this statement follow from the above equation? How can it be proved?

UPDATE:
If we solve the dual of an SVM using the KKT conditions, we get: $$w = \frac{1}{\lambda} \sum_{i=1}^{N}\alpha_i y_i x_i$$ One of the main goals of using the dual is that $w$ can be computed more efficiently, because of the above equation and the fact that $\alpha_i = 0$ for most $i\in[N]$; therefore we only have to consider $\{(x_i,y_i):\alpha_i \neq 0\}$, which is a small subset of the data.

My question is: why is $\alpha_i = 0$ for most $i\in [N]$?

Geometrically, it is said that $\alpha_i\neq 0$ exactly for the data points which lie on the hyperplanes $\{x:w^Tx + b = 1 \}$ or $\{x:w^Tx + b = -1 \}$. I am not sure whether this is true. If it is, how can it be proven?
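
A quick numeric illustration of this claim (a minimal sketch, assuming scikit-learn is available; the blob data and the large `C` used to approximate a hard margin are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs, so the hard-margin picture applies.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)
y_signed = 2 * y - 1  # map labels {0, 1} -> {-1, +1}

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only a handful of alpha_i are non-zero:
print(f"{len(clf.support_)} of {len(X)} points have alpha_i != 0")

# Those points lie on the margin hyperplanes: y_i (w^T x_i + b) ≈ 1.
margins = y_signed[clf.support_] * clf.decision_function(X[clf.support_])
print(np.round(margins, 3))

# w can be reconstructed from the support vectors alone, since
# dual_coef_ stores y_i * alpha_i for each support vector.
w = clf.dual_coef_[0] @ clf.support_vectors_
print(np.allclose(w, clf.coef_[0]))  # True
```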

Code Pope

3 Answers


A support vector is, by definition, a vector whose $\alpha$ is non-zero. It is a definition; there is nothing to prove from the equation here.

RUser4512
  • Is it also a definition that it must be greater than zero? Or is there a reason for it? And why is it non-zero only for some vectors and not for all? – Code Pope Jul 13 '18 at 16:57

I have found the answer to my question; it can be explained geometrically very well.
We know that the complementary slackness condition of the KKT conditions says: $$\alpha_i\geq0, \quad \alpha_i(y_i(w^Tx_i + b) - 1) = 0$$ Therefore, at a KKT point, at least one of the following cases holds for each $i$:

Case 1: $\alpha_i=0$
Case 2: $y_i(w^Tx_i +b) - 1 =0$

Furthermore, we know that the margin hyperplanes of the SVM have the following equations:

  1. $H_1 = \{x: w^Tx + b = 1\}$
  2. $H_{-1} = \{x:w^Tx + b = -1\}$

Using these margin hyperplanes, the following halfspaces are defined:

  1. $H_1^+ = \{x: w^Tx + b > 1\}$
  2. $H_{-1}^- = \{x:w^Tx + b < -1\}$

Thus, for any point $x_i$ with $y_i=1$ or $y_i=-1$ that is correctly classified and lies strictly inside the correct halfspace, we have: $$y_i(w^Tx_i +b) -1 > 0$$ For these points "Case 2" is violated, so "Case 1" must hold, i.e., $\alpha_i = 0$. This means that $\alpha_i = 0$ for all points that are correctly classified and lie strictly inside their halfspace.
Hence, $\alpha_i$ can only be non-zero if "Case 2" holds, i.e., $y_i(w^Tx_i+b) - 1 = 0$. This is true only for $x_i \in H_1$ or $x_i \in H_{-1}$, i.e., for the points that lie on the margin hyperplanes, and there are only a few such points.

Therefore, $\alpha_i = 0$ for most points; the exceptions are the few points that lie on the margin hyperplanes.
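
This can be checked numerically (a small sketch, assuming scikit-learn; the toy data and the large `C` approximating a hard margin are illustrative assumptions): the product $\alpha_i(y_i(w^Tx_i+b)-1)$ vanishes for every point, because one of the two cases always holds.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.5, random_state=1)
y_signed = 2 * y - 1                         # labels {0, 1} -> {-1, +1}
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin

# Full alpha vector: zero everywhere except at the support vectors,
# where sklearn stores y_i * alpha_i in dual_coef_.
alpha = np.zeros(len(X))
alpha[clf.support_] = np.abs(clf.dual_coef_[0])

slack = y_signed * clf.decision_function(X) - 1  # y_i (w^T x_i + b) - 1

# "Case 1" or "Case 2" holds for every i, so the product is ~0:
print(np.max(np.abs(alpha * slack)))  # ≈ 0 up to numerical tolerance
```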

Code Pope

Support vectors can be defined as those vectors that lie on the positive or negative margin hyperplane, i.e., those vectors for which $y_i (w^Tx_i + b) -1=0$. For non-support vectors, $y_i (w^Tx_i + b) -1$ is strictly positive.

The primal problem in its Lagrangian (min-max) form is given by:

$$L_p = \min_{w,b}\,\max_{\alpha\geq 0} \left(\frac{1}{2}\|w\|^2 - \sum_i \alpha_i\left(y_i (w^Tx_i + b) -1\right)\right)$$

Thus, for non-support vectors $\alpha_i = 0$: since $y_i (w^Tx_i + b) -1 > 0$ for these points, any $\alpha_i > 0$ would only decrease the objective, so the inner maximization sets $\alpha_i = 0$.
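
To make this concrete, a tiny numeric sketch (the values of `w`, `b` and the sample point below are made up purely for illustration):

```python
import numpy as np

# Made-up weights and a correctly classified, strictly-inside point
# (i.e., a non-support vector):
w = np.array([1.0, -2.0])
b = 0.5
x_i = np.array([3.0, -1.0])
y_i = 1

slack = y_i * (w @ x_i + b) - 1  # y_i (w^T x_i + b) - 1 = 4.5 > 0

# The alpha_i-dependent part of L_p is -alpha_i * slack, which strictly
# decreases as alpha_i grows, so the max over alpha_i >= 0 is at alpha_i = 0.
for alpha_i in [0.0, 1.0, 2.0]:
    print(alpha_i, 0.5 * w @ w - alpha_i * slack)
```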