
In the SVM optimization problem for computing the margin, we use Lagrange multipliers to incorporate the constraints:
$$L(w,b,\alpha)= \frac{\lambda}{2}\|w\|^2 - \sum_i \alpha_i \left(y_i(w^Tx_i+b) -1\right)$$
Now we want to compute the $\alpha_i$. It is stated that $\alpha_i = 0$ for all non-support vectors. How does this statement follow from the above equation? How can it be proved?

UPDATE:
If we solve the dual of an SVM using the KKT conditions, we get: $$w = \frac{1}{\lambda} \sum_{i=1}^{N}\alpha_i y_i x_i$$ One of the main goals of using the dual is that $w$ can be computed more efficiently, because of the above equation and the fact that $\alpha_i = 0$ for most $i\in[N]$; therefore we only have to consider $\{(x_i,y_i):\alpha_i \neq 0\}$, which is a small subset of the data.

My question is: why is $\alpha_i = 0$ for most $i\in [N]$?

Geometrically, it is said that $\alpha_i\neq 0$ exactly for the data points which lie on the hyperplanes $\{x:w^Tx + b = 1 \}$ or $\{x:w^Tx + b = -1 \}$. I am not sure whether this is true. If it is, how can it be proven?
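
A quick numeric illustration of this claim (a minimal sketch, assuming scikit-learn is available; the blob data and the large `C` used to approximate a hard margin are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs, so the hard-margin picture applies.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)
y_signed = 2 * y - 1  # map labels {0, 1} -> {-1, +1}

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only a handful of alpha_i are non-zero:
print(f"{len(clf.support_)} of {len(X)} points have alpha_i != 0")

# Those points lie on the margin hyperplanes: y_i (w^T x_i + b) ≈ 1.
margins = y_signed[clf.support_] * clf.decision_function(X[clf.support_])
print(np.round(margins, 3))

# w can be reconstructed from the support vectors alone, since
# dual_coef_ stores y_i * alpha_i for each support vector.
w = clf.dual_coef_[0] @ clf.support_vectors_
print(np.allclose(w, clf.coef_[0]))  # True
```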

Code Pope

3 Answers


A support vector is, by definition, a vector whose $\alpha$ is non-zero. It is a definition; there is nothing to prove from the equation here.

RUser4512
  • Is it also a definition that it must be greater than zero? Or is there a reason for it? And why is it non-zero only for some vectors and not for all? – Code Pope Jul 13 '18 at 16:57

I have found the answer to my question; it can be explained geometrically very well.
We know that the complementary slackness condition of the KKT conditions says: $$\alpha_i\geq0, \quad \alpha_i(y_i(w^Tx_i + b) - 1) = 0$$ Therefore, at a KKT point, at least one of the following cases holds for each $i$:

Case 1: $\alpha_i=0$
Case 2: $y_i(w^Tx_i +b) - 1 =0$

Furthermore, we know that the margin hyperplanes of the SVM have the following equations:

  1. $H_1 = \{x: w^Tx + b = 1\}$
  2. $H_{-1} = \{x:w^Tx + b = -1\}$

Using these margin hyperplanes, the following halfspaces are defined:

  1. $H_1^+ = \{x: w^Tx + b > 1\}$
  2. $H_{-1}^- = \{x:w^Tx + b < -1\}$

Thus, for any point $x_i$ with $y_i=1$ or $y_i=-1$ that is correctly classified and lies strictly inside the correct halfspace, we have: $$y_i(w^Tx_i +b) -1 > 0$$ For these points "Case 2" is violated, so "Case 1" must hold, i.e., $\alpha_i = 0$. This means that $\alpha_i = 0$ for all points that are correctly classified and lie strictly inside their halfspace.
Hence, $\alpha_i$ can only be non-zero if "Case 2" holds, i.e., $y_i(w^Tx_i+b) - 1 = 0$. This is true only for $x_i \in H_1$ or $x_i \in H_{-1}$, i.e., for the points that lie on the margin hyperplanes, and there are only a few such points.

Therefore, $\alpha_i = 0$ for most points; the exceptions are the few points that lie on the margin hyperplanes.
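
This can be checked numerically (a small sketch, assuming scikit-learn; the toy data and the large `C` approximating a hard margin are illustrative assumptions): the product $\alpha_i(y_i(w^Tx_i+b)-1)$ vanishes for every point, because one of the two cases always holds.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.5, random_state=1)
y_signed = 2 * y - 1                         # labels {0, 1} -> {-1, +1}
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin

# Full alpha vector: zero everywhere except at the support vectors,
# where sklearn stores y_i * alpha_i in dual_coef_.
alpha = np.zeros(len(X))
alpha[clf.support_] = np.abs(clf.dual_coef_[0])

slack = y_signed * clf.decision_function(X) - 1  # y_i (w^T x_i + b) - 1

# "Case 1" or "Case 2" holds for every i, so the product is ~0:
print(np.max(np.abs(alpha * slack)))  # ≈ 0 up to numerical tolerance
```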

Code Pope

Support vectors can be defined as those vectors that lie on the positive or negative margin hyperplane, i.e., those vectors for which $y_i (w^Tx_i + b) -1=0$. For non-support vectors, $y_i (w^Tx_i + b) -1$ is strictly positive.

The primal problem in its Lagrangian (min-max) form is given by:

$$L_p = \min_{w,b}\,\max_{\alpha\geq 0} \left(\frac{1}{2}\|w\|^2 - \sum_i \alpha_i\left(y_i (w^Tx_i + b) -1\right)\right)$$

Thus, for non-support vectors $\alpha_i = 0$: since $y_i (w^Tx_i + b) -1 > 0$ for these points, any $\alpha_i > 0$ would only decrease the objective, so the inner maximization sets $\alpha_i = 0$.
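
To make this concrete, a tiny numeric sketch (the values of `w`, `b` and the sample point below are made up purely for illustration):

```python
import numpy as np

# Made-up weights and a correctly classified, strictly-inside point
# (i.e., a non-support vector):
w = np.array([1.0, -2.0])
b = 0.5
x_i = np.array([3.0, -1.0])
y_i = 1

slack = y_i * (w @ x_i + b) - 1  # y_i (w^T x_i + b) - 1 = 4.5 > 0

# The alpha_i-dependent part of L_p is -alpha_i * slack, which strictly
# decreases as alpha_i grows, so the max over alpha_i >= 0 is at alpha_i = 0.
for alpha_i in [0.0, 1.0, 2.0]:
    print(alpha_i, 0.5 * w @ w - alpha_i * slack)
```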