
Can someone explain to me how one goes about designing an SVM decision function? Or point me to a resource that discusses a concrete example?

EDIT

For the example below, I can see that the equation $x_2 = 1.5$ separates the classes with maximum margin. But how do I adjust the weights and write the equations for the hyperplanes in the following form?

$$\begin{array}{ll} H_1 : w_0+w_1x_1+w_2x_2 \ge 1 & \text{for}\; Y_i = +1 \\ H_2 : w_0+w_1x_1+w_2x_2 \le -1 & \text{for}\; Y_i = -1.\end{array} $$

[Figure: scatter plot of the eight sample points in the $(x_1, x_2)$ plane; the line $x_2 = 1.5$ separates the two classes.]

I'm trying to get the underlying theory right in 2-D space (as it's easier to visualize) before I think about higher dimensions.

I have worked out a solution for this. Can someone please confirm whether it is correct?

The weight vector is $(w_1, w_2) = (0, -2)$ and $w_0$ is $3$:

$$\begin{array}{ll} H_1 : 3+0x_1-2x_2 \ge 1 & \text{for}\; Y_i = +1 \\ H_2 : 3+0x_1 -2x_2 \le -1 & \text{for}\; Y_i = -1.\end{array} $$
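As a check against the points in the figure: every $+1$ point has $x_2 \le 1$, so $3 + 0x_1 - 2x_2 \ge 1$, and every $-1$ point has $x_2 \ge 2$, so $3 + 0x_1 - 2x_2 \le -1$. The margin boundaries $x_2 = 1$ and $x_2 = 2$ are symmetric about the separating line $x_2 = 1.5$.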

naresh
  • There is an illustration with R [here](http://stats.stackexchange.com/q/5056/930), but I feel your question is more on the algorithmic aspect. In this case, it would help if you could add a little more details about the intended application or available resource. – chl Nov 16 '12 at 12:23
  • @chl I have updated the question with details – naresh Nov 17 '12 at 04:03

1 Answer

There are at least two ways to motivate SVMs, but I will take the simpler route here.

Now, forget everything you know about SVMs for the moment and just focus on the problem at hand. You are given a set of points $\mathcal{D} = \{(x^i_1, x^i_2, y_i)\}$, where each label $y_i$ is from $\{1, -1\}$. We are trying to find a line in 2D such that all points with label $1$ fall on one side of the line and all points with label $-1$ fall on the other side.

First of all, realize that $w_0 + w_1x_1 + w_2x_2 = 0$ is a line in 2D, that $w_0 + w_1x_1 + w_2x_2 > 0$ represents one side of the line, and that $w_0 + w_1x_1 + w_2x_2 < 0$ represents the other side. For example, with $(w_0, w_1, w_2) = (-1.5, 0, 1)$ the line is $x_2 = 1.5$ from the question; points above it give a positive value and points below it give a negative value.

From the above we can conclude that we want some vector $[w_0, w_1, w_2]$ such that, $w_0 + w_1x^i_1 + w_2x^i_2 \geq 0$ for all points $x^i$ with $y_i = 1$ and $w_0 + w_1x^i_1 + w_2x^i_2 < 0$ for all points $x^i$ with $y_i = -1$ [1].

Let us assume that such a line actually exists. Then I can define a classifier in the following way:

$$ \min |w_0| + |w_1| + |w_2| \\ \text{subject to} : w_0 + w_1x^i_1 + w_2x^i_2 \geq 1, \forall x^i\text{ with }y_i = 1 \\ w_0 + w_1x^i_1 + w_2x^i_2 \leq -1, \forall x^i\text{ with }y_i = -1 \\ $$

I have used an arbitrary objective function above; we don't really care at the moment which objective function is used. We just want a $w$ that satisfies our constraints. Since we have assumed that a line exists with the two classes strictly on either side of it, we can always rescale $w$ so that the constraints above hold (scale $w$ up until the smallest $|w_0 + w_1x^i_1 + w_2x^i_2|$ reaches $1$), so we will find a solution to the above optimization problem.
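To make this concrete, here is a minimal CVX sketch of the program above (it reuses the example data that appears in the script at the end of this answer; the only difference from that script is the objective):

A = [1 2 1; 3 2 1; 2 3 1; 3 3 1; 1 1 1; 2 0 1; 2 1 1; 3 1 1]; % rows are [x1 x2 1]
y = [-1; -1; -1; -1; 1; 1; 1; 1];                              % labels
cvx_begin
variable w(3)                            % w = [w1; w2; w0]
minimize( norm(w, 1) )                   % the arbitrary |w0| + |w1| + |w2| objective
subject to
(repmat(y, 1, 3).*A)*w >= ones(8, 1)     % y_i(w1*x1 + w2*x2 + w0) >= 1
cvx_end

Any feasible $w$ returned here separates the two classes; it just isn't guaranteed to be the maximum-margin one.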

This program is not yet an SVM, but it will give you a classifier :-). However, this classifier may not be very good. But how do you define a good classifier? A good classifier is usually one which does well on the test set. Ideally, you would go over all possible $w$'s that separate your training data and see which of them does well on the test data. However, there are infinitely many such $w$'s, so this is quite hopeless. Instead, we consider a heuristic to define a good classifier: the line that separates the data should be sufficiently far away from all points (i.e., there is always a gap, or margin, between the points and the line). The best classifier among these is the one with the maximum margin. This is what gets used in SVMs.

Notice that this is exactly what the constraints above do. By insisting that $w_0 + w_1x^i_1 + w_2x^i_2 \geq 1$ for all points $x^i$ with $y_i = 1$ and $w_0 + w_1x^i_1 + w_2x^i_2 \leq -1$ for all points $x^i$ with $y_i = -1$, rather than merely $\geq 0$ and $< 0$, we are actually insisting that points be far away from the line. The geometric margin corresponding to this requirement comes out to be $\frac{1}{\|w\|_2}$ on each side of the line, i.e., a gap of $\frac{2}{\|w\|_2}$ between the classes (derived in the addendum below).

So, we get the following optimization problem, $$ \max \frac{1}{\|w\|_2} \\ \text{subject to} : w_0 + w_1x^i_1 + w_2x^i_2 \geq 1, \forall x^i\text{ with }y_i = 1 \\ w_0 + w_1x^i_1 + w_2x^i_2 \leq -1, \forall x^i\text{ with }y_i = -1 \\ $$ A slightly more succinct way of writing this is $$ \min \|w\|_2 \\ \text{subject to} : y_i(w_0 + w_1x^i_1 + w_2x^i_2) \geq 1, \forall i $$ since maximizing $\frac{1}{\|w\|_2}$ is the same as minimizing $\|w\|_2$. This is the basic (hard-margin) SVM formulation. I have skipped quite a lot of the discussion for brevity, but hopefully I still got most of the idea across.


CVX script to solve the example problem:

A = [1 2 1; 3 2 1; 2 3 1; 3 3 1; 1 1 1; 2 0 1; 2 1 1; 3 1 1]; % each row is [x1 x2 1]; rows 1-4 are the negative class
b = ones(8, 1);                   % require a functional margin of at least 1
y = [-1; -1; -1; -1; 1; 1; 1; 1]; % labels
Y = repmat(y, 1, 3);              % replicate labels across the columns of A
cvx_begin
variable w(3)                     % w = [w1; w2; w0]
minimize norm(w)                  % minimize ||w||_2
% (strictly, the formulation minimizes the norm of [w1; w2] only; for this data, minimizing norm(w(1:2)) gives the same line)
subject to
(Y.*A)*w >= b                     % y_i(w1*x1 + w2*x2 + w0) >= 1 for all i
cvx_end
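For the example problem, this script returns $w = [0; -2; 3]$, i.e., $w_1 = 0$, $w_2 = -2$, $w_0 = 3$ (see the comments below), which gives the decision boundary $3 - 2x_2 = 0$, i.e., the line $x_2 = 1.5$ from the question.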

Addendum - Geometric Margin

Above, we have already requested that we look for $w$ such that $y_i(w_0 + w_1x_1 + w_2x_2) \geq 1$, or more generally $y_i(w_0 + w^Tx) \geq 1$. The left-hand side here is called the functional margin, so what we have requested is that the functional margin be $\geq 1$. Now we will compute the geometric margin implied by this requirement.

What is the geometric margin? It is the shortest distance between a point in the positive examples and a point in the negative examples. In general, a closest pair of points can have functional margins greater than or equal to $1$; however, at the optimum of the problem above, the points closest to the hyperplane satisfy the constraints with equality (otherwise $w$ could be scaled down, further reducing $\|w\|_2$). So let $x_+$ be a positive example with $w^Tx_+ + w_0 = 1$ and let $x_-$ be a negative example with $w^Tx_- + w_0 = -1$. The distance between $x_+$ and $x_-$ is shortest when $x_+ - x_-$ is perpendicular to the hyperplane.

Now, with all the above information, we can find $\|x_+ - x_-\|_2$, which is the geometric margin. $$w^Tx_+ + w_0 = 1$$ $$w^Tx_- + w_0 = -1$$ Subtracting, $$w^T(x_+ - x_-) = 2$$ Since $x_+ - x_-$ is perpendicular to the hyperplane, it is parallel to $w$, so Cauchy–Schwarz holds with equality: $$|w^T(x_+ - x_-)| = \|w\|_2\|x_+ - x_-\|_2 = 2$$ $$\|x_+ - x_-\|_2 = \frac{2}{\|w\|_2}$$
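As a quick numeric check on the example problem, with the solution $w = (w_1, w_2) = (0, -2)$ and $w_0 = 3$ from the question:

$$\|x_+ - x_-\|_2 = \frac{2}{\|w\|_2} = \frac{2}{\sqrt{0^2 + (-2)^2}} = 1,$$

which matches the data: the closest positive points lie on $x_2 = 1$ and the closest negative points on $x_2 = 2$, a perpendicular gap of exactly $1$.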


[1] It doesn't actually matter which side you choose for $1$ and $-1$. You just have to stay consistent with whatever you choose.

TenaliRaman
  • Thanks for the explanation. Can you also kindly show how to apply all this to the example I have mentioned in the question? – naresh Nov 17 '12 at 06:08
  • @naresh This might be a good exercise. You should try writing out the optimization problem for the example you gave. – TenaliRaman Nov 17 '12 at 08:43
  • Updated with my solution. Can you please see if I got it right? – naresh Nov 29 '12 at 07:58
  • @naresh Yeap, solving this in cvx gave me the exact same solution that you have $w = [0, -2, 3]$. – TenaliRaman Nov 29 '12 at 09:55
  • @naresh I have added my cvx script to the answer above. As one can see from the script, going from here to a general SVM is not very hard. In fact, writing code for an SVM is very easy, and having access to nifty libraries like MOSEK, CVX, etc. only makes things simpler (making it fast, on the other hand, is a harder problem). The one thing I haven't discussed in my answer above is the consideration of slack variables, but that is a very common addition to such optimization problems, and understanding it shouldn't be that difficult. – TenaliRaman Nov 29 '12 at 12:30
  • isn't there a typo in that $w_0 + w_1x^i_1 + w_2x^i_2 < 1$ should be $w_0 + w_1x^i_1 + w_2x^i_2 \leq -1$? – entropy Jan 02 '13 at 23:07
  • Can you show that geometric margin corresponding to this requirement comes out to be $\frac{1}{||w||}$, please? – entropy Jan 02 '13 at 23:09
  • @entropy thanks I have fixed the typo. I will add the geometric margin explanation. – TenaliRaman Jan 02 '13 at 23:47
  • @entropy I have updated the answer with the geometric margin explanation. – TenaliRaman Jan 03 '13 at 00:15
  • can someone please explain to me why we are including a bias term b or $w_0$. Why not just exclude it? – entropy Apr 03 '13 at 09:08
  • @entropy $w^{T}x$ is a hyperplane passing through origin. To cover the space of all linear equations you need the bias term. Think of points residing in 2D and let us say that you are trying to find a line that separates these points. However these points all lie in the first quadrant. Now one can arrange these points such that they are separable but not by any line that passes through the origin. However, a line with a proper bias can do it. – TenaliRaman Apr 03 '13 at 10:18
  • @entropy Having said the above, you might have realized by now that if you properly rotate and shift the points, even a line passing through the origin should be able to separate the classes. However, usually finding this right rotation and shift is not easy, compared to just learning the bias term. – TenaliRaman Apr 03 '13 at 10:19
  • It seems like SVMs become harder to understand every time I try to figure them out. – entropy Apr 03 '13 at 10:34
  • @entropy : You are asking the right questions, don't worry you will reach enlightenment :-) – TenaliRaman Apr 04 '13 at 16:14
  • The linear program you posit to "define a classifier" has a huge problem. The minimum might never be attained: if there exists any solution with $w_0+w_1+w_2\lt 0$, then all positive multiples of that solution work (as well as many more) and the minimum approaches $-\infty$, which is useless; otherwise, the minimum is attained as $w_0+w_1+w_2$ approaches $0$ from above, which is also useless. Your second program has equally fatal flaws. – whuber Apr 16 '14 at 16:11
  • @whuber I agree with the first having a problem, I have modified it slightly now to escape that issue. However, the second is the hard margin SVM formulation. I would like to understand what exactly is your issue with that? – TenaliRaman Apr 16 '14 at 17:04
  • The second looks ok: I hadn't noticed the $0$s of the first program had been replaced by $1$s. Thanks for making the change. Now that it makes sense, I have scanned down to your addendum. I don't see why the functional margin of shortest-distance points must be $1$. A counterexample is $\{(-9,0,-1),(9,0,-1),(0,-3,-1),(4,-1,-1);(-9,3,1),(9,3,1),(4,1,1),(0,\epsilon \gt 0,1)\}$. The shortest-distance points are $(4,-1,1),(4,1,1)$ whose distance is $2$ but their FMs will be $2(1-\epsilon)/\epsilon$ and $2(1+\epsilon)/\epsilon,$ both of which are huge when $\epsilon$ is small. – whuber Apr 16 '14 at 17:42
  • @whuber Ah sorry, I see what you are saying. What is the weight vector that you are using to compute the FMs? – TenaliRaman Apr 16 '14 at 17:54
  • I used $(w_0,w_1,w_2)=(-\epsilon/2,0,2/\epsilon)$. – whuber Apr 16 '14 at 18:05
  • Wait, why are we doing this again? It is our desire that the FMs be greater than or equal to 1. That is, when we solve our optimization, all points must have satisfied FMs >= 1. Maybe you are having problems because I have used FMs = 1 everywhere in the derivation? Would you be comfortable if I replaced them with FMs >= 1? We would simply derive that GMs >= 2 / |w|_2, so essentially it will be like maximizing a lower bound. – TenaliRaman Apr 16 '14 at 18:33
  • It is hard for me--and likely for many others--to understand what you write when it includes statements that are clearly not true. It makes one wonder what you really mean, whether you have typos, whether you are mistaken, or whether we, the readers, actually understand. It causes readers to respond with statements like "It seems like SVMs become harder to understand every time I try to figure them out." As a moderator I am trying to nudge you to clean up your answer so that it more clearly expresses the ideas you want to get across and causes less confusion for its readers. – whuber Apr 16 '14 at 19:00
  • @whuber Absolutely! I know what you are trying to do :-) I am trying to understand whether you got the point I was trying to make. In case, my earlier comment made sense, I will modify the addendum appropriately. – TenaliRaman Apr 16 '14 at 19:40
  • @whuber Please check my edit to the addendum and see if that is more reasonable now. – TenaliRaman Apr 17 '14 at 03:10