Apropos of a follow-up question to this post, I tried to prove to myself that I understood the notation in the equation for the bias (page 5) of a support vector machine (SVM) with a linear kernel (classification), which is
$$b=\frac{1}{N_s}\sum_{s\in S}\left( y_s - \sum_{m\in S} \alpha_m\;y_m\;\mathbf x_m \cdot \mathbf x_s \right)$$
corresponding (I believe) to the average, across the $N_s$ support vectors, of $y_s$ minus the dot products $\mathbf x_m \cdot \mathbf x_s$ scaled by the coefficients $\alpha_m$ and the classification labels ($y_m=1$ or $y_m = - 1$).
As a toy example and reference point, I am using the example in this post, summarized as
x1s <- c(.5,1,1,2,3,3.5,1,3.5,4,5,5.5,6)
x2s <- c(3.5,1,2.5,2,1,1.2,5.8,3,4,5,4,1)
ys <- c(rep(+1,6), rep(-1,6))
my.data <- data.frame(x1=x1s, x2=x2s, type=ys)
library(e1071)
svm.model <- svm(type ~ ., data=my.data, type='C-classification', kernel='linear',scale=FALSE)
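As a quick check of which points were selected as support vectors (a sketch; `svm.model$index` holds their row indices in `my.data`):

```r
# row indices of the support vectors in my.data
# (rows 6, 8 and 12, matching the rows gathered further below)
svm.model$index
```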
# get parameters of the hyperplane
w <- t(svm.model$coefs) %*% svm.model$SV
(b <- -svm.model$rho)
# [1] 5.365853
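As a sanity check (a sketch, using only the objects `w` and `b` just computed), the sign of $\mathbf w \cdot \mathbf x + b$ should reproduce the labels of all twelve points:

```r
# classify each point by the sign of w . x + b and compare with the
# original labels; in this example every point, including the bounded
# support vector, ends up on the correct side of the hyperplane
manual <- ifelse(as.matrix(my.data[, c("x1","x2")]) %*% t(w) + b > 0, 1, -1)
all(manual == my.data$type)
```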
And we can verify that svm.model$rho
is indeed the negative of the bias $b.$
Gathering together the support vectors with their labels and coefficients:
(sv = as.matrix(sapply(cbind(my.data[rownames(svm.model$SV),], coef = svm.model$coefs),as.numeric)))
# x1 x2 type coef
# [1,] 3.5 1.2 1 1.0000000
# [2,] 3.5 3.0 -1 -0.6487805
# [3,] 6.0 1.0 -1 -0.3512195
and remembering that the support vectors fulfill the equality
$$y_s\left(\mathbf w^\top \mathbf x_s + b\right)=1$$
as one of the constraints.
The bias, $b,$ can be calculated in the above example simply as:
-((sv[,"type"] * (svm.model$SV %*% t(w))) - matrix(rep(1,nrow(svm.model$SV)),,1))
#          [,1]
# 6   5.390244
# 8  -5.365854
# 12 -5.365854
which is in fact equal to rho, "the negative intercept,"
as described in the svm documentation.
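A useful intermediate step (a sketch, again using only objects computed above): since $\mathbf w = \sum_{m\in S}\alpha_m y_m \mathbf x_m$, the inner sum in the bias formula is just $\mathbf w \cdot \mathbf x_s$, so the formula collapses to the mean of $y_s - \mathbf w \cdot \mathbf x_s$ over the support vectors:

```r
# b as the average of y_s - w . x_s over the support vectors;
# note the bounded support vector (alpha at the cost bound) need not
# lie exactly on the margin, so this mean can differ slightly from -rho
ys.sv <- my.data$type[svm.model$index]
mean(ys.sv - svm.model$SV %*% t(w))
```

For this example the value comes out close to, but not exactly, $-\rho$, because the bounded support vector (row 6) sits slightly inside the margin.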
In trying to reproduce rho
(or rather $b$) with the initial formula, this is what I have tried:
ind = numeric(3)
for (i in 1:3){
  ind[i] = sv[i,"type"] - sv[,"type"] %*% (sv[,"coef"] * (sv[,1:2] %*% sv[i,1:2]))
}
mean(ind)
# [1] -40.53398
which yields a result different from rho
above (i.e., svm.model$rho is -5.365853).
What am I doing wrong? Am I messing up the linear algebra, or misunderstanding the equation?