Apropos of a follow-up question to this post, I tried to prove to myself that I understood the notation in the equation for the bias (page 5) of a support vector machine (SVM) with a linear kernel (classification), which is
$$b=\frac{1}{N_s}\sum_{s\in S}\left( y_s - \sum_{m\in S} \alpha_m\;y_m\;\mathbf x_m \cdot \mathbf x_s \right)$$
corresponding (I believe) to the average, across the $N_s$ support vectors, of $y_s$ minus the dot products $\mathbf x_m \cdot \mathbf x_s$ scaled by the coefficients $\alpha_m$ and the classification labels ($y_m=1$ or $y_m = - 1$).
As a toy example and reference point, I am using the example in this post, summarized as
x1s <- c(.5,1,1,2,3,3.5,1,3.5,4,5,5.5,6)
x2s <- c(3.5,1,2.5,2,1,1.2,5.8,3,4,5,4,1)
ys <- c(rep(+1,6), rep(-1,6))
my.data <- data.frame(x1=x1s, x2=x2s, type=ys)
library(e1071)
svm.model <- svm(type ~ ., data=my.data, type='C-classification', kernel='linear',scale=FALSE)
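As a quick check of which points were selected as support vectors (a sketch; `svm.model$index` holds their row indices in `my.data`):

```r
# row indices of the support vectors in my.data
# (rows 6, 8 and 12, matching the rows gathered further below)
svm.model$index
```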
# get parameters of the hyperplane
w <- t(svm.model$coefs) %*% svm.model$SV
(b <- -svm.model$rho)
# [1] 5.365853
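As a sanity check (a sketch, using only the objects `w` and `b` just computed), the sign of $\mathbf w \cdot \mathbf x + b$ should reproduce the labels of all twelve points:

```r
# classify each point by the sign of w . x + b and compare with the
# original labels; in this example every point, including the bounded
# support vector, ends up on the correct side of the hyperplane
manual <- ifelse(as.matrix(my.data[, c("x1","x2")]) %*% t(w) + b > 0, 1, -1)
all(manual == my.data$type)
```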
And we can verify that svm.model$rho
is indeed the negative of the bias $b.$
Gathering together the support vectors with their labels and coefficients:
(sv = as.matrix(sapply(cbind(my.data[rownames(svm.model$SV),], coef = svm.model$coefs),as.numeric)))
# x1 x2 type coef
# [1,] 3.5 1.2 1 1.0000000
# [2,] 3.5 3.0 -1 -0.6487805
# [3,] 6.0 1.0 -1 -0.3512195
and remembering that the support vectors fulfill the equality
$$y_s\left(\mathbf w^\top \mathbf x_s + b\right)=1$$
as one of the constraints.
The bias, $b,$ can be calculated in the above example simply as:
-((sv[,"type"] * (svm.model$SV %*% t(w))) - matrix(rep(1,nrow(svm.model$SV)),,1))
#          [,1]
# 6   5.390244
# 8  -5.365854
# 12 -5.365854
which is in fact equal to rho, "the negative intercept,"
as described in the svm documentation.
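A useful intermediate step (a sketch, again using only objects computed above): since $\mathbf w = \sum_{m\in S}\alpha_m y_m \mathbf x_m$, the inner sum in the bias formula is just $\mathbf w \cdot \mathbf x_s$, so the formula collapses to the mean of $y_s - \mathbf w \cdot \mathbf x_s$ over the support vectors:

```r
# b as the average of y_s - w . x_s over the support vectors;
# note the bounded support vector (alpha at the cost bound) need not
# lie exactly on the margin, so this mean can differ slightly from -rho
ys.sv <- my.data$type[svm.model$index]
mean(ys.sv - svm.model$SV %*% t(w))
```

For this example the value comes out close to, but not exactly, $-\rho$, because the bounded support vector (row 6) sits slightly inside the margin.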
In trying to reproduce rho
(or rather $b$) with the initial formula, this is what I have tried:
ind = numeric(3)
for (i in 1:3){
  ind[i] = sv[i,"type"] - sv[,"type"] %*% (sv[,"coef"] * (sv[,1:2] %*% sv[i,1:2]))
}
mean(ind)
# [1] -40.53398
which yields a result different from rho
above (i.e., svm.model$rho is -5.365853).
What am I doing wrong? Am I messing up the linear algebra, or misunderstanding the equation?