
I'm trying to implement SVM+, an SVM variant that uses privileged information $x^*$ at training time but predicts using only the regular features $x$. Its decision rule is the usual kernel expansion $$f(x) = \operatorname{sign}\left(\sum_{i=1}^\ell y_i \alpha_i K(x_i, x) + b\right).$$

Here $K(x_i, x_j)$ and $K^*(x_i^*, x_j^*)$ are kernels in $X$ and $X^*$ spaces that define inner products in $Z$ and $Z^*$ spaces and $\alpha, \beta$ are the solution of the following optimization problem: maximize the functional \begin{multline} R(\alpha, \beta) = \sum_{i=1}^\ell \alpha_i - \frac12 \sum_{i,j=1}^\ell \alpha_i \alpha_j y_i y_j K(x_i, x_j) \\ - \frac{1}{2 \gamma} \sum_{i,j=1}^\ell (\alpha_i + \beta_i - C)(\alpha_j + \beta_j - C) K^*(x_i^*, x_j^*) \end{multline} subject to three types of constraints \begin{gather} \sum_{i=1}^\ell (\alpha_i + \beta_i - C) = 0 \\ \sum_{i=1}^\ell y_i \alpha_i = 0 \\ \alpha_i \ge 0, \quad \beta_i \ge 0 \end{gather}

I've read this tech report which manages to turn the above problem into a problem that a QP solver can understand:

Celik, Izmailov and McDaniel. Proof and Implementation of Algorithmic Realization of Learning Using Privileged Information (LUPI) Paradigm: SVM+. Institute of Networking and Security Research (INSR) Technical Report, 20 December 2015, NAS-TR-0187-2015.

I was hoping somebody could give a more intuitive explanation and help me understand how to turn $R$ into the standard QP form $\frac{1}{2}x^TPx - q^Tx$, as I'm really struggling to follow it.
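To make the target form concrete, here is how I currently understand the assembly, with $x = [\vec\alpha; \vec\beta]$: maximizing $R$ is the same as minimizing $-R$, and I fold all the signs into $q$ so that the solver sees $\min \frac12 x^TPx + q^Tx$ (the CVXOPT convention). This sketch and the helper name are my own, not taken from the tech report, so please correct me if the blocks are wrong:

```python
import numpy as np

def svmplus_qp_matrices(K, Kstar, y, C, gamma):
    """Assemble standard-form QP data for the SVM+ dual, x = [alpha; beta].

    Minimize (1/2) x^T P x + q^T x  subject to  G x <= h,  A x = b.
    K, Kstar are the L x L kernel matrices on X and X*; y is +/-1 labels.
    (Hypothetical helper of my own; not the tech report's code.)
    """
    L = len(y)
    Y = np.diag(y)
    YKY = Y @ K @ Y                      # y_i y_j K(x_i, x_j)
    Ks_g = Kstar / gamma                 # (1/gamma) K*(x_i*, x_j*)
    # Quadratic term: expand (alpha + beta - C*1)^T K* (alpha + beta - C*1);
    # the K*/gamma block appears in all four corners of the beta coupling.
    P = np.block([[YKY + Ks_g, Ks_g],
                  [Ks_g,       Ks_g]])
    # Linear term: -sum_i alpha_i plus the cross terms with the constant C*1
    v = (C / gamma) * (Kstar @ np.ones(L))
    q = np.concatenate([-np.ones(L) - v, -v])
    # Inequalities G x <= h encode alpha_i >= 0, beta_i >= 0
    G = -np.eye(2 * L)
    h = np.zeros(2 * L)
    # Equalities A x = b: sum_i (alpha_i + beta_i) = C*L and y^T alpha = 0
    A = np.vstack([np.ones(2 * L),
                   np.concatenate([y, np.zeros(L)])])
    b = np.array([C * L, 0.0])
    return P, q, G, h, A, b
```

The constant $\frac{C^2}{2\gamma}\mathbf{1}^TK^*\mathbf{1}$ drops out since it doesn't affect the argmin, and the returned tuple should plug straight into something like `cvxopt.solvers.qp` after wrapping the arrays in `cvxopt.matrix`.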

  • Please type your question as text, do not just post a photograph (see [here](http://meta.stats.stackexchange.com/a/3176/)). – gung - Reinstate Monica Jan 24 '17 at 17:30
  • @Oliver I transcribed the image and added a few more details; please verify that I did it correctly. Additionally, I'm not sure exactly what you meant by "1/2(xPx)-qx" and so didn't notate it; if you can either convert it to $\LaTeX$ notation or provide a reference to an equation number or something in the tech report, that would be helpful. – Danica Jan 24 '17 at 19:51
  • 1
    I would recommend focusing on section III in the paper - write out the definitions of each variable (in their notation, $x$, $H$ and $f$) and then convince yourself that the two formulations are equivalent. The discussion about the Lagrangian is used for the decision rule and can be skipped. Their formulation is really just a way of rearranging the variables from the formulation (which they call (2)) to the standard QP equations. – combo Jan 24 '17 at 21:51
  • @Dougal Thank you - sorry for the bad form in my post. I understand that $\frac{1}{2}x^TPx-q^Tx$ is the form that CVXOPT and other QP solvers like the problem to be defined as - according to [this tutorial](https://courses.csail.mit.edu/6.867/wiki/images/a/a7/Qp-cvxopt.pdf) – Oliver Jan 25 '17 at 10:50
  • @StefanJorgensen Thanks - guess I'm going to whip the pen and paper out. I think the bit that's a little leap for me is the matrix $H$. Am I right in thinking that (where i and j are 2) $H =\begin{pmatrix} \begin{pmatrix} H_{1,1}^{11} & H_{1,1}^{12} \\ H_{1,1}^{12} & H_{1,1}^{22} \end{pmatrix} \begin{pmatrix} H_{1,2}^{11} & H_{1,2}^{12} \\ H_{1,2}^{12} & H_{1,2}^{22} \end{pmatrix} \\ \begin{pmatrix} H_{2,1}^{11} & H_{2,1}^{12} \\ H_{2,1}^{12} & H_{2,1}^{22} \end{pmatrix} \begin{pmatrix} H_{2,2}^{11} & H_{2,2}^{12} \\ H_{2,2}^{12} & H_{2,2}^{22} \end{pmatrix} \end{pmatrix} $ ? – Oliver Jan 25 '17 at 11:04
  • From the top of page 5, $i,j = 1,\dots, L$, meaning that the $H^{11}$, $H^{12}$ and $H^{22}$ blocks are all $L\times L$ matrices (meaning $H$ is $2L\times 2L$). Remember that $x$ is also a block vector, with $x = [\vec{\alpha}, \vec{\beta}]$ so you'll end up with $x^THx = \vec{\alpha}^TH^{11}\vec{\alpha} + 2\vec{\alpha}^TH^{12}\vec{\beta} + \vec{\beta}^TH^{22}\vec{\beta}$ – combo Jan 25 '17 at 16:19
  • 1
    Ohhhhh! That makes a lot of sense. Thank you so much @StefanJorgensen ! – Oliver Jan 25 '17 at 18:01
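The block identity from the last comments is easy to sanity-check numerically; a quick sketch (variable names are mine, and it holds for any symmetric $2L\times 2L$ matrix $H$):

```python
import numpy as np

# Check the expansion from the comments:
#   x^T H x = a^T H11 a + 2 a^T H12 b + b^T H22 b,  with x = [a; b]
# and H partitioned into L x L blocks. The factor of 2 comes from
# H21 = H12^T when H is symmetric.
rng = np.random.default_rng(0)
L = 3
M = rng.standard_normal((2 * L, 2 * L))
H = M + M.T                              # an arbitrary symmetric 2L x 2L matrix
H11, H12, H22 = H[:L, :L], H[:L, L:], H[L:, L:]
a, b = rng.standard_normal(L), rng.standard_normal(L)
x = np.concatenate([a, b])
lhs = x @ H @ x
rhs = a @ H11 @ a + 2 * (a @ H12 @ b) + b @ H22 @ b
assert np.isclose(lhs, rhs)
```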

0 Answers