
I know that the SVM is a binary classifier. I would like to extend it to a multi-class SVM. What is the best, and perhaps the easiest, way to do this?

Code (MATLAB, using the LIBSVM interface):

    u = unique(TrainLabel);
    N = length(u);
    if (N > 2)
        itr = 1;
        classes = 0;
        % 'classes' is never updated, so the loop simply runs over all labels
        while ((classes ~= 1) && (itr <= length(u)))
            % Relabel the training set: class u(itr) vs. everything else
            c1 = (TrainLabel == u(itr));
            newClass = double(c1);
            tst = double(TestLabel == u(itr));  % compare against u(itr), not itr
            model = svmtrain(newClass, TrainVec, '-c 1 -g 0.00154');
            [predict_label, accuracy, dec_values] = svmpredict(tst, TestVec, model);
            itr = itr + 1;
        end
        itr = itr - 1;
    end

How can this be improved?

lakshmen
  • What does the variable `classes` do in the code? It seems to be useless. –  Sep 03 '13 at 06:42
  • Did you come to any conclusion? I have this problem in my work. If you have reached a suitable result, please share your multi-classification code here. Thanks. – me.rasouli Sep 06 '15 at 05:55

2 Answers


There are a lot of methods for multi-class classification. Two classic options, which are not SVM-specific, are:

  1. One-vs-all (OVA) classification:
    Suppose you have classes A, B, C, and D. Instead of doing a four-way classification, train four binary classifiers: A vs. not-A, B vs. not-B, C vs. not-C, and D vs. not-D. Then pick the positive class that is "best" (e.g., furthest from the margin across all four runs). If none of the classifications is positive (i.e., they're all not-X), pick the "opposite" of the class whose classifier does worst (e.g., closest to the margin). A sketch appears right after this list.

  2. All-vs-all (AVA):
    Train a binary classifier for every possible pair of classes. Rank the classes by some score (e.g., the number of times a class is selected) and pick the best. A voting sketch follows the next paragraph.

Which works best has been contentious: Duan and Keerthi have an empirical study that suggests a specific all-vs-all method, while Rifkin and Klautau argue for a one-vs-all scheme. There are even schemes where one learns error-correcting codes describing the class labels, instead of the labels themselves.
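And a rough sketch of the all-vs-all voting scheme from item 2, under the same assumptions (as an aside, LIBSVM already applies one-vs-one internally if you simply hand svmtrain the multi-class labels directly):

    % All-vs-all (one-vs-one): one binary model per pair of classes;
    % each test point votes for the winner of every pairwise contest.
    u = unique(TrainLabel);
    N = length(u);
    votes = zeros(length(TestLabel), N);

    for i = 1:N-1
        for j = i+1:N
            % Keep only the training points from classes u(i) and u(j)
            sel = (TrainLabel == u(i)) | (TrainLabel == u(j));
            pairLabel = double(TrainLabel(sel) == u(i));  % 1 = u(i), 0 = u(j)
            model = svmtrain(pairLabel, TrainVec(sel, :), '-c 1 -g 0.00154');
            % Dummy zero labels: we only want the predictions
            pred = svmpredict(zeros(length(TestLabel), 1), TestVec, model);
            votes(pred == 1, i) = votes(pred == 1, i) + 1;
            votes(pred == 0, j) = votes(pred == 0, j) + 1;
        end
    end

    [~, best] = max(votes, [], 2);   % most pairwise wins takes the class
    predicted = u(best);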

Good luck!

Edit: What you really want, particularly for OVA, is the posterior probability of each class. For some methods, like Naive Bayes, that's trivial to get out. SVMs typically don't give you probabilities, but there are ways to compute them. See John Platt's 1999 paper "Probabilistic Outputs for Support Vector Machines..."
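If you are using LIBSVM, a Platt-style probability output is built in via the '-b 1' flag; here is a sketch of the probabilistic one-vs-all variant, again with the question's variable names and parameters:

    % One-vs-all with probability estimates instead of raw decision values
    u = unique(TrainLabel);
    N = length(u);
    probs = zeros(length(TestLabel), N);

    for k = 1:N
        binTrain = double(TrainLabel == u(k));
        model = svmtrain(binTrain, TrainVec, '-c 1 -g 0.00154 -b 1');
        [~, ~, p] = svmpredict(double(TestLabel == u(k)), TestVec, model, '-b 1');
        % Columns of p follow model.Label; keep P(class u(k) | x)
        probs(:, k) = p(:, model.Label == 1);
    end

    [~, best] = max(probs, [], 2);   % largest posterior probability wins
    predicted = u(best);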

Matt Krause
  • For OVA - can you pick the class that has the largest probability (induced by Platt scaling)? – B_Miner Jan 22 '12 at 01:21
  • Yeah, that's basically the upshot of the Duan and Keerthi paper. They combine the Platt probabilities with Hastie's pairwise coupling trick and get good results. I should probably edit the text to include that. Good catch B_Miner! – Matt Krause Jan 22 '12 at 02:00
  • In SVM, do you need to do voting or sum-pooling? – lakshmen Jan 22 '12 at 02:46
  • @lakesh, One-vs-all or All-vs-all are like voting schemes. If you're using a set of binary classifiers, you've got to do *something* to turn them into a multi-class classifier. Alternately, you can use the modified SVM described by carlosdc below... – Matt Krause Jan 28 '12 at 17:28
  • What is that something? – lakshmen Jan 28 '12 at 17:49
  • @lakesh, I meant to say that you've got a lot of choices and, without trying them, it's not clear which will be best for your application. I'd try starting with the one-vs-all, as it's "classic" and not quite as computationally demanding. The Rifkin and Klautau paper has the specifics of their approach, which looks reasonably straightforward. – Matt Krause Jan 28 '12 at 20:42
  • My point is that the code above does the classification into binary classifiers, but the missing part is the voting. I just need to include the voting, which I don't know how to do... – lakshmen Jan 28 '12 at 21:11
  • I took the liberty of formatting your post using some of CV's markdown options to make it a little more readable, @MattKrause. If you don't like it, roll them back w/ my apologies. – gung - Reinstate Monica Aug 03 '13 at 02:41
  • @lakshmen Voting gets done via distance to margin. I struggled with that too. – Tommy Mar 19 '15 at 16:08

Let me add that there is work on extending SVMs to multiple classes directly (as opposed to the methods Matt Krause describes, which decompose the problem into several binary classification tasks). One important work is Crammer and Singer's "On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines".
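
If a linear kernel is acceptable, one readily available implementation of the Crammer-Singer formulation is LIBLINEAR's solver '-s 4'. A sketch, assuming LIBLINEAR's MATLAB interface (train/predict, which expect sparse feature matrices) and the question's variables:

    % Crammer-Singer multi-class SVM via LIBLINEAR ('-s 4'): a single
    % model trained directly on the multi-class labels, no binary decomposition
    model = train(TrainLabel, sparse(TrainVec), '-s 4 -c 1');
    [predicted, accuracy, ~] = predict(TestLabel, sparse(TestVec), model);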

carlosdc
  • 3,145
  • 20
  • 25