
I was told I had to direct my machine learning questions to this site, so here goes.

I'm trying to do multiclass classification with SVMs. I have 7 classes, and I'm wondering if the following is possible. I'm thinking of creating 7 SVMs in a one-vs-all approach. Am I allowed to design one kind of feature vector per class?

So e.g.:

  • Class 1 vs. rest ==> use feature vector type 1 (designed for class 1)
  • Class 2 vs. rest ==> use feature vector type 2 (designed for class 2)
  • Class 3 vs. rest ==> use feature vector type 3 (designed for class 3)

Then I would assign to the data point the class label whose classifier reports the highest confidence (probability).
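
In code, what I have in mind is roughly this minimal sketch, assuming scikit-learn; `featurize_for_class` is a hypothetical placeholder for my class-specific feature design:

```python
import numpy as np
from sklearn.svm import SVC

L = 7  # number of classes

def featurize_for_class(texts, c):
    """Hypothetical placeholder: build the feature matrix designed for class c."""
    raise NotImplementedError

def train_one_vs_rest(texts, labels):
    models = []
    for c in range(L):
        X_c = featurize_for_class(texts, c)          # class-specific features
        y_c = (np.asarray(labels) == c).astype(int)  # 1 = class c, 0 = rest
        models.append(SVC(kernel="linear", probability=True).fit(X_c, y_c))
    return models

def predict(models, texts):
    # Each classifier scores its own representation of the same instances;
    # assign the class whose classifier is most confident.
    scores = np.column_stack([
        m.predict_proba(featurize_for_class(texts, c))[:, 1]
        for c, m in enumerate(models)
    ])
    return scores.argmax(axis=1)
```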

Is this cheating? Or is this allowed? Is this common practice?

Olivier_s_j
  • Where are the features coming from? How are they constructed? – Bitwise Nov 26 '12 at 16:41
  • The features come from text. The feature vectors indicate whether a word is present in the text or not. – Olivier_s_j Nov 26 '12 at 17:13
  • Perhaps, if I understand correctly. Say in the future you've trained your classifier and you want to assign a label: what then? You don't know the label, so you'd have to try each combination of feature vector and one-vs-all classifier. Then you've now got `L * F` classifiers, where `L` is the number of classes and `F` is the number of possible feature vector formulations. Because they are the same in your case, this is `L^2`. Then to finish, you pick the class by doing what? – lollercoaster May 30 '13 at 17:31

2 Answers

4

Just use all the features in a single vector.

Then train your L one-vs-all classifiers, where L is the number of classes. At classification time, choose the class whose classifier returns the largest signed distance from its separating hyperplane (the highest decision value). This is a simple formulation of common practice.
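
A minimal sketch of that setup, assuming scikit-learn, with synthetic data standing in for your text features:

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic 7-class data standing in for your text features.
X, y = make_classification(n_samples=700, n_features=50, n_informative=20,
                           n_classes=7, random_state=0)

# One binary LinearSVC per class; all share the same feature vector.
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)

# decision_function returns each classifier's signed distance to its
# hyperplane; the predicted class is the one with the largest value.
pred = ovr.decision_function(X).argmax(axis=1)
```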

A stronger approach is to use error-correcting output codes (ECOC), which is a very robust method for problems with roughly 3 to 7 classes. You'll need a bit more training time and compute (an exhaustive code uses 2^(L-1) - 1 classifiers), but it's very powerful. Here's the best paper on the subject:

Solving Multiclass Learning Problems via Error-Correcting Output Codes (Dietterich & Bakiri, 1995)
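
For reference, a hedged sketch via scikit-learn's `OutputCodeClassifier` (note it draws a random code book rather than the exhaustive code from the paper; `code_size` is a multiplier on the number of classes):

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=700, n_features=50, n_informative=20,
                           n_classes=7, random_state=0)

# code_size * n_classes binary classifiers: 9.0 * 7 = 63, matching the
# exhaustive-code count 2**(7-1) - 1 for L = 7.
ecoc = OutputCodeClassifier(LinearSVC(), code_size=9.0, random_state=0)
pred = ecoc.fit(X, y).predict(X)
```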

lollercoaster
  • Just a note that Rifkin argues the ECOC approach is not actually "stronger": http://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf – Tommy Mar 19 '15 at 15:54
0

If I understand correctly, from each data instance you'd create L vectors, with the i-th vector used in the i-vs-rest binary classifier. At test time, you'd do the same for each instance.

I think this is legitimate; there is no cheating or information leakage that I can see.

Having said that, a more general approach would be to create a single representation (at worst by concatenating all L vectors) and use one-vs-rest on it. This would let the classifiers figure out which features are relevant for each class, as in the sketch below.
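
A sketch of the concatenation idea, assuming scikit-learn; `featurize_for_class` is the asker's hypothetical per-class featurizer:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def featurize_for_class(texts, c):
    """Hypothetical placeholder for the question's per-class features."""
    raise NotImplementedError

def concat_features(texts, L=7):
    # One shared representation: the L class-specific vectors side by side.
    return np.hstack([featurize_for_class(texts, c) for c in range(L)])

def train(texts, labels):
    # A single one-vs-rest model can now learn which slice of the
    # concatenated vector matters for each class.
    return OneVsRestClassifier(LinearSVC()).fit(concat_features(texts), labels)
```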

DAF