
I was told I had to direct my machine learning questions to this site, so here goes.

I'm trying to do multiclass classification with SVMs. I have 7 classes, and I'm wondering if the following is possible. I'm thinking of creating 7 SVMs in a one-vs-all approach. Am I allowed to design one kind of feature vector per class?

So e.g.:

  • Class 1 vs. rest ==> use feature vector type 1 (designed for class 1)
  • Class 2 vs. rest ==> use feature vector type 2 (designed for class 2)
  • Class 3 vs. rest ==> use feature vector type 3 (designed for class 3)

Then I would assign to the data point the class label whose classifier reports the highest confidence (probability).
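
In code, what I have in mind is roughly this minimal sketch, assuming scikit-learn; `featurize_for_class` is a hypothetical placeholder for my class-specific feature design:

```python
import numpy as np
from sklearn.svm import SVC

L = 7  # number of classes

def featurize_for_class(texts, c):
    """Hypothetical placeholder: build the feature matrix designed for class c."""
    raise NotImplementedError

def train_one_vs_rest(texts, labels):
    models = []
    for c in range(L):
        X_c = featurize_for_class(texts, c)          # class-specific features
        y_c = (np.asarray(labels) == c).astype(int)  # 1 = class c, 0 = rest
        models.append(SVC(kernel="linear", probability=True).fit(X_c, y_c))
    return models

def predict(models, texts):
    # Each classifier scores its own representation of the same instances;
    # assign the class whose classifier is most confident.
    scores = np.column_stack([
        m.predict_proba(featurize_for_class(texts, c))[:, 1]
        for c, m in enumerate(models)
    ])
    return scores.argmax(axis=1)
```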

Is this cheating? Or is this allowed? Is this common practice?

Olivier_s_j
  • Where are the features coming from? How are they constructed? – Bitwise Nov 26 '12 at 16:41
  • The features come from text. The feature vectors indicate whether a word is present in the text or not. – Olivier_s_j Nov 26 '12 at 17:13
  • Perhaps, if I understand correctly. Say in the future you've trained your classifier and you want to assign a label: what then? You don't know the label, so you'd have to try each combination of feature vector and one-vs-all classifier. Then you've now got `L * F` classifiers, where `L` is the number of classes and `F` is the number of possible feature vector formulations. Because they are the same in your case, this is `L^2`. Then to finish, you pick the class by doing what? – lollercoaster May 30 '13 at 17:31

2 Answers

4

Just use all the features in a single vector.

Then train your L one-vs-all classifiers, where L is the number of classes. At classification time, choose the class whose classifier returns the largest signed distance from its separating hyperplane (the highest decision value). This is a simple formulation of common practice.
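
A minimal sketch of that setup, assuming scikit-learn, with synthetic data standing in for your text features:

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic 7-class data standing in for your text features.
X, y = make_classification(n_samples=700, n_features=50, n_informative=20,
                           n_classes=7, random_state=0)

# One binary LinearSVC per class; all share the same feature vector.
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)

# decision_function returns each classifier's signed distance to its
# hyperplane; the predicted class is the one with the largest value.
pred = ovr.decision_function(X).argmax(axis=1)
```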

A stronger approach is to use error-correcting output codes (ECOC), which is a very robust method for problems with roughly 3 to 7 classes. You'll need a bit more training time and compute (an exhaustive code uses 2^(L-1) - 1 classifiers), but it's very powerful. Here's the best paper on the subject:

Solving Multiclass Learning Problems via Error-Correcting Output Codes (Dietterich & Bakiri, 1995)
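
For reference, a hedged sketch via scikit-learn's `OutputCodeClassifier` (note it draws a random code book rather than the exhaustive code from the paper; `code_size` is a multiplier on the number of classes):

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=700, n_features=50, n_informative=20,
                           n_classes=7, random_state=0)

# code_size * n_classes binary classifiers: 9.0 * 7 = 63, matching the
# exhaustive-code count 2**(7-1) - 1 for L = 7.
ecoc = OutputCodeClassifier(LinearSVC(), code_size=9.0, random_state=0)
pred = ecoc.fit(X, y).predict(X)
```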

lollercoaster
  • Just a note that Rifkin argues the ECOC approach is not actually "stronger": http://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf – Tommy Mar 19 '15 at 15:54
0

If I understand correctly, from each data instance you'd create L vectors, with the i-th vector used in the i-vs-rest binary classifier. At test time, you'd do the same for each instance.

I think this is legitimate; there is no cheating or information leakage that I can see.

Having said that, a more general approach would be to create a single representation (at worst by concatenating all L vectors) and use one-vs-rest on it. This would let the classifiers figure out which features are relevant for each class, as in the sketch below.
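
A sketch of the concatenation idea, assuming scikit-learn; `featurize_for_class` is the asker's hypothetical per-class featurizer:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def featurize_for_class(texts, c):
    """Hypothetical placeholder for the question's per-class features."""
    raise NotImplementedError

def concat_features(texts, L=7):
    # One shared representation: the L class-specific vectors side by side.
    return np.hstack([featurize_for_class(texts, c) for c in range(L)])

def train(texts, labels):
    # A single one-vs-rest model can now learn which slice of the
    # concatenated vector matters for each class.
    return OneVsRestClassifier(LinearSVC()).fit(concat_features(texts), labels)
```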

DAF