
I am currently using scikit-learn with the following code:

from sklearn import svm

clf = svm.SVC(C=1.0, tol=1e-10, cache_size=600, kernel='rbf', gamma=0.0,
              class_weight='auto')

and then fit and predict on a data set with 7 different labels. I get a weird output: no matter which cross-validation technique I use, the predicted label on the validation set is always label 7.
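The cross-validation part looks roughly like the sketch below (X and y stand for my actual feature matrix and labels as numpy arrays, and the fold count is just an example):

from sklearn.cross_validation import StratifiedKFold  # sklearn.model_selection in newer releases, with a changed constructor

for train_idx, valid_idx in StratifiedKFold(y, n_folds=5):
    clf.fit(X[train_idx], y[train_idx])
    print(set(clf.predict(X[valid_idx])))  # always {7} with the rbf kernel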

I have tried other parameters, including the full defaults (svm.SVC()), but as long as the kernel is rbf it does not work, while it works fine with the poly and linear kernels.

I have also tried predicting on the training data instead of the validation data, and there the model fits perfectly.

Has anyone seen this kind of problem before and know what is going on here?

I have never looked at my class distribution in detail, but I know roughly 30% of the instances are label 7 and 14% are label 4.

I have even tried a manual one-vs-rest implementation, and it still does not help.

Tamaki Sakura

2 Answers


A likely cause is that you are not tuning your model. You need to find good values for $C$ and $\gamma$. In your case, the defaults turn out to be bad, which leads to trivial models that always predict a certain class. This is particularly common if one class has many more instances than the others. What is your class distribution?

scikit-learn has limited hyperparameter search facilities, but you can use it together with a tuning library like Optunity. An example of tuning scikit-learn's SVC with Optunity is available here.
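For illustration, a minimal sketch of a grid search over $C$ and $\gamma$ using scikit-learn's built-in GridSearchCV (the grid below is just a placeholder range, and X, y stand for your data):

import numpy as np
from sklearn import svm
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer releases

# X, y: your feature matrix and labels
param_grid = {'C': np.logspace(-3, 3, 7), 'gamma': np.logspace(-9, 1, 11)}
search = GridSearchCV(svm.SVC(kernel='rbf', class_weight='auto'),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)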

Disclaimer: I am the lead developer of Optunity.

Marc Claesen
  • I have actually manually tried every combination of $C$ and $\gamma$ that is a power of 10, from $10^0$ to $10^4$, but all of them give me only 7s. I've even started to doubt whether I compiled scikit-learn correctly. – Tamaki Sakura Nov 25 '14 at 15:36

The problem does turn out to be parameter tuning. I did not try any value of gamma between 0.0 (which scikit-learn interprets as 1/n_features) and 1. On my data, gamma should be set to something around 1e-8.
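A quick way to see where gamma needs to be is to sweep it on a log scale well below 1. A rough sketch (X and y are your feature matrix and labels):

import numpy as np
from sklearn import svm
from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in newer releases

for gamma in np.logspace(-10, 0, 11):
    scores = cross_val_score(svm.SVC(kernel='rbf', gamma=gamma), X, y, cv=5)
    print(gamma, scores.mean())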

Tamaki Sakura
  • This makes perfect sense. Too large a value of $\gamma$ leads to a kernel matrix that is close to the unit matrix. Every prediction then ends up being the bias term (since all kernel evaluations are very close to zero), which happens to lead to class 7 in your case. – Marc Claesen Nov 25 '14 at 21:47
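A small numeric illustration of this effect (toy random data, not from the question): with a very large gamma the RBF kernel matrix collapses to the identity.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.rand(5, 10)           # toy feature matrix
print(rbf_kernel(X, X, gamma=1e4))  # approximately the identity: off-diagonal entries exp(-1e4 * ||xi - xj||^2) are ~0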