
Let $\mathcal{X}$ be a training set that will be used to train a binary SVM with an RBF kernel. $\mathcal{X}$ consists of $10$ positive examples and $100$ negative examples. I am interested in optimizing the parameters of this SVM, namely the well-known parameters $C$ and $\gamma$.

What I am doing now is partitioning the above set $\mathcal{X}$ into a $70\%$ training subset and a $30\%$ testing subset, and carrying out a grid search ($3$-fold cross-validation) in order to obtain the best pair $(C_{opt},\gamma_{opt})$.

That is, $\mathcal{X}$ is partitioned into the following three subsets, $$ \mathcal{X}_{1},\:\mathcal{X}_{2},\:\mathcal{X}_{3}, $$ which hold $4$, $4$, and $3$ positive samples, respectively (randomly chosen). Moreover, each subset also contains a number of negative samples ($34$, $33$, and $33$, respectively), also randomly chosen.
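For concreteness, here is a rough scikit-learn sketch of this kind of procedure; the synthetic data, grid values, and scoring metric below are placeholders, not my exact setup:

```python
# Sketch of a grid search over (C, gamma) with stratified 3-fold CV.
# The toy data and the grid are placeholders for illustration only.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold

rng = np.random.RandomState(0)

# Toy stand-in for X: 10 positive and 100 negative examples in 5 dimensions.
X = np.vstack([rng.normal(1.0, 1.0, size=(10, 5)),
               rng.normal(0.0, 1.0, size=(100, 5))])
y = np.array([1] * 10 + [0] * 100)

# 70% / 30% split, stratified so both parts keep some positives.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

param_grid = {"C": np.logspace(-2, 3, 6),
              "gamma": np.logspace(-4, 1, 6)}

# Stratified folds keep the positive/negative ratio in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="f1")
grid.fit(X_tr, y_tr)

print(grid.best_params_)       # the pair (C_opt, gamma_opt)
print(grid.score(X_te, y_te))  # F1 on the held-out 30%
```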

The cross-validation procedure, though, does not seem to obtain the optimal parameters.

What would you suggest I do? Thank you very much in advance!

nullgeppetto

1 Answer


One issue you are likely having is your unbalanced dataset: only about $10\%$ of your examples are positive. You could address this through resampling or by class-weighting your examples (see the sketch after the links below). Some of the methods mentioned in these links may help:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.9248&rep=rep1&type=pdf

http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html

SVM for unbalanced data

https://stackoverflow.com/questions/11736125/how-do-you-handle-data-imbalance-in-svm
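For example, scikit-learn's `SVC` has a `class_weight` option that re-weights the error penalty per class; a minimal sketch (the specific values are only an illustration):

```python
from sklearn.svm import SVC

# 'balanced' scales each class's penalty inversely to its frequency,
# so the 10 positives are not drowned out by the 100 negatives.
clf = SVC(kernel="rbf", class_weight="balanced")

# An explicit mapping also works, e.g. penalizing errors on the
# positive class (label 1) ten times more heavily:
clf = SVC(kernel="rbf", class_weight={1: 10, 0: 1})
```

Either estimator can be dropped straight into your grid search over $C$ and $\gamma$.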

bill_e
  • Thanks for your answer! Besides the information I could get from your links, do you think that I could just use fewer negative samples (e.g. $70$ instead of $100$)? – nullgeppetto Feb 16 '15 at 20:32
  • If you do that then you'll lose the information those 30 samples could have provided to your classifier! What if they are the important 30, and the other 70 are random noise? – bill_e Feb 16 '15 at 20:36
  • Yes, I see... I think a class-weighted SVM could do the job. But concerning the cross-validation described above, what do you think? Is there any mistake? – nullgeppetto Feb 16 '15 at 20:39
  • No, your cross-validation procedure looks fine to me. I think your main challenge is your small and unbalanced data set. – bill_e Feb 16 '15 at 20:40
  • Thanks a lot! Your suggestions seem helpful, but I'll wait for input from other users too. – nullgeppetto Feb 16 '15 at 20:45
  • Also, check the other link I added for a more thorough review of this situation. – bill_e Feb 16 '15 at 20:47