
I'm using libSVM for binary classification and my training data is very unbalanced (-1:90%, +1:10%). According to libSVM's documentation, it's better to set different penalties for positive and negative classes. For example, the SVM problem is:

$\min\limits_{\mathbf{w},b,\xi} \frac{1}{2}\mathbf{w}^T\mathbf{w} + C^+\sum\limits_{y_i=1} \xi_i + C^-\sum\limits_{y_i=-1} \xi_i$

My question is which penalty should be larger and why. Thanks

user11869
  • Check out [this paper](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CCIQFjAA&url=http%3A%2F%2Fwww.researchgate.net%2Fpublication%2F221112311_Applying_Support_Vector_Machines_to_Imbalanced_Datasets%2Ffile%2F5046351b0229e207bd.pdf&ei=47SPU-HnG5aWqAanhYCADQ&usg=AFQjCNHyZ23NvEq87WMeizGN0I88mmVeZw&sig2=XisZd_z8rURfzJi01tCrcg&bvm=bv.68235269,d.b2k). It sheds some light on how to deal with unbalanced data. –  Jun 05 '14 at 00:09
  • Welcome to the site, @nickb. Would you mind adding a brief summary of the information in that paper in case the link goes dead, and/or so readers can know if they want to pursue it further? – gung - Reinstate Monica Jun 05 '14 at 00:30

1 Answer

The larger the penalty, the more an error on the training set (which is what $\xi_i$ measures) for a pattern of that class influences the model. So if you have more negative patterns than positive patterns, you probably want to make $C^+$ larger than $C^-$, so that errors on the rare positive class are not swamped by the much larger number of negative patterns. In my experience, a class imbalance problem usually means that the costs of false-positive and false-negative errors are not the same, and the relative costs of the errors are an important criterion for adjusting the penalties. I would suggest using cross-validation to estimate the expected loss and choosing the penalties to minimise that.
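To make the two ideas above concrete, here is a minimal sketch in plain Python. It is only an illustration, not part of libSVM: the function names `inverse_frequency_weights` and `expected_loss` are hypothetical, the inverse-frequency rule is just one common starting heuristic, and the 9:1 cost ratio is an assumed value chosen to match the 90%/10% split in the question.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: per-class multipliers n / (k * n_c).

    Rarer classes get larger multipliers, so C for the minority
    class ends up larger than C for the majority class.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def expected_loss(y_true, y_pred, cost_fp=1.0, cost_fn=1.0):
    """Hypothetical helper: mean misclassification cost (+1 is positive).

    This is the quantity one would estimate on each cross-validation
    fold and then minimise over candidate penalty settings.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == -1 and p == 1:
            total += cost_fp   # false positive
        elif t == 1 and p == -1:
            total += cost_fn   # false negative
    return total / len(y_true)


# Toy data with the same 90% / 10% split as in the question.
labels = [-1] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)

# A trivial "always predict -1" classifier, scored with an assumed
# cost ratio of 9:1 for false negatives versus false positives.
always_negative = [-1] * len(labels)
loss = expected_loss(labels, always_negative, cost_fp=1.0, cost_fn=9.0)

print(weights)
print(loss)
```

With libSVM's `svm-train`, per-class multipliers of this kind can be supplied through the `-wi` options (for example `-w1` and `-w-1`), which scale `C` for class `i`. The values from a heuristic like this are only a starting point; the cross-validated expected loss should decide the final choice.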

Dikran Marsupial
  • A small addition: In practice, I have found that often in unbalanced problems, different penalties yield the same CV performance. So, OP, keep in mind that changing penalties will not necessarily improve your results in CV. – Bitwise Oct 25 '12 at 19:27
  • @Bitwise, what would you do in this case then? – user11869 Oct 25 '12 at 19:39
  • @user11869 There is not much to do; this is just another parameter to play with to try to improve your results (with proper CV, of course). – Bitwise Oct 25 '12 at 19:52