
I'm using liblinear to classify documents. The inputs are arrays of tf-idf scores. Everything is working well, but now I'm trying to determine optimal parameters. I set up a loop of cross-validation runs, incrementing the cost parameter C from iteration to iteration. However, even over hundreds of iterations I see no effect from changing C: the results are all within 2% of each other, and vary so little that running the loop with a single fixed C value produces the same spread of results. What am I doing wrong or not understanding?

Note: I am using the Ruby binding for liblinear, not the command-line tools or the C code directly. That said, the Ruby binding is a very small wrapper around the C library itself and behaves exactly the same.
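Here is a minimal sketch of the kind of loop I mean (not my exact code; it assumes the liblinear-ruby gem with toy stand-in data, and the `C:` parameter key and solver constant are my best guesses at the binding's names — check them against your version):

```ruby
require 'liblinear' # liblinear-ruby gem

# Hypothetical stand-ins for the real data: labels are class ids,
# examples are tf-idf feature vectors.
labels   = [1, 2] * 50
examples = labels.map { |l| Array.new(20) { rand * l } }

(1..10).each do |c|
  # cross_validation returns one predicted label per example.
  # NOTE: the C: key for the cost parameter and the solver constant
  # are assumptions -- check the binding's parameter names.
  predicted = Liblinear.cross_validation(
    5, # folds
    { solver_type: Liblinear::L2R_L2LOSS_SVC_DUAL, C: c },
    labels,
    examples
  )
  accuracy = predicted.zip(labels).count { |p, t| p == t }.to_f / labels.size
  puts format('C = %4d  accuracy = %.4f', c, accuracy)
end
```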

Adam Drew
  • What values of $C$ are you examining? – Sycorax Feb 22 '18 at 16:55
  • Thanks for the response. I tried 1 through 1000. I'm a noob and don't have a good feeling for what range I should be using for C. I tried that range because the default is 1, so I assumed small whole integers made sense. – Adam Drew Feb 22 '18 at 18:08
  • This is a closely related question about choosing $C$ and $\gamma$ for an RBF SVM: https://stats.stackexchange.com/questions/43943/which-search-range-for-determining-svm-optimal-c-and-gamma-parameters The key take-away is (**1**) there's no reason that $C$ must be an integer (but $C$ must be positive) and (**2**) varying $C$ over orders of magnitude is more reasonable (see the sketch after these comments). – Sycorax Feb 22 '18 at 18:14
  • Also, your findings could be related to this well-known phenomenon. https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models – Sycorax Feb 22 '18 at 20:07
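To make the comments' suggestion concrete, here is the same sort of loop sweeping $C$ over orders of magnitude instead of consecutive integers (same assumptions as the sketch above: liblinear-ruby, hypothetical data, and the `C:` key and solver constant are guesses at the binding's names):

```ruby
require 'liblinear' # liblinear-ruby gem

# Hypothetical data, as in the sketch above.
labels   = [1, 2] * 50
examples = labels.map { |l| Array.new(20) { rand * l } }

# Sweep C over orders of magnitude (2^-10 .. 2^10) rather than 1, 2, 3, ...
(-10..10).map { |e| 2.0**e }.each do |c|
  predicted = Liblinear.cross_validation(
    5, # folds
    { solver_type: Liblinear::L2R_L2LOSS_SVC_DUAL, C: c }, # C: key is an assumption
    labels,
    examples
  )
  accuracy = predicted.zip(labels).count { |p, t| p == t }.to_f / labels.size
  puts format('C = %-12g accuracy = %.4f', c, accuracy)
end
```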

0 Answers