5

I was reading the documentation of sklearn SVM and saw these two statements

  • Still effective in cases where number of dimensions is greater than the number of samples
  • If the number of features is much greater than the number of samples, the method is likely to give poor performances.

Now my understanding of SVMs is fairly limited (which is the reason I am reading the article), but the above two statements seem contradictory. I believe I am missing something, though. What gap in my understanding makes these two statements not actually contradictory?

Aseem Bansal

2 Answers

7

SVMs, like many other linear models, are based on empirical risk minimization, which leads to an optimization problem of this form:

$$\min_w\sum_i\ell(x_i, y_i, w)+\lambda\cdot r(w)$$

where $\ell$ is a loss function (the hinge loss, in the case of SVMs) and $r$ is a regularization function.

The SVM is a squared $\ell_2$-regularized linear model, i.e. $r(w) = \|w\|^2_2$. In regression terms, this guards against huge coefficients, since the coefficient magnitudes are themselves penalized in the optimization.
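To make the objective concrete, here is a minimal NumPy sketch of it (the name `svm_objective` and the argument `lam` are my own; in scikit-learn the same trade-off is exposed through `C`, which acts roughly like $1/\lambda$):

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Regularized empirical risk of a linear SVM (illustrative sketch)."""
    margins = y * (X @ w)                    # y_i * <w, x_i> for each sample
    hinge = np.maximum(0.0, 1.0 - margins)   # hinge loss ell(x_i, y_i, w)
    return hinge.sum() + lam * np.dot(w, w)  # sum of losses + lambda * ||w||_2^2
```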

Besides that, the regularization makes the solution unique even in the $p > n$ situation, so the first statement is true.

The problem when $p \gg n$ is that the regularization needed to keep the model from overfitting the training data can introduce so much bias that the model heavily underperforms. This doesn't mean SVMs can't be used in that scenario (they are routinely used on gene expression data, for example, where $p$ can be thousands of times larger than $n$).

So I see no contradiction between the two statements. The second statement is more of a warning against overfitting.
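As a quick sanity check of the first statement, here is a small synthetic experiment (the data, the parameter grid, and the choice of `LinearSVC` are mine, not something the documentation prescribes):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

# p > n: 500 features but only 50 samples, 5 of them informative
n, p = 50, 500
X = rng.randn(n, p)
w_true = np.zeros(p)
w_true[:5] = 1.0
y = np.sign(X @ w_true)

# The squared-L2 penalty keeps the problem well-posed even though p > n;
# C acts roughly like 1/lambda, trading data fit against ||w||_2^2.
for C in (0.01, 0.1, 1.0):
    scores = cross_val_score(LinearSVC(C=C), X, y, cv=5)
    print(f"C={C}: mean CV accuracy = {scores.mean():.2f}")
```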

Firebug
  • I guess it depends on the interpretation "much greater" vs "greater" :) Definitely, the more samples the better in any case. – Franck Dernoncourt Jan 04 '17 at 18:25
  • In your answer, $p$ refers to the number of features and $n$ to the number of samples, correct? Would you consider understanding linear models necessary to understand SVMs? This question may seem off-topic but is not: I am asking because I cannot understand what is written and am deciding whether I should first work through http://scikit-learn.org/stable/user_guide.html in sequence before coming back here to understand the answer. – Aseem Bansal Jan 04 '17 at 18:37
  • Yes, $p$ is the number of features and $n$ the number of samples. I think a good understanding of linear models is interesting, but in no way necessary to *employ* SVMs. If you want to understand what the SVM does, then yes, basic concepts of linear models, kernels and optimization are necessary. – Firebug Jan 04 '17 at 18:43
  • @AseemBansal FYI I tried to write an easy introduction to SVMs here: [How does a Support Vector Machine (SVM) work?](http://stats.stackexchange.com/a/254658/12359). As Firebug points out, it makes use of basic concepts of linear models, kernels and optimization. – Franck Dernoncourt Jan 05 '17 at 06:36
  • @Firebug Thanks for the explanation. I want to apply them, but I don't want to be completely unaware of what changing the parameters actually means when tuning them. My assumption is that if I understand the various algorithms better, I can make better choices when tuning parameters and choosing between algorithms. – Aseem Bansal Jan 05 '17 at 15:59
  • @AseemBansal And, from experience, you are completely right about that. – Firebug Jan 05 '17 at 18:27
4
  • Still effective in cases where number of dimensions is greater than the number of samples
  • If the number of features is much greater than the number of samples, the method is likely to give poor performances.

These two points are pretty much contradictory, as one feature typically corresponds to one dimension (some features may be multidimensional, but that is less common).

The first point is the correct one; see *SVM, Overfitting, curse of dimensionality* and *Number of features vs. number of observations* for more details.
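If you want to probe this empirically rather than take it on faith, a rough sketch along these lines (synthetic data, arbitrary sizes, all names my own) holds $n$ fixed while the number of mostly-noise features grows past it:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n = 100  # fixed number of samples

# Add ever more noise features and watch cross-validated accuracy.
for p in (10, 100, 1000, 10000):
    X = rng.randn(n, p)
    y = np.sign(X[:, 0] + 0.5 * rng.randn(n))  # only feature 0 is informative
    scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
    print(f"p={p:>5}: mean CV accuracy = {scores.mean():.2f}")
```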

Franck Dernoncourt