
I am learning how to use libsvm through sklearn.svm in Python. I read here about what happens, and why, when you change the C value in your model. My intuition from what I've read was that lower C values would use fewer support vectors to make a more general classifier, while higher C values would use more support vectors to 'overfit' and account for all outliers.

That is not what I observe. For example, I looped through a set of C values like so:

from sklearn import svm

# Xtrain and ytrain are my training data, defined earlier
for c in [1.0, 10.0, 100.0, 1000.0]:
    print(c)
    model = svm.SVC(kernel='linear', C=c)
    model.fit(Xtrain, ytrain)
    print("support vectors:", len(model.support_))

I got results:

1.0
support vectors: 1810
10.0
support vectors: 1750
100.0
support vectors: 1626
1000.0
support vectors: 1558

As you can see, as C goes up, the number of support vectors used in the model goes down. Why does this happen?
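
For reference, here is a self-contained version of the experiment, using synthetic data from make_classification as a stand-in for my actual Xtrain/ytrain (which I haven't shown):

from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic two-class data standing in for the real training set
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

for c in [1.0, 10.0, 100.0, 1000.0]:
    print(c)
    model = svm.SVC(kernel='linear', C=c)
    model.fit(Xtrain, ytrain)
    print("support vectors:", len(model.support_))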

kingledion

1 Answer


In short, C is the penalty on the slack variables, which measure the degree to which the margin constraints are violated. A training pattern violates its margin constraint if $y_i$ times the kernel expansion (i.e. the output of the SVM before thresholding) is less than $+1$ (for patterns inside the margin the output lies between $-1$ and $+1$), and every pattern on or inside the margin is a support vector. If you increase C, a greater penalty is put on violating the constraints, so the solution changes to reduce the size of the violations (and hence their number): the margin is made narrower, fewer patterns fall inside it, and there are fewer support vectors.
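
For concreteness, this is the standard soft-margin primal (textbook form; the $\xi_i$ are the slack variables):

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{subject to} \quad y_i\bigl(\mathbf{w}\cdot\phi(\mathbf{x}_i)+b\bigr) \ge 1-\xi_i,\quad \xi_i \ge 0.$$

Larger C shifts the trade-off towards smaller slacks at the expense of a larger $\|\mathbf{w}\|$, i.e. a narrower margin.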

The expected error rate depends on both the margin and the sum of the slack variables, so in practice generalisation is maximised by a compromise between the two, which gives an intermediate value of C.
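
In practice that compromise is usually found by cross-validating over C; here is a minimal sketch using scikit-learn's GridSearchCV (the grid values are illustrative only, and an appropriate range depends on the data):

from sklearn import svm
from sklearn.model_selection import GridSearchCV

# Search a small illustrative grid of C values with 5-fold cross-validation
search = GridSearchCV(svm.SVC(kernel='linear'),
                      param_grid={'C': [0.01, 0.1, 1.0, 10.0, 100.0]},
                      cv=5)
search.fit(Xtrain, ytrain)
print("best C:", search.best_params_['C'])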

Dikran Marsupial
  • Could you clarify what you mean by "_kernel expansion_" please @Dikran Marsupial? – Daneel Olivaw Mar 28 '17 at 12:59
  • With respect to [this](http://stats.stackexchange.com/questions/31066/what-is-the-influence-of-c-in-svms-with-linear-kernel/31069#31069) answer of yours, you say there that increasing C increases the complexity of the hypothesis class. Can you explain in your answer how the hypothesis class can get more complex if the number of support vectors is reduced? I do not understand that. – kingledion Mar 28 '17 at 13:24
  • @DaneelOlivaw I just mean the output of the support vector machine (before thresholding). – Dikran Marsupial Mar 28 '17 at 13:42
  • @kingledion the number of support vectors is not a particularly good measure of complexity, as the magnitude of the weights is also relevant. Vapnik defines the concept of an essential support vector (one that the decision boundary cannot be defined without), but one can often find a good approximation with fewer SVs than the algorithm gives. Note also that increasing the complexity of a hypothesis class does not mean increasing the complexity of a particular hypothesis from that class. – Dikran Marsupial Mar 28 '17 at 13:45