15

What factors should be considered when determining whether to use multiple binary classifiers or a single multiclass classifier?

For example, I'm building a model that does hand gesture classification. A simple case has 4 outputs: [None, thumbs_up, clenched_fist, all_fingers_extended]. I see two ways to approach this:

Option 1 - Multiple binary classifiers

  1. [None, thumbs_up]
  2. [None, clenched_fist]
  3. [None, all_fingers_extended]

Option 2 - Single multiclass classifier

  1. [None, thumbs_up, clenched_fist, all_fingers_extended]

Which approach tends to be better and in what conditions?

gung - Reinstate Monica
megashigger

1 Answer

13

Your Option 1 may not be the best way to go; if you want multiple binary classifiers, try a strategy called One-vs-All.

In One-vs-All you train one expert binary classifier per class, each of which is good at separating its own pattern from all the others, and the implementation is typically cascaded. For example:

  if classifierNone says is None: you are done
  else:
    if classifierThumbsUp says is ThumbsUp: you are done
    else:
      if classifierClenchedFist says is ClenchedFist: you are done
      else:
        it must be AllFingersExtended and thus you are done

Here is a graphical explanation of One-vs-All from Andrew Ng's course:
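The cascade above can be sketched in a few lines of Python. This is a minimal sketch of the decision logic only; the three predicate functions are hypothetical stand-ins for trained binary classifiers, each answering "is this my class?":

```python
# Cascaded One-vs-All decision logic for the 4-gesture example.
# is_none, is_thumbs_up, is_clenched_fist are hypothetical stand-ins
# for trained binary classifiers returning True/False for input x.
def classify_gesture(x, is_none, is_thumbs_up, is_clenched_fist):
    if is_none(x):
        return "none"
    if is_thumbs_up(x):
        return "thumbs_up"
    if is_clenched_fist(x):
        return "clenched_fist"
    # By elimination, the last class needs no classifier of its own.
    return "all_fingers_extended"
```

Note that the last class falls out by elimination, so k classes need only k-1 classifiers in a strict cascade.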


Multi-class classifiers pros and cons:

Pros:

  • Easy to use out of the box
  • Scale well when you have many classes

Cons:

  • Usually slower to train than binary classifiers
  • For high-dimensional problems they can take a long time to converge

Popular methods:

  • Neural networks
  • Tree-based algorithms
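By contrast with the cascade, a single multiclass model typically ends in one softmax layer over all classes and picks the argmax. A minimal numpy sketch of that final step (the logits here are assumed to come from some trained model):

```python
import numpy as np

LABELS = ["none", "thumbs_up", "clenched_fist", "all_fingers_extended"]

def softmax(z):
    # Shift by the max for numerical stability; the result sums to 1.
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()

def predict(logits):
    # One forward pass scores all classes at once; take the argmax.
    return LABELS[int(np.argmax(softmax(logits)))]
```

A single model like this shares features across all classes, which is part of why it scales better than maintaining one classifier per class.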

One-vs-All classifiers pros and cons:

Pros:

  • Because each model is a simple binary classifier, training usually converges faster
  • Great when you have a handful of classes

Cons:

  • They become unwieldy when you have many classes, since you maintain one classifier per class
  • You need to be careful during training to avoid class imbalances introducing bias, e.g., if you have 1000 samples of none and 3000 samples of the thumbs_up class.

Popular methods:

  • SVMs
  • Most ensemble methods
  • Tree-based algorithms
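On the imbalance caveat above: a common remedy is to reweight classes inversely to their frequency, which is the same heuristic behind scikit-learn's `class_weight='balanced'` option. A minimal sketch, using the 1000-vs-3000 example from the cons list:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    # weight(c) = n_samples / (n_classes * count(c)):
    # rare classes get weights above 1, frequent ones below 1.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

With 1000 `none` and 3000 `thumbs_up` samples this gives `none` a weight of 2.0 and `thumbs_up` a weight of about 0.67, so each binary classifier's loss treats the two classes as equally important.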
Pablo Rivas
  • It would be better to clarify that the $h_{\theta}^i$ functions are output probabilities, and the final label is determined by $\arg\max_i h_{\theta}^i$ – Lii Dec 24 '17 at 16:08
  • That's a very good answer regarding One-vs-All & One-vs-One and it would be good to post your answer also here because the following post is more popular (but it is more or less on the same topic): https://stats.stackexchange.com/questions/91091/one-vs-all-and-one-vs-one-in-svm. – Outcast Nov 08 '18 at 23:16