15

What factors should be considered when determining whether to use multiple binary classifiers or a single multiclass classifier?

For example, I'm building a model that does hand gesture classification. A simple case has 4 outputs: [None, thumbs_up, clenched_fist, all_fingers_extended]. I see two ways to approach this:

Option 1 - Multiple binary classifiers

  1. [None, thumbs_up]
  2. [None, clenched_fist]
  3. [None, all_fingers_extended]

Option 2 - Single multiclass classifier

  1. [None, thumbs_up, clenched_fist, all_fingers_extended]

Which approach tends to be better and in what conditions?

gung - Reinstate Monica
megashigger

1 Answer

13

Your Option 1 may not be the best way to go; if you want multiple binary classifiers, try a strategy called One-vs-All.

In One-vs-All you train one expert binary classifier per class, each of which is good at separating its own pattern from all the others, and the implementation is typically cascaded. For example:

  if classifierNone says is None: you are done
  else:
    if classifierThumbsUp says is ThumbsUp: you are done
    else:
      if classifierClenchedFist says is ClenchedFist: you are done
      else:
        it must be AllFingersExtended and thus you are done

Here is a graphical explanation of One-vs-All from Andrew Ng's course:
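The cascade above can be sketched in a few lines of Python. This is a minimal sketch of the decision logic only; the three predicate functions are hypothetical stand-ins for trained binary classifiers, each answering "is this my class?":

```python
# Cascaded One-vs-All decision logic for the 4-gesture example.
# is_none, is_thumbs_up, is_clenched_fist are hypothetical stand-ins
# for trained binary classifiers returning True/False for input x.
def classify_gesture(x, is_none, is_thumbs_up, is_clenched_fist):
    if is_none(x):
        return "none"
    if is_thumbs_up(x):
        return "thumbs_up"
    if is_clenched_fist(x):
        return "clenched_fist"
    # By elimination, the last class needs no classifier of its own.
    return "all_fingers_extended"
```

Note that the last class falls out by elimination, so k classes need only k-1 classifiers in a strict cascade.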


Multi-class classifiers pros and cons:

Pros:

  • Easy to use out of the box
  • Scale well when you have many classes

Cons:

  • Usually slower to train than binary classifiers
  • For high-dimensional problems they can take a long time to converge

Popular methods:

  • Neural networks
  • Tree-based algorithms
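By contrast with the cascade, a single multiclass model typically ends in one softmax layer over all classes and picks the argmax. A minimal numpy sketch of that final step (the logits here are assumed to come from some trained model):

```python
import numpy as np

LABELS = ["none", "thumbs_up", "clenched_fist", "all_fingers_extended"]

def softmax(z):
    # Shift by the max for numerical stability; the result sums to 1.
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()

def predict(logits):
    # One forward pass scores all classes at once; take the argmax.
    return LABELS[int(np.argmax(softmax(logits)))]
```

A single model like this shares features across all classes, which is part of why it scales better than maintaining one classifier per class.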

One-vs-All classifiers pros and cons:

Pros:

  • Because each model is a simple binary classifier, training usually converges faster
  • Great when you have a handful of classes

Cons:

  • They become unwieldy when you have many classes, since you maintain one classifier per class
  • You need to be careful during training to avoid class imbalances introducing bias, e.g., if you have 1000 samples of none and 3000 samples of the thumbs_up class.

Popular methods:

  • SVMs
  • Most ensemble methods
  • Tree-based algorithms
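On the imbalance caveat above: a common remedy is to reweight classes inversely to their frequency, which is the same heuristic behind scikit-learn's `class_weight='balanced'` option. A minimal sketch, using the 1000-vs-3000 example from the cons list:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    # weight(c) = n_samples / (n_classes * count(c)):
    # rare classes get weights above 1, frequent ones below 1.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

With 1000 `none` and 3000 `thumbs_up` samples this gives `none` a weight of 2.0 and `thumbs_up` a weight of about 0.67, so each binary classifier's loss treats the two classes as equally important.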
Pablo Rivas
  • It would be better to clarify that the $h_{\theta}^i$ functions are output probabilities, and the final label is determined by $\arg\max_i h_{\theta}^i$ – Lii Dec 24 '17 at 16:08
  • That's a very good answer regarding One-vs-All & One-vs-One and it would be good to post your answer also here because the following post is more popular (but it is more or less on the same topic): https://stats.stackexchange.com/questions/91091/one-vs-all-and-one-vs-one-in-svm. – Outcast Nov 08 '18 at 23:16