The softmax function gives a proper probability distribution over the $K$ possible classes:
$$
P(y=j \mid x,\{w_k\}_{k=1}^{K}) = \frac{e^{x^\top w_j}}{\sum_{k=1}^{K} e^{x^\top w_k}}
$$
This is convenient if you want to interpret your classification problem in a probabilistic setting: for example, you can place priors on the parameters and obtain a posterior distribution over classes.
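For concreteness, here is a minimal NumPy sketch of the softmax probabilities above; the feature vector `x`, the weight matrix `W`, and the class count are invented for illustration, and the max-subtraction is just a standard numerical-stability trick:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=5)           # made-up feature vector
W = rng.normal(size=(3, 5))      # K = 3 classes, one weight vector w_j per row
probs = softmax(W @ x)           # P(y = j | x, {w_k}) for each class j
print(probs, probs.sum())        # the probabilities sum to 1
```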
That said, you can imagine a really good classifier that isn't of this form. Perhaps it is of a form that is difficult to express probabilistically (e.g. the SVM; see the multi-class SVM literature for details). If some such complicated classifier works well for you on a given task, perhaps you don't want to use the [potentially weaker] softmax classifier. In such a setting there may not be a natural all-class output, so you have to settle for repeated one-vs-rest classification schemes.
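If you do go the one-vs-rest route with, say, an SVM, scikit-learn wraps the scheme for you. A sketch on toy data (the dataset and hyperparameters here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy multi-class problem; one binary SVM is fit per class vs. the rest.
X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)
print(clf.predict(X[:5]))        # each prediction aggregates K one-vs-rest scores
```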
One more counterpoint: you could also augment the expressive power of the softmax-style approach by changing the argument of the exponential. For example, it would be straightforward to replace each linear logit $x^\top w_j$ with a quadratic expression $x^\top w_j + x^\top A_j x$. Other such augmentations are conceivable.
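A sketch of that quadratic augmentation, with the dimensions and the random $w_j$, $A_j$ invented purely for illustration:

```python
import numpy as np

def quadratic_logit(x, w, A):
    """Logit of the augmented form x^T w + x^T A x."""
    return x @ w + x @ A @ x

rng = np.random.default_rng(1)
d, K = 5, 3                          # made-up feature dimension and class count
x = rng.normal(size=d)
Ws = rng.normal(size=(K, d))         # one linear term w_j per class
As = rng.normal(size=(K, d, d))      # one quadratic term A_j per class
logits = np.array([quadratic_logit(x, Ws[j], As[j]) for j in range(K)])
probs = np.exp(logits - logits.max())
probs /= probs.sum()                 # softmax of the augmented logits
print(probs)                         # still a proper distribution over classes
```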