The softmax function gives a proper probability distribution over the $K$ possible classes:
$$
P(y=j \mid x,\{w_k\}_{k=1}^{K}) = \frac{e^{x^\top w_j}}{\sum_{k=1}^{K} e^{x^\top w_k}}
$$
This is convenient if you want to interpret your classification problem in a probabilistic setting: for example, you can place priors on the parameters and obtain a posterior distribution over classes.
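For concreteness, here is a minimal NumPy sketch of the softmax probabilities above; the feature vector `x`, the weight matrix `W`, and the class count are invented for illustration, and the max-subtraction is just a standard numerical-stability trick:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=5)           # made-up feature vector
W = rng.normal(size=(3, 5))      # K = 3 classes, one weight vector w_j per row
probs = softmax(W @ x)           # P(y = j | x, {w_k}) for each class j
print(probs, probs.sum())        # the probabilities sum to 1
```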
That said, you can imagine a really good classifier that isn't of this form. Perhaps it is of a form that is difficult to express probabilistically (e.g. the SVM; see the multi-class SVM literature for details). If some such complicated classifier works well for you on a given task, perhaps you don't want to use the [potentially weaker] softmax classifier. In such a setting there may not be a natural all-class output, so you have to settle for repeated one-vs-rest classification schemes.
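If you do go the one-vs-rest route with, say, an SVM, scikit-learn wraps the scheme for you. A sketch on toy data (the dataset and hyperparameters here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Toy multi-class problem; one binary SVM is fit per class vs. the rest.
X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0)).fit(X, y)
print(clf.predict(X[:5]))        # each prediction aggregates K one-vs-rest scores
```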
One more counterpoint: you could also augment the expressive power of the softmax-style approach by changing the argument of the exponential. For example, it would be straightforward to replace each linear logit $x^\top w_j$ with a quadratic expression $x^\top w_j + x^\top A_j x$. Other such augmentations are conceivable.
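A sketch of that quadratic augmentation, with the dimensions and the random $w_j$, $A_j$ invented purely for illustration:

```python
import numpy as np

def quadratic_logit(x, w, A):
    """Logit of the augmented form x^T w + x^T A x."""
    return x @ w + x @ A @ x

rng = np.random.default_rng(1)
d, K = 5, 3                          # made-up feature dimension and class count
x = rng.normal(size=d)
Ws = rng.normal(size=(K, d))         # one linear term w_j per class
As = rng.normal(size=(K, d, d))      # one quadratic term A_j per class
logits = np.array([quadratic_logit(x, Ws[j], As[j]) for j in range(K)])
probs = np.exp(logits - logits.max())
probs /= probs.sum()                 # softmax of the augmented logits
print(probs)                         # still a proper distribution over classes
```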