1

The question might be quite straightforward, but I can't seem to find any relevant resources on Google. All the sources I found focus on explaining the difference between the softmax and sigmoid functions and when and how to use each of them. Could anyone enlighten me on this?

Also, is there a statistically principled way to choose a class when, say, two classes are assigned the same probability by the softmax classifier? Thank you!

  • 1
    Of possible interest: https://stats.stackexchange.com/a/469059/247274 One aspect of this that I like is that, since you have the probabilities, you can use them to make more decisions than there are classes! – Dave Oct 08 '20 at 03:11
  • 1
    If your question is if it is possible mathematically, then the answer is "yes, of course". If your question is if it is likely that a (properly trained) softmax classifier will ever output the exact same probabilities for all classes, then the answer is "no" (unless you handcraft the input example to force this situation). – learner Dec 31 '20 at 11:51

2 Answers

1

Softmax yields probabilities for all classes. If your classes are dog, cat, horse, and alligator, then you get the probability of each. What you do with those probability values is up to you, and you might even elect to go with the least probable class!

If you wind up in the situation where your probabilities are, respectively, $0.4$, $0.4$, $0.1$, and $0.1$, then you’re right that there is no obvious winner: both dog and cat are equally likely. However, a probability model like this does not make classifications for you. It is up to you to decide what to do with the probability values. For instance, it might be so terrible to classify an alligator as anything other than an alligator (e.g., “Awww, look at the cute puppy/kitty/pony…ow!!!”) that you’re willing to classify as an alligator, despite $P(Alligator)=0.1$ (so at least one of the other animals is more likely).

If all wrong classifications are equal in terms of how much they cost you, then it doesn’t matter if the $0.4/0.4/0.1/0.1$ probability vector is classified as a dog or a cat. Flip a coin to decide; always pick dog; always pick cat; alternate between the two. The tie does not matter.
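To make the cost-based reasoning above concrete, here is a minimal sketch (in Python, with entirely made-up cost numbers) of turning the $0.4/0.4/0.1/0.1$ probability vector into a decision by minimizing expected misclassification cost rather than by picking the most probable class:

```python
import numpy as np

# Predicted probabilities for [dog, cat, horse, alligator]
probs = np.array([0.4, 0.4, 0.1, 0.1])

# Hypothetical cost matrix: costs[i, j] = cost of predicting class j
# when the true class is i. Mistaking an alligator for a pet is very costly.
costs = np.array([
    [ 0,  1,  1, 1],   # true dog
    [ 1,  0,  1, 1],   # true cat
    [ 1,  1,  0, 1],   # true horse
    [50, 50, 50, 0],   # true alligator
])

# Expected cost of each possible prediction, averaged over the probabilities
expected_cost = probs @ costs
best = int(np.argmin(expected_cost))

print(expected_cost)  # [5.5, 5.5, 5.8, 0.9]
print(best)           # 3, i.e., predict "alligator" despite P(alligator) = 0.1
```

With equal costs for all errors, the dog/cat tie at 5.5 would remain and any tie-breaking rule would do; the asymmetric alligator cost is what makes the decision unambiguous here.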

As another answer mentions, however, you are unlikely to find yourself in a situation where multiple classes are tied, even if it is technically possible (fairly straightforward to simulate).

EDIT

Here's a simulation where every predicted probability is equal (and should be equal).

library(nnet)

set.seed(2022)
# Two observations of each class, so all four classes are equally frequent
y <- rep(c("dog", "cat", "horse", "alligator"), 2)
# Intercept-only multinomial model: no features to distinguish the classes
model <- nnet::multinom(y ~ 1)
predict(model, type = 'probs')

    alligator   cat     dog     horse
1   0.25        0.25    0.25    0.25
2   0.25        0.25    0.25    0.25
3   0.25        0.25    0.25    0.25
4   0.25        0.25    0.25    0.25
5   0.25        0.25    0.25    0.25
6   0.25        0.25    0.25    0.25
7   0.25        0.25    0.25    0.25
8   0.25        0.25    0.25    0.25
Dave
0

I feel that it is highly unlikely that a test point lands exactly on the decision boundary, especially if the model has many parameters. We seldom run into this problem with the sigmoid function/logistic regression (two classes), so why would softmax (more than two classes) be any different?
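A quick illustration of why exact ties are fragile (a sketch in Python; the logit values are arbitrary): softmax assigns two classes the same probability only when their logits are exactly equal, and even a tiny perturbation breaks the tie.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: shift by the max before exponentiating
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

# Exactly equal logits give an exact tie between the first two classes
p_eq = softmax(np.array([2.0, 2.0, -1.0]))
print(p_eq[0] == p_eq[1])  # True

# A perturbation of 1e-9 in one logit already breaks the tie
p = softmax(np.array([2.0, 2.0 + 1e-9, -1.0]))
print(p[0] == p[1])        # False
```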

hehe