
I am trying to understand the equivalence between the logistic function and the softmax function, when k = 2. My understanding is that the following code should output the same values. Where is my mistake?

import numpy as np

def sigmoid(x):
    # Logistic function: 1 / (1 + e^(-x)), applied elementwise.
    x = np.asarray(x, dtype=float)  # np.float was removed in NumPy 1.24; use float
    x = 1 / (1 + np.exp(-x))
    return x

def softmax(x):
    # Exponentiate and normalize so the outputs sum to 1.
    x = np.asarray(x, dtype=float)
    x = np.exp(x)/np.sum(np.exp(x))
    return x

x = .5
b1 = 1
b2 = -1
sigmoid(x*b1), sigmoid(x*b2), softmax([x*(b1-b2), x*(b2-b2)])
(0.6224593312018546, 0.3775406687981454, array([0.73105858, 0.26894142]))

Is it because I am assuming that the softmax optimization would find the same betas?

RV1994
  • Also: https://stats.stackexchange.com/questions/233658/softmax-vs-sigmoid-function-in-logistic-classifier/254071#254071 – Sycorax Sep 04 '19 at 12:42
  • I see no reason why these two functions should return the same values, because they appear to use different denominators. Could you explain why you think they are equivalent? – whuber Sep 04 '19 at 13:13
  • @whuber The two links above show why the two are theoretically linked, and the answer below shows what is the relationship between the coefficients that I was missing. – RV1994 Sep 04 '19 at 13:58
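As a quick numerical check of the identity the linked posts describe: a two-class softmax evaluated at logits (z, 0) returns exactly (sigmoid(z), 1 - sigmoid(z)), so the question's softmax output matches sigmoid evaluated at x*(b1 - b2), not at x*b1. A minimal, self-contained sketch:

    import numpy as np

    z = 0.5 * (1 - (-1))           # x * (b1 - b2) from the question
    logits = np.array([z, 0.0])    # the second logit is x * (b2 - b2) = 0
    p_softmax = np.exp(logits) / np.exp(logits).sum()
    p_sigmoid = 1 / (1 + np.exp(-z))
    print(p_softmax)               # [0.73105858 0.26894142]
    print(p_sigmoid)               # 0.7310585786300049, equal to p_softmax[0]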

1 Answer


For two-class logistic regression and the softmax to give the same values, the coefficients must satisfy

beta (logistic regression) = -(beta1 - beta2)
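To see why, divide the two-class softmax through by $e^{x\beta_1}$; the probability of class 1 collapses to a logistic function of the coefficient difference:

$$P(y = 1 \mid x) = \frac{e^{x\beta_1}}{e^{x\beta_1} + e^{x\beta_2}} = \frac{1}{1 + e^{-x(\beta_1 - \beta_2)}} = \operatorname{sigmoid}\big(x(\beta_1 - \beta_2)\big)$$

Only the difference $\beta_1 - \beta_2$ is identified; the sign in the relation above just depends on which class the logistic model treats as the positive one.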

The following code gives matching numbers:

    import numpy as np

    def sigmoid1(x):
        # Logistic function: 1 / (1 + e^(-x)).
        x = np.asarray(x, dtype=float)  # np.float was removed in NumPy 1.24; use float
        x = 1 / (1 + np.exp(-x))
        return x

    def sigmoid0(x):
        # Complement of the logistic function: sigmoid0(x) = 1 - sigmoid1(x).
        x = np.asarray(x, dtype=float)
        x = np.exp(-x) / (1 + np.exp(-x))
        return x

    def softmax(x):
        # Exponentiate and normalize so the outputs sum to 1.
        x = np.asarray(x, dtype=float)
        x = np.exp(x)/np.sum(np.exp(x))
        return x

    x = .5
    b = -2              # b = -(b1 - b2)
    b1 = 1
    b2 = -1
    sigmoid0(x*b), sigmoid1(x*b), softmax([x*b1, x*b2])

(0.7310585786300049, 0.2689414213699951, array([0.73105858, 0.26894142]))
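On the follow-up in the question: a fitted softmax need not recover the logistic beta itself, because the softmax coefficients are only identified up to a common additive shift; only the difference b1 - b2 is pinned down. A minimal sketch of that shift invariance, using the same softmax as above with hypothetical coefficient values:

    import numpy as np

    def softmax(z):
        z = np.asarray(z, dtype=float)
        return np.exp(z) / np.exp(z).sum()

    x = .5
    print(softmax([x * 1, x * -1]))   # coefficients (1, -1)
    print(softmax([x * 3, x * 1]))    # both coefficients shifted by +2: identical output
    # both lines print [0.73105858 0.26894142]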
Anant Gupta