I am currently studying the Sutton and Barto Intro To RL Book, and I'm trying to do exercise 2.9 (at the bottom of the following picture):
So the exercise asks me to show that, in the case of two actions, the softmax distribution is the same as the one given by the logistic (sigmoid) function.
I have seen this answer, and I am going to try to replicate its approach:
Showing that $\text{softmax}(x) \Leftrightarrow \sigma(x)$
Let $\mathbf{x}= \begin{pmatrix} H_t(a) \\ H_t(b) \end{pmatrix}$. Then we can represent the softmax function as $$P(A_t=a)=\frac{e^{\beta_a H_t(a)}}{e^{\beta_a H_t(a)}+e^{\beta_b H_t(b)}}$$
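We can write the sigmoid as
$$\sigma(\mathbf{x}) = \frac{1}{1+e^{-\boldsymbol{\beta}^\top \mathbf{x}}}$$ and choose $\boldsymbol{\beta} = \begin{pmatrix}\beta_a \\ -\beta_b\end{pmatrix}$, so that $\boldsymbol{\beta}^\top \mathbf{x} = \beta_a H_t(a) - \beta_b H_t(b)$. Multiplying the numerator and denominator by $e^{\beta_a H_t(a)}$ then gives $$\sigma(\mathbf{x}) = \frac{1}{1+e^{-\beta_aH_t(a)+\beta_b H_t(b)}}=\frac{e^{\beta_aH_t(a)}}{e^{\beta_aH_t(a)}+e^{\beta_bH_t(b)}}$$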
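As a quick numerical sanity check of the identity above, here is a minimal Python sketch. The function names (`softmax_two`, `sigmoid_form`) and the sample values for the preferences and the $\beta$'s are made up for illustration; they are not from the book.

```python
import math


def softmax_two(h_a, h_b, beta_a=1.0, beta_b=1.0):
    """P(A_t = a) under the two-action softmax with per-action scales."""
    num = math.exp(beta_a * h_a)
    return num / (num + math.exp(beta_b * h_b))


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_form(h_a, h_b, beta_a=1.0, beta_b=1.0):
    """Sigmoid of beta^T x with beta = (beta_a, -beta_b) and x = (h_a, h_b)."""
    return sigmoid(beta_a * h_a - beta_b * h_b)


# Arbitrary (hypothetical) preferences and scales, chosen only for the check.
h_a, h_b = 0.7, -1.2
beta_a, beta_b = 2.0, 0.5

p_softmax = softmax_two(h_a, h_b, beta_a, beta_b)
p_sigmoid = sigmoid_form(h_a, h_b, beta_a, beta_b)

print(p_softmax, p_sigmoid)
assert math.isclose(p_softmax, p_sigmoid)
```

The two printed values should agree up to floating-point error, which at least confirms that the algebra in the last step holds for these inputs.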
But then how do I get rid of the $\beta$? It doesn't seem like I proved that they are equivalent. Any help?