
I am currently studying the Sutton and Barto Intro To RL Book, and I'm trying to do exercise 2.9 (at the bottom of the following picture):

[Image: Exercise 2.9 from Sutton and Barto; not reproduced here]

So the exercise wants me to show that, when there are only 2 actions, the softmax distribution is the same as the one given by the sigmoid (logistic) function.

I have seen this answer, and I am going to try to replicate what it does:

Showing that $\text{softmax}(x) \Leftrightarrow \sigma(x)$

Let $\mathbf{x}= \begin{pmatrix} H_t(a) \\ H_t(b) \end{pmatrix}$. Then we can represent the softmax function as $$P(A_t=a)=\frac{e^{\beta_a H_t(a)}}{e^{\beta_a H_t(a)}+e^{\beta_b H_t(b)}}$$

We can represent the sigmoid as follows:

$$\sigma(\mathbf{x}) = \frac{1}{1+e^{-\beta^\top \mathbf{x}}}$$ and we set $\beta = \begin{pmatrix}\beta_a \\ -\beta_b\end{pmatrix}$ so that $$\sigma(\mathbf{x}) = \frac{1}{1+e^{-\beta_aH_t(a)+\beta_b H_t(b)}}=\frac{e^{\beta_aH_t(a)}}{e^{\beta_aH_t(a)}+e^{\beta_bH_t(b)}}$$

But then how do I get rid of the $\beta$? It doesn't seem like I proved that they are equivalent. Any help?

Slim Shady

1 Answer


The standard logistic function is just

$$ \sigma(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1} $$

so you don't need any $\beta$'s. The $\beta$'s appear in logistic regression, where they are the model's coefficients. See also the thread "Softmax vs Sigmoid function in Logistic classifier?". So you've proven what you wanted to show.
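To make the two-action case concrete: dividing the numerator and denominator of the softmax by $e^{H_t(a)}$ gives $\frac{e^{H_t(a)}}{e^{H_t(a)}+e^{H_t(b)}} = \frac{1}{1+e^{-(H_t(a)-H_t(b))}} = \sigma(H_t(a)-H_t(b))$, so the scalar argument of the logistic function is the difference of the two preferences. Below is a minimal numerical sketch of that identity. The function names `softmax_two` and `sigmoid` and the preference values are made up for illustration; they are not taken from the book.

```python
import math


def softmax_two(h_a, h_b):
    """P(A_t = a) under a two-action softmax over preferences h_a and h_b."""
    m = max(h_a, h_b)  # subtract the max before exponentiating, for stability
    e_a = math.exp(h_a - m)
    e_b = math.exp(h_b - m)
    return e_a / (e_a + e_b)


def sigmoid(x):
    """Standard logistic function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical preference pairs (illustrative values only).
for h_a, h_b in [(0.0, 0.0), (1.5, -0.5), (-2.0, 3.0)]:
    # The softmax probability of a equals the sigmoid of the difference.
    assert math.isclose(softmax_two(h_a, h_b), sigmoid(h_a - h_b))
```

Only the difference $H_t(a)-H_t(b)$ matters, which is why shifting both preferences by the same constant leaves the probabilities unchanged.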

Tim
  • Why don't I need $\beta$'s? I don't see how I can prove it without including them. I mean, if $x=\begin{pmatrix} H_t(a) \\ H_t(b) \end{pmatrix}$, then how is $p(A_t=a)=\text{softmax}(x)=\frac{e^{H_t(a)}}{e^{H_t(a)}+e^{H_t(b)}}=\sigma(x) = \frac{e^{\begin{pmatrix} H_t(a) \\ H_t(b) \end{pmatrix}}}{e^{\begin{pmatrix} H_t(a) \\ H_t(b) \end{pmatrix}} + 1}$? I actually referenced the question you reference in your answer! – Slim Shady Jan 20 '22 at 09:39
  • https://github.com/iamhectorotero/rlai-exercises/blob/master/Chapter%202/Exercise%202.7.md – Slim Shady Jan 20 '22 at 10:07
  • Actually that's not the correct answer I think. They can't make $H_t(a) \leftarrow H_t(a)$ and $H_t(b)\leftarrow 0$ – Slim Shady Jan 20 '22 at 10:23
  • @SlimShady it is correct, see the answer to your second question. You found an external resource confirming my answer, misunderstood it, and on that basis claimed that the answer is incorrect and downvoted it (?). This kind of attitude may make people hesitant to answer your questions in the future. – Tim Jan 20 '22 at 11:29