This question is not the same as this one I asked previously. In the previous question, I asked for a proof that the sigmoid and softmax are equivalent. I found a solution here, but I don't think it is correct. Here is the exercise that I'm trying to do:
And here is how they prove it:
However, it seems incorrect, because I don't think we can set $H_t(b)\leftarrow H_t(b)-H_t(b)=0$ and then consequently set $H_t(a)\leftarrow H_t(a)-H_t(b)=H_t(a)$: that changes the probability. For example, let $H_t(a)=2$ and $H_t(b)=1$, and then set $H_t(b)\leftarrow 0$. It is not true that (softmax) $P(A_t=a)=\frac{e^2}{e^2+e^1}=\frac{e^2}{e^2+e^0}$.
So what is actually the correct way to do it?
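As a sanity check on the numbers above, here is a minimal Python sketch. The function names `softmax_two` and `sigmoid` are my own and do not come from the exercise or the linked solution. It compares the original two-action softmax probability with two variants: zeroing only $H_t(b)$, and subtracting $H_t(b)$ from both preferences.

```python
import math


def softmax_two(h_a, h_b):
    """Two-action softmax: probability of choosing a given preferences h_a, h_b."""
    return math.exp(h_a) / (math.exp(h_a) + math.exp(h_b))


def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


# Preferences from the example in the question (illustrative values only).
h_a, h_b = 2.0, 1.0

# Original softmax probability: e^2 / (e^2 + e^1).
p_original = softmax_two(h_a, h_b)

# Zero only H_t(b) and leave H_t(a) = 2: e^2 / (e^2 + e^0).
p_only_b_zeroed = softmax_two(h_a, 0.0)

# Subtract H_t(b) from BOTH preferences: (2 - 1, 1 - 1) = (1, 0).
p_both_shifted = softmax_two(h_a - h_b, h_b - h_b)

# Sigmoid of the preference difference H_t(a) - H_t(b).
p_sigmoid = sigmoid(h_a - h_b)

print("original            :", p_original)
print("only H_t(b) zeroed  :", p_only_b_zeroed)
print("both shifted by H(b):", p_both_shifted)
print("sigmoid(H(a) - H(b)):", p_sigmoid)
```

Subtracting the same constant from both preferences multiplies the numerator and the denominator by the same factor $e^{-H_t(b)}$, so the shifted pair is $(1, 0)$ rather than $(2, 0)$, and only the variant that zeroes $H_t(b)$ alone gives a different probability.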