Why is "softmax" called "softmax"? How is it related to "max"?
I tried the following R code, but the two results do not resemble each other:
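# Grid of input values for each of the two arguments
a <- seq(-1, 1, 0.05)
b <- seq(-1, 1, 0.05)
# Two-argument "softmax": the weight assigned to y
softmax <- function(x, y) {
  exp(y) / (exp(x) + exp(y))
}
par(mfrow = c(1, 2))
# Left panel: elementwise maximum of the two arguments
z_max <- outer(a, b, pmax)
persp(a, b, z_max)
# Right panel: the softmax weight defined above
z_soft <- outer(a, b, softmax)
persp(a, b, z_soft)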
The two plots are not similar at all.