Questions tagged [softmax]
201 questions

The softmax is a normalizing exponential function that transforms a numeric vector so that all its entries lie between 0 and 1 and together sum to 1. It is often used as the final layer of a neural network performing a classification task.
109 votes · 4 answers
Softmax vs Sigmoid function in Logistic classifier?
What decides the choice of function (softmax vs sigmoid) in a logistic classifier?
Suppose there are 4 output classes. Each of the above functions gives the probability of each class being the correct output. So which one to take for a…

mach (1,545 · 3 · 10 · 12)
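As an aside, the difference is easy to see numerically; a minimal sketch with NumPy, using hypothetical logits for the 4 classes:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1, -1.0])  # hypothetical scores for 4 classes

softmax = np.exp(logits) / np.exp(logits).sum()  # one coupled distribution, sums to 1
sigmoid = 1 / (1 + np.exp(-logits))              # independent per-class values in (0, 1)

print(softmax.round(2), softmax.sum())  # [0.64 0.23 0.1  0.03], sums to 1
print(sigmoid.round(2), sigmoid.sum())  # [0.88 0.73 0.52 0.27], ~2.41: not a distribution
```

Softmax is the natural fit for mutually exclusive classes; independent per-class sigmoids fit multi-label problems where classes can co-occur.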
60 votes · 5 answers
Backpropagation with Softmax / Cross Entropy
I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer.
The cross entropy error function is
$$E(t,o)=-\sum_j t_j \log o_j$$
with $t$ and $o$ as the target and output at neuron $j$, respectively. The sum is over…

micha (703 · 1 · 6 · 5)
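For reference, the identity this question leads to can be derived in one line. Assuming softmax outputs $o_j = e^{z_j}/\sum_k e^{z_k}$ over pre-activations $z_j$, and targets normalized so that $\sum_k t_k = 1$, the softmax Jacobian $\partial o_k/\partial z_j = o_k(\delta_{kj} - o_j)$ collapses the chain rule:
$$\frac{\partial E}{\partial z_j} = -\sum_k \frac{t_k}{o_k}\,\frac{\partial o_k}{\partial z_j} = -\sum_k t_k(\delta_{kj} - o_j) = -t_j + o_j\sum_k t_k = o_j - t_j.$$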
56 votes · 2 answers
Cross-Entropy or Log Likelihood in Output layer
I read this page:
http://neuralnetworksanddeeplearning.com/chap3.html
It says that a sigmoid output layer with cross-entropy is quite similar to a softmax output layer with log-likelihood.
What happens if I use sigmoid with log-likelihood or…

malioboro (851 · 1 · 11 · 19)
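The similarity being asked about is tight. For a one-hot target with correct class $c$, the cross-entropy error function from the previous question reduces to a negative log-likelihood:
$$E(t,o) = -\sum_j t_j \log o_j = -\log o_c \qquad (t_c = 1,\ t_{j\ne c} = 0),$$
so minimizing cross-entropy over softmax outputs and maximizing the log-likelihood of the correct class are the same objective.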
47 votes · 6 answers
Why is softmax output not a good uncertainty measure for Deep Learning models?
I've been working with Convolutional Neural Networks (CNNs) for some time now, mostly on image data for semantic segmentation/instance segmentation. I've often visualized the softmax of the network output as a "heat map" to see how high per pixel…

Honeybear (599 · 1 · 6 · 8)
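One way to see the problem behind this question, as a minimal sketch with made-up scores: rescaling the logits changes the apparent "confidence" without changing the prediction, so the softmax value alone is a poor uncertainty measure.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # max-shift for numerical stability
    return e / e.sum()

scores = np.array([1.0, 0.5, 0.2])  # hypothetical per-pixel class scores
print(softmax(scores).max())        # ~0.49: looks uncertain
print(softmax(10 * scores).max())   # ~0.99: same ranking, looks near-certain
```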
33 votes · 2 answers
How to set up neural network to output ordinal data?
I have a neural network set up to predict something where the output variable is ordinal. I will describe below using three possible outputs A < B < C.
It is pretty obvious how to use a neural network to output categorical data: the output is…

Alex I (913 · 2 · 9 · 18)
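One commonly suggested setup, sketched below under the cumulative-encoding assumption (not the only option): replace the single softmax over {A, B, C} with sigmoid outputs for the cumulative events "y > A" and "y > B", which bakes the ordering into the targets.

```python
# Cumulative encoding for ordinal targets A < B < C (a sketch):
# two sigmoid outputs answer "is y > A?" and "is y > B?".
targets = {"A": [0, 0], "B": [1, 0], "C": [1, 1]}

def decode(outputs, threshold=0.5):
    # Predicted rank = number of cumulative thresholds passed.
    return "ABC"[sum(o > threshold for o in outputs)]

print(decode([0.9, 0.2]))  # 'B': passed "y > A?" but not "y > B?"
```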
30 votes · 3 answers
Why is softmax function used to calculate probabilities although we can divide each value by the sum of the vector?
Applying the softmax function on a vector will produce "probabilities" and values between $0$ and $1$.
But we can also divide each value by the sum of the vector and that will produce probabilities and values between $0$ and $1$.
I read the…

floyd (1,240 · 13 · 24)
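A two-line illustration of the gap (hypothetical values): dividing by the sum breaks down as soon as entries can be negative, whereas exponentiating first guarantees positive values that sum to 1.

```python
import numpy as np

x = np.array([2.0, -1.0, 0.5])

print(x / x.sum())                  # [ 1.33 -0.67  0.33]: a negative "probability"
print(np.exp(x) / np.exp(x).sum())  # [ 0.79  0.04  0.18]: a valid distribution
```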
19 votes · 2 answers
How deep is the connection between the softmax function in ML and the Boltzmann distribution in thermodynamics?
The softmax function, commonly used in neural networks to convert real numbers into probabilities, is the same function as the Boltzmann distribution, the probability distribution over energies for an ensemble of particles in thermal equilibrium at…

bjarkemoensted (452 · 3 · 15)
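For context, the identification is exact once softmax is written with a temperature parameter:
$$p_i = \frac{e^{x_i/T}}{\sum_j e^{x_j/T}} \quad\longleftrightarrow\quad p_i = \frac{e^{-E_i/(k_B T)}}{Z}, \qquad Z = \sum_j e^{-E_j/(k_B T)},$$
with $x_i = -E_i$ and the softmax temperature playing the role of $k_B T$; the normalizer is the Boltzmann partition function $Z$.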
15 votes · 2 answers
Different definitions of the cross entropy loss function
I started off learning about neural networks with the neuralnetworksanddeeplearning.com tutorial. In particular, the 3rd chapter has a section about the cross-entropy function, which defines the cross-entropy loss as:
$C = -\frac{1}{n}…

Reginald (153 · 1 · 6)
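The two definitions being contrasted are, reconstructing the truncated formula from the cited chapter:
$$C = -\frac{1}{n}\sum_x \big[\,y \ln a + (1-y)\ln(1-a)\,\big] \qquad\text{vs.}\qquad C = -\sum_j t_j \log o_j,$$
the first treating each sigmoid output $a$ as an independent Bernoulli variable averaged over the $n$ training inputs $x$, the second treating the softmax outputs as a single categorical distribution.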
14 votes · 3 answers
Why is hierarchical softmax better for infrequent words, while negative sampling is better for frequent words?
I wonder why hierarchical softmax is better for infrequent words, while negative sampling is better for frequent words, in word2vec's CBOW and skip-gram models. I have read the claim on https://code.google.com/p/word2vec/.

Franck Dernoncourt (42,093 · 30 · 155 · 271)
14 votes · 3 answers
Non-linearity before final Softmax layer in a convolutional neural network
I'm studying and trying to implement convolutional neural networks, but I suppose this question applies to multilayer perceptrons in general.
The output neurons in my network represent the activation of each class: the most active neuron corresponds…

rand (427 · 1 · 5 · 10)
13 votes · 1 answer
Softmax overflow
While waiting for Andrew Ng's next course on Coursera, I'm trying to program a classifier in Python with the softmax function on the last layer to get the different probabilities.
However, when I try to use it on the CIFAR-10 dataset (input: (3072,…

Dlmss (143 · 1 · 6)
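The standard fix for this overflow, as a sketch (not the asker's code): shift the logits by their maximum before exponentiating. The shift cancels in the ratio, $e^{x_i - m}/\sum_j e^{x_j - m} = e^{x_i}/\sum_j e^{x_j}$, but keeps every exponent at or below zero.

```python
import numpy as np

def stable_softmax(z):
    # Subtracting the row-wise max is a mathematical no-op for softmax,
    # but it bounds the exponents by 0 so np.exp cannot overflow.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([1000.0, 1001.0, 1002.0])  # naive np.exp(z) overflows to inf
print(stable_softmax(z))                # ≈ [0.09 0.24 0.67]
```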
12 votes · 1 answer
Log probabilities in reference to softmax classifier
In this page, https://cs231n.github.io/neural-networks-case-study/, why does it mention "the Softmax classifier interprets every element of $f$ as holding the (unnormalized) log probabilities of the three classes"?
I understand why it is unnormalized but…

Abhishek Bhatia (461 · 4 · 13)
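A small sketch of what the cs231n note means, with hypothetical scores $f$: exponentiating and normalizing turns the scores into probabilities, and "unnormalized" refers to the additive constant (the log of the normalizer) that separates $f$ from $\log p$.

```python
import numpy as np

f = np.array([3.0, 1.0, 0.0])    # class scores = unnormalized log probabilities

p = np.exp(f) / np.exp(f).sum()  # softmax: now a proper distribution

# log(p) differs from f only by a constant, the log of the normalizer:
print(np.log(p) - f)             # ≈ [-3.17 -3.17 -3.17]
```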
12 votes · 4 answers
Why is the softmax used to represent a probability distribution?
In the machine learning literature, to represent a probability distribution, the softmax function is often used. Is there a reason for this? Why isn't another function used?

SHASHANK GUPTA (1,139 · 2 · 10 · 17)
11 votes · 4 answers
What is an intuitive interpretation for the softmax transformation?
A recent question on this site asked about the intuition of softmax regression. This has inspired me to ask a corresponding question about the intuitive meaning of the softmax transformation itself. The general scaled form of the softmax function…

Ben (91,027 · 3 · 150 · 376)
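One concrete reading of the scaled form mentioned here, as a sketch with a temperature parameter: softmax interpolates between a hard argmax and a uniform distribution.

```python
import numpy as np

def softmax(x, temperature=1.0):
    e = np.exp((x - x.max()) / temperature)
    return e / e.sum()

x = np.array([3.0, 1.0, 0.0])
print(softmax(x, 0.1).round(2))   # [1. 0. 0.]: low T approaches argmax
print(softmax(x, 1.0).round(2))   # [0.84 0.11 0.04]: a soft ranking of the scores
print(softmax(x, 10.0).round(2))  # [0.39 0.32 0.29]: high T approaches uniform
```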
11 votes · 3 answers
Why 'e' in softmax?
I am doing an introduction to ML with TensorFlow and I came across the softmax activation function. Why is $e$ used in the softmax formula? Why not 2, 3, or 7?
$$
\text{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}
$$
$$\sum_j a^L_j = …$$

Gillian (213 · 2 · 6)
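A quick check of why the base is largely a convention (a sketch): any base $b > 1$ gives an ordinary softmax with inputs rescaled by $\ln b$, since $b^x = e^{x \ln b}$; $e$ is preferred mainly because it keeps the derivative clean, $\frac{d}{dx} e^x = e^x$.

```python
import numpy as np

def softmax_base(x, b):
    # A "base-b softmax" is just softmax with rescaled inputs: b**x == exp(x * ln b).
    e = b ** (x - x.max())
    return e / e.sum()

x = np.array([2.0, 1.0, 0.5])
print(softmax_base(x, 2.0))                 # base-2 version
print(softmax_base(x * np.log(2.0), np.e))  # ordinary softmax on x * ln 2: identical
```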