Recently I asked a question about GANs, What is the intuition behind the expected value in original GAN paper's objective function?, and there I learned that the discriminator output is viewed as a probability distribution. Although my knowledge of maths is amateurish, I have built many vanilla GAN models. In those models, the discriminator network has a single output node that yields a scalar value between 0 and 1, because its last layer uses a sigmoid activation function.
Now let's consider a pseudo example. Say I have a batch of three real images $x = \left\{ x_1, x_2, x_3 \right\}$ and we pass this batch to our discriminator $D(x)$; assume the corresponding outputs are $y = \left\{ 0.1, 0.8, 0.5 \right\}$, given that the last-layer activation is a sigmoid.
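For concreteness, here is a minimal PyTorch sketch of the kind of discriminator I mean (the $28 \times 28$ input size and layer widths are placeholder assumptions, not taken from any particular paper):

```python
import torch
import torch.nn as nn

# Minimal discriminator: one output node with a sigmoid activation,
# so each image in the batch gets an independent score in (0, 1).
D = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),   # assumes 28x28 grayscale inputs (placeholder)
    nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
    nn.Sigmoid(),
)

x = torch.randn(3, 1, 28, 28)  # a batch of three "images"
y = D(x)                       # shape (3, 1); e.g. something like [0.1, 0.8, 0.5]
print(y.squeeze(1), y.sum())   # note: the per-image scores need not sum to 1
```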
One of the rules of a probability distribution is that the sum of all the probabilities (or the area under the curve) should equal one, but if we take the sum $0.1 + 0.8 + 0.5 = 1.4$, we get more than one.
Here we are presuming that the output activation is a sigmoid, but I have also seen and implemented many vanilla GAN models where the last activation function is $\tanh$, i.e. $D(x) \in (-1, 1)$.
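For reference, this is the rescaling I have seen used in the $\tanh$ case, based on the identity $\sigma(z) = \frac{1 + \tanh(z/2)}{2}$ (a sketch, assuming the $\tanh$ output is simply mapped affinely onto $(0, 1)$):

```python
import torch

# Stand-in for raw tanh discriminator outputs on a batch of three images
raw = torch.tanh(torch.randn(3))   # values in (-1, 1)

# Affine rescaling of (-1, 1) onto (0, 1); for tanh this matches a sigmoid
# of twice the pre-activation, via sigmoid(z) = (1 + tanh(z/2)) / 2
p = (raw + 1) / 2
print(raw, p)                      # per-image scores, still not summing to 1
```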
Given this, how is the discriminator's output in a GAN considered a probability distribution?