I have the model for multiclass logistic regression, which is given by
$$ P(Y=j|X^{(i)}) = \frac{\exp(\theta_j^TX^{(i)})}{1+ \sum_{m=1}^{k}\exp(\theta_m^T X^{(i)})} $$
where $k$ is the number of classes, $\theta_j$ is the parameter vector to be estimated for class $j$, and $X^{(i)}$ is the $i$-th training example.
One thing I don't get is how the denominator $$ 1+ \sum_{m=1}^{k}\exp(\theta_m^T X^{(i)}) $$ normalizes the model, that is, how it keeps the probability between 0 and 1.
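To make this concrete, here is a small NumPy sketch of the formula as I read it (the values of $\theta$ and $X^{(i)}$ are made up purely for illustration):

```python
import numpy as np

k, d = 3, 4                      # number of classes, number of features
rng = np.random.default_rng(0)
theta = rng.normal(size=(k, d))  # one made-up parameter vector theta_j per class
x = rng.normal(size=d)           # a single made-up example X^(i)

scores = theta @ x                   # theta_j^T X^(i) for j = 1..k
denom = 1.0 + np.exp(scores).sum()   # the "1 +" denominator from above
probs = np.exp(scores) / denom
print(probs)        # every entry lies strictly between 0 and 1
print(probs.sum())  # ...but the entries sum to less than 1
```

Each entry is indeed between 0 and 1 (the denominator contains the numerator term plus positive extras), but I don't see why this counts as "normalizing".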
I am used to binary logistic regression being
$$ P(Y=1|X^{(i)}) = \frac{1}{1 + \exp(-\theta^T X^{(i)})} $$
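In that binary case I can verify the bound directly, e.g. (again with made-up inputs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10.0, 10.0, 5)  # theta^T X^(i) over a wide range
print(sigmoid(z))                # always strictly between 0 and 1
```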
What confuses me is the normalization. In the binary case, since it is a sigmoid function, the value can never be less than 0 or greater than 1, so that part is clear to me. But I am confused about the multiclass case. Why does it hold there?
This is my reference: https://list.scms.waikato.ac.nz/pipermail/wekalist/2005-February/029738.html. I think the normalizing form should instead have been $$ P(Y=j|X^{(i)}) = \frac{\exp(\theta_j^T X^{(i)})}{\sum_{m=1}^{k} \exp(\theta_m^T X^{(i)})} $$
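A quick check of this form (same made-up values as in the sketch above) shows it does sum to 1 across the classes, which is why it looks like the right normalization to me:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4))  # same made-up parameters as above
x = rng.normal(size=4)

scores = theta @ x
probs = np.exp(scores) / np.exp(scores).sum()  # plain softmax, no "1 +"
print(probs)        # each entry strictly between 0 and 1
print(probs.sum())  # sums to 1 (up to floating point)
```

So why does the reference use the version with the extra 1 in the denominator?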