There is a difference between probabilities and log probabilities. If the probability of an event is 0.36787944117, which happens to be $1/e$, then the log probability is -1.
Therefore, if you are given a bunch of unnormalized log probabilities and you want to recover the original probabilities, you first exponentiate all of your numbers, which gives you unnormalized probabilities, and then normalize them as usual. Mathematically, this is
$$p_j = \frac{e^{z_j}}{\sum_i e^{z_i}}$$
where $p_j$ is the probability of the $j$th class and the $z_i$ are the inputs to the softmax classifier.
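As a concrete illustration, here is a minimal NumPy sketch of this recipe. The `softmax` helper and the example inputs are my own, and subtracting the maximum before exponentiating is a standard overflow-avoidance trick rather than something the math requires:

```python
import numpy as np

def softmax(z):
    """Recover probabilities from unnormalized log probabilities z.

    Subtracting max(z) before exponentiating avoids overflow; the
    extra factor e^{-max(z)} cancels in the ratio, so the result
    is unchanged.
    """
    exp_z = np.exp(z - np.max(z))   # unnormalized probabilities
    return exp_z / np.sum(exp_z)    # normalize as usual

z = np.array([2.0, 1.0, -1.0])      # arbitrary unnormalized log probabilities
p = softmax(z)
print(p, p.sum())                   # probabilities that sum to 1
```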
The obvious question is why we bother with exponentiation at all. Why not use
$$p_j = \frac{z_j}{\sum_i z_i}$$
instead?
One reason is that the softmax plays nicely with the cross-entropy loss, which is $-E_q[\log p]$, where $q$ is the true distribution (the labels). Intuitively, the log cancels the exponent, which is very helpful for us.
It turns out that if you take the gradient of the cross-entropy loss with respect to the inputs to the classifier $\vec z$, you get
$$\vec p - \vec 1_j$$
when the ground truth label is class $j$ and $\vec 1_j$ is the corresponding one-hot vector. This is a very clean expression that leads to easy interpretation and optimization.
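As a sanity check, here is a short, self-contained sketch (with made-up inputs and my own helper names) that compares this analytic gradient against a finite-difference estimate:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

def cross_entropy(z, j):
    """Cross-entropy loss -log p_j for a one-hot label at class j."""
    return -np.log(softmax(z)[j])

z = np.array([0.5, -1.0, 2.0])   # arbitrary classifier inputs
j = 2                            # ground-truth class

# analytic gradient: p - 1_j
analytic = softmax(z).copy()
analytic[j] -= 1.0

# central finite differences along each coordinate
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(3)[k], j) -
     cross_entropy(z - eps * np.eye(3)[k], j)) / (2 * eps)
    for k in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-5))   # True
```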
On the other hand, if you try to use unnormalized probabilities instead of unnormalized log probabilities, you end up with the gradient being
$$\frac{\vec 1}{\sum_i z_i} - \frac{1}{z_j}\vec 1_j$$
where $\vec 1$ is the all-ones vector. This expression is much less pleasant in terms of interpretability, and you can also see potential numerical problems when entries of $z$ are close to 0.
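To see the problem concretely, here is a hypothetical sketch of that gradient, assuming the entries of $z$ are positive so that $z_j / \sum_i z_i$ is a valid probability at all; note how the $j$th component blows up once $z_j$ gets small:

```python
import numpy as np

def naive_grad(z, j):
    """Gradient of -log(z_j / sum(z)) with respect to z.

    Every component gets 1 / sum(z); the j-th component additionally
    gets -1 / z_j, which explodes as z_j approaches 0.
    """
    g = np.full_like(z, 1.0 / np.sum(z))
    g[j] -= 1.0 / z[j]
    return g

print(naive_grad(np.array([1.0, 2.0, 3.0]), j=0))    # well-behaved
print(naive_grad(np.array([1e-8, 2.0, 3.0]), j=0))   # j-th entry is huge
```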
Another reason to use log probabilities can be seen from logistic regression, which is simply the two-class special case of softmax classification. The shape of the sigmoid works well because, intuitively, the probability of each class does not vary linearly with the inputs as you move across the feature space. The sharp bend in the sigmoid, which emphasizes the sharp boundary between the two classes, is really a result of the exponential term we apply to the inputs of the softmax.
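To make the correspondence concrete, here is a small sketch (with arbitrary scores of my own choosing): a two-class softmax with one score pinned at 0 reduces exactly to the sigmoid of the other score, since $e^s / (e^s + e^0) = 1 / (1 + e^{-s})$.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def softmax(z):
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Two-class softmax with scores [s, 0] matches the sigmoid of s.
for s in [-3.0, 0.0, 1.5]:
    print(softmax(np.array([s, 0.0]))[0], sigmoid(s))
```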