I saw the following explanation of entropy in probability:
(Entropy). The surprise of learning that an event with probability $p$ happened is defined as $\log_2(1/p)$, measured in a unit called bits. Low-probability events have high surprise, while an event with probability $1$ has zero surprise. The $\log$ is there so that if we observe two independent events $A$ and $B$, the total surprise is the same as the surprise from observing $A \cap B$. The $\log$ is base $2$ so that if we learn that an event with probability $1/2$ happened, the surprise is $1$, which corresponds to having received $1$ bit of information.
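To make sure I understand the definition, here is a small check I ran of the properties claimed above (the particular probabilities are just ones I picked):

```python
import math

def surprise(p):
    """Surprise of learning that an event with probability p occurred, in bits."""
    return math.log2(1 / p)

# A certain event carries zero surprise.
print(surprise(1))      # 0.0

# An event with probability 1/2 carries exactly 1 bit of surprise.
print(surprise(0.5))    # 1.0

# Additivity: for independent events A (p=1/2) and B (p=1/4),
# the surprise of A ∩ B (p=1/8) equals the sum of the individual surprises.
print(surprise(0.5 * 0.25))                 # 3.0
print(surprise(0.5) + surprise(0.25))       # 3.0
```

This matches the description: low-probability events are more surprising, and the base-2 logarithm turns independent probabilities (which multiply) into surprises (which add).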
I then read this answer by user "mitchus".
Given these two descriptions, I am still unable to dispel an aspect of my confusion, and the more I think about it, the more confused I become. If entropy is the "surprise" of learning that an event with probability $p$ happened, then wouldn't the distribution with the highest entropy be the one with the most possible outcomes spread over the largest range, so that there are many outcomes, each of which has a very low probability of occurring? Or does this actually describe a uniform distribution? Thank you.
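Edit: to make my question concrete, here is a small numerical check I ran, using the entropy $H = \sum_i p_i \log_2(1/p_i)$ (i.e. the expected surprise) and two distributions I made up, both over four outcomes:

```python
import math

def entropy(probs):
    """Entropy in bits: the expected surprise over the distribution."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Uniform over 4 outcomes: every outcome has probability 1/4.
uniform = [0.25, 0.25, 0.25, 0.25]

# Skewed over the same 4 outcomes: one likely outcome, three rare ones.
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(uniform))  # 2.0 bits, i.e. log2(4)
print(entropy(skewed))   # about 1.36 bits, strictly less
```

The uniform distribution comes out higher even though the skewed one contains individually rarer (more surprising) outcomes, which is exactly the tension I am asking about.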