My understanding of information entropy is that it requires the input probabilities to sum to 1.
So, for a sequence a,a,b,b you then have $$- \left(\frac12 \log_2 \frac12 + \frac12 \log_2 \frac12\right) = 1$$
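For concreteness, here is how I understand that calculation, as a minimal Python sketch (the function name is just mine, and I'm using the empirical symbol frequencies as the probabilities):

```python
from collections import Counter
from math import log2

def entropy_bits(seq):
    """Shannon entropy in bits per symbol, with probabilities taken
    as empirical frequencies over the distinct symbols."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy_bits("aabb"))  # 1.0
```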
Are there versions of information entropy that don't require the probabilities to sum to 1? Or is there a way to measure entropy that is also sensitive to the number of items, not only their probabilities? Or is there an accepted way to derive a form of 'non-normalised' information entropy that somehow accounts for the fact that the longer the information stream is, the more likely you are to come across various arrangements of information?
For example (not that this is accurate, just to convey the question): suppose you could compute a non-normalised entropy for the same sequence a,a,b,b as follows:
$$-\left(\frac12 \log_2 \frac12+\frac12 \log_2 \frac12+\frac12 \log_2 \frac12+\frac12 \log_2 \frac12\right) = 2$$
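To be concrete about what I mean (this is my made-up quantity, not a standard definition as far as I know), the sum runs over every occurrence in the sequence rather than over the distinct symbols:

```python
from collections import Counter
from math import log2

def non_normalised_entropy(seq):
    """Hypothetical quantity: sum -p(x) * log2 p(x) over every
    occurrence x in the sequence, not just the distinct symbols."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((counts[x] / n) * log2(counts[x] / n) for x in seq)

print(non_normalised_entropy("aabb"))  # 2.0
```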
Alternatively, can you sum the information content over a string of information, as in the sketch after this list?
- For a,a,b,b you have four items at 1 bit of surprise each, therefore 4 total bits.
- For a,a,a,a,a,a,a,a,a,b you have 10 items at 0.469 bits average surprise, therefore 4.69 total bits?
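Here is the arithmetic behind those two bullet points, again just as a sketch of the quantity I have in mind: summing the surprise $-\log_2 p(x)$ of each item, which works out to length times the average per-symbol entropy.

```python
from collections import Counter
from math import log2

def total_surprise(seq):
    """Sum of -log2 p(x) over every item in the sequence, with p taken
    from empirical frequencies; equals (length) * (entropy per symbol)."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(log2(counts[x] / n) for x in seq)

print(total_surprise("aabb"))        # 4.0 bits
print(total_surprise("aaaaaaaaab"))  # ~4.69 bits
```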