My understanding of information entropy is that it requires the input probabilities to sum to 1.
So, for a sequence a,a,b,b you then have $$- \left(\frac12 \log_2 \frac12 + \frac12 \log_2 \frac12\right) = 1$$
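For concreteness, here is how I understand that calculation, as a minimal Python sketch (the function name is just mine, and I'm using the empirical symbol frequencies as the probabilities):

```python
from collections import Counter
from math import log2

def entropy_bits(seq):
    """Shannon entropy in bits per symbol, with probabilities taken
    as empirical frequencies over the distinct symbols."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy_bits("aabb"))  # 1.0
```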
Are there versions of information entropy that don't require the probabilities to sum to 1? Or is there a way to measure entropy that is also sensitive to the number of items, not only their probabilities? Or is there an accepted way to derive a form of 'non-normalised' information entropy that somehow accounts for the fact that the longer the information stream is, the more likely you are to come across various arrangements of information?
For example (not that this is accurate, just to convey the question): suppose you could compute a non-normalised entropy for the same sequence a,a,b,b as follows:
$$-\left(\frac12 \log_2 \frac12+\frac12 \log_2 \frac12+\frac12 \log_2 \frac12+\frac12 \log_2 \frac12\right) = 2$$
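To be concrete about what I mean (this is my made-up quantity, not a standard definition as far as I know), the sum runs over every occurrence in the sequence rather than over the distinct symbols:

```python
from collections import Counter
from math import log2

def non_normalised_entropy(seq):
    """Hypothetical quantity: sum -p(x) * log2 p(x) over every
    occurrence x in the sequence, not just the distinct symbols."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((counts[x] / n) * log2(counts[x] / n) for x in seq)

print(non_normalised_entropy("aabb"))  # 2.0
```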
Alternatively, can you sum the information content over a string of information, as in the sketch after this list?
- For a,a,b,b you have four items at 1 bit of surprise each, therefore 4 total bits.
- For a,a,a,a,a,a,a,a,a,b you have 10 items at 0.469 bits average surprise, therefore 4.69 total bits?
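Here is the arithmetic behind those two bullet points, again just as a sketch of the quantity I have in mind: summing the surprise $-\log_2 p(x)$ of each item, which works out to length times the average per-symbol entropy.

```python
from collections import Counter
from math import log2

def total_surprise(seq):
    """Sum of -log2 p(x) over every item in the sequence, with p taken
    from empirical frequencies; equals (length) * (entropy per symbol)."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(log2(counts[x] / n) for x in seq)

print(total_surprise("aabb"))        # 4.0 bits
print(total_surprise("aaaaaaaaab"))  # ~4.69 bits
```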