I'm looking for a book or online resource that explains different kinds of entropy such as Sample Entropy and Shannon Entropy and their advantages and disadvantages. Can someone point me in the right direction?
6 Answers
Cover and Thomas's book Elements of Information Theory is a good source on entropy and its applications, although I don't know that it addresses exactly the issues you have in mind.

- Also the paper "Information Theoretic Inequalities" by Dembo, Cover and Thomas reveals a lot of deep aspects. – robin girard Aug 09 '10 at 13:36
- Still, none of those books claim that there is more than one entropy. – Sep 02 '10 at 10:25
These lecture notes on information theory by O. Johnson contain a good introduction to different kinds of entropy.

If you're interested in the mathematical statistics around entropy, you may consult this book, which is freely available:
http://www.renyi.hu/~csiszar/Publications/Information_Theory_and_Statistics:_A_Tutorial.pdf

Grünwald and Dawid's paper Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory discusses generalisations of the traditional notion of entropy. Given a loss, its associated entropy function maps each distribution to the minimal achievable expected loss under that distribution. The usual entropy function is the generalised entropy associated with the log loss; other choices of loss yield different entropies, such as the Rényi entropy.
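To make the loss-to-entropy mapping concrete, here is a minimal numerical sketch (my own illustration, not code from the paper; the function name `generalized_entropy` and the Bernoulli example are assumptions for the demo): minimising expected log loss over probabilistic predictions recovers Shannon entropy, while minimising expected 0-1 loss over point predictions gives min(p, 1-p), as discussed in the comments below.

```python
import numpy as np

def generalized_entropy(p, loss, actions):
    """Generalised entropy of Bernoulli(p) for a given loss: the minimal
    achievable expected loss over the supplied set of actions."""
    return min(p * loss(1, a) + (1 - p) * loss(0, a) for a in actions)

# Log loss over probabilistic predictions q recovers Shannon entropy (in nats).
log_loss = lambda y, q: -np.log(q) if y == 1 else -np.log(1 - q)
# 0-1 loss over the two point predictions {0, 1} gives min(p, 1 - p).
zero_one_loss = lambda y, a: float(y != a)

p = 0.3
qs = np.linspace(0.001, 0.999, 999)                    # grid of probabilistic actions
print(generalized_entropy(p, log_loss, qs))            # ≈ 0.611 = -p log p - (1-p) log(1-p)
print(generalized_entropy(p, zero_one_loss, [0, 1]))   # 0.3 = min(p, 1-p)
```

In the log-loss case the expected loss is minimised at q = p, which is exactly why the expression collapses to the Shannon entropy.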

- So then, sigma is the entropy of N(0, sigma) corresponding to squared error, and min(p, 1-p) is the entropy of Bernoulli(p) corresponding to 0-1 prediction loss? Seems like quite a generalization! – Yaroslav Bulatov Sep 06 '10 at 00:03
- Yes. The entropy for square loss is constant and the entropy for 0-1 loss is min(p, 1-p). What's also interesting is that these have strong correspondences to divergences too: the square loss corresponds to the Hellinger divergence and the 0-1 loss to the variational divergence. Since entropies defined like this are necessarily concave functions, it turns out the corresponding f-divergence is built using f(p) = -entropy(p). Bob Williamson and I have explored some of this in our paper: http://arxiv.org/abs/0901.0356 . It's fun stuff. – Mark Reid Sep 12 '10 at 05:39
- Here's something interesting I found about divergences recently: each step of Belief Propagation can be viewed as a Bregman projection http://www.ece.drexel.edu/walsh/Walsh_TIT_10.pdf – Yaroslav Bulatov Oct 02 '10 at 17:05
Jaynes shows how to derive Shannon's entropy from basic principles in his book.
One idea is that if you approximate $n!$ by $n^n$, the entropy is what you obtain by rewriting the quantity $$\frac{1}{n}\log \frac{n!}{(n p_1)!\cdots (n p_d)!}$$
The quantity inside the log is the number of distinct length-$n$ observation sequences over $d$ outcomes whose empirical distribution is $p$, so it is a kind of measure of the explanatory power of the distribution.
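As a quick numerical sanity check of this counting interpretation (my own sketch; `log_multinomial` and `shannon_entropy` are names I made up, and the exact log-factorials come from `scipy.special.gammaln`), the normalised log of the multinomial coefficient approaches the Shannon entropy as $n$ grows:

```python
import numpy as np
from scipy.special import gammaln   # gammaln(n + 1) == log(n!)

def log_multinomial(n, p):
    """(1/n) * log( n! / ((n p_1)! ... (n p_d)!) ), assuming the n*p_i are integers."""
    counts = np.round(n * np.asarray(p))
    return (gammaln(n + 1) - gammaln(counts + 1).sum()) / n

def shannon_entropy(p):
    p = np.asarray(p)
    return -(p * np.log(p)).sum()   # in nats

p = [0.5, 0.3, 0.2]
for n in (10, 100, 10000):
    print(n, log_multinomial(n, p))   # climbs toward the Shannon entropy as n grows
print(shannon_entropy(p))             # ≈ 1.0297 nats
```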

- $n^n$ is such a crude approximation of $n!$ that one would be excused for doubting this approach. However, Stirling's (asymptotic) approximation $\log(n!) = n \log n - n + O(\log n)$ also leads to the desired result, at least for large $n$, because $p_1 + \cdots + p_d = 1$. – whuber Jul 18 '11 at 13:20
Entropy is only one concept: the amount of information needed to describe some system; there are merely many generalisations of it. Sample entropy is just an entropy-like descriptor used in heart rate analysis.
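To make the distinction concrete, here is a minimal sketch of what "sample entropy" actually computes, next to the plug-in Shannon entropy of a discrete distribution (my own simplified O(N²) implementation, not a reference one; `m` and `r` follow the usual SampEn convention, and the 0.2·std default for `r` is a common heuristic, not a universal rule):

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Naive SampEn(m, r) of a 1-D series: -log(A / B), where B counts pairs of
    length-m templates within Chebyshev tolerance r and A counts the same for
    length m + 1, self-matches excluded (after Richman & Moorman)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * x.std()
    def matches(k):
        # N - m templates of length k (k = m or m + 1)
        templates = np.array([x[i:i + k] for i in range(N - m)])
        count = 0
        for i in range(len(templates)):
            dist = np.abs(templates - templates[i]).max(axis=1)
            count += int((dist <= r).sum()) - 1   # drop the self-match
        return count
    B, A = matches(m), matches(m + 1)
    return -np.log(A / B)

def shannon_entropy(p):
    """Plug-in Shannon entropy (nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(0)
print(sample_entropy(rng.normal(size=500), m=2))   # irregularity of an ordered series
print(shannon_entropy([0.5, 0.3, 0.2]))            # uncertainty of a distribution
```

The contrast is the point made above: sample entropy is a regularity statistic of an ordered time series, while Shannon entropy is a functional of a probability distribution; they answer different questions.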
- I know, however that doesn't help me decide whether using sample entropy, Shannon entropy, or some other kind of entropy is appropriate for the data that I'm working with. – Christian Jul 20 '10 at 22:31
- What I wrote in my post is just that for a certain type of data/process/system there is only one *true* entropy definition. Sample entropy is *not* an entropy measure, it is just some statistic with a confusing name. Ask a question where you define the data for which you want to calculate the entropy, and you will get the formula. – Jul 20 '10 at 23:17
- I'm not interested in *truth* but in getting a function that works. I'm a bioinformatician and was taught not to seek dogmatic *truth* but to seek statistics that work. I don't think there is existing work, on the kind of data I want to work with, that specifies which entropy works best. That's kind of the point of why I want to work with the data. – Christian Jul 27 '10 at 12:17
- Right, but this is not a discussion about dogmatic truths but about words. You asked about entropy, so I answered about entropy. Since I now see that you actually need an answer about time-series descriptors, write a question about time-series descriptors; only then will you get a useful answer. – Jul 27 '10 at 12:32