I'm looking for a book or online resource that explains different kinds of entropy such as Sample Entropy and Shannon Entropy and their advantages and disadvantages. Can someone point me in the right direction?
6 Answers
Cover and Thomas's book Elements of Information Theory is a good source on entropy and its applications, although I don't know that it addresses exactly the issues you have in mind.

- Also the paper "Information Theoretic Inequalities" by Dembo, Cover and Thomas reveals a lot of deep aspects. – robin girard Aug 09 '10 at 13:36
- Still, none of those books claim that there is more than one entropy. – Sep 02 '10 at 10:25
These lecture notes on information theory by O. Johnson contain a good introduction to different kinds of entropy.

If you're interested in the mathematical statistics around entropy, you may consult this book, which is freely available:
http://www.renyi.hu/~csiszar/Publications/Information_Theory_and_Statistics:_A_Tutorial.pdf

Grünwald and Dawid's paper Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory discusses generalisations of the traditional notion of entropy. Given a loss, its associated entropy function maps each distribution to the minimal achievable expected loss under that distribution. The usual entropy function is the generalised entropy associated with the log loss; other choices of loss yield different entropies, such as the Rényi entropy.
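To make the loss-to-entropy mapping concrete, here is a minimal numerical sketch (my own illustration, not code from the paper; the function name `generalized_entropy` and the Bernoulli example are assumptions for the demo): minimising expected log loss over probabilistic predictions recovers Shannon entropy, while minimising expected 0-1 loss over point predictions gives min(p, 1-p), as discussed in the comments below.

```python
import numpy as np

def generalized_entropy(p, loss, actions):
    """Generalised entropy of Bernoulli(p) for a given loss: the minimal
    achievable expected loss over the supplied set of actions."""
    return min(p * loss(1, a) + (1 - p) * loss(0, a) for a in actions)

# Log loss over probabilistic predictions q recovers Shannon entropy (in nats).
log_loss = lambda y, q: -np.log(q) if y == 1 else -np.log(1 - q)
# 0-1 loss over the two point predictions {0, 1} gives min(p, 1 - p).
zero_one_loss = lambda y, a: float(y != a)

p = 0.3
qs = np.linspace(0.001, 0.999, 999)                    # grid of probabilistic actions
print(generalized_entropy(p, log_loss, qs))            # ≈ 0.611 = -p log p - (1-p) log(1-p)
print(generalized_entropy(p, zero_one_loss, [0, 1]))   # 0.3 = min(p, 1-p)
```

In the log-loss case the expected loss is minimised at q = p, which is exactly why the expression collapses to the Shannon entropy.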

- So then, sigma is the entropy of N(0, sigma) corresponding to squared error, and min(p, 1-p) is the entropy of Bernoulli(p) corresponding to 0-1 prediction loss? Seems like quite a generalization! – Yaroslav Bulatov Sep 06 '10 at 00:03
- Yes. The entropy for square loss is constant and the entropy for 0-1 loss is min(p, 1-p). What's also interesting is that these have strong correspondences to divergences too: the square loss corresponds to the Hellinger divergence and the 0-1 loss to the variational divergence. Since entropies defined like this are necessarily concave functions, it turns out the corresponding f-divergence is built using f(p) = -entropy(p). Bob Williamson and I have explored some of this in our paper: http://arxiv.org/abs/0901.0356 . It's fun stuff. – Mark Reid Sep 12 '10 at 05:39
- Here's something interesting I found about divergences recently: each step of Belief Propagation can be viewed as a Bregman projection http://www.ece.drexel.edu/walsh/Walsh_TIT_10.pdf – Yaroslav Bulatov Oct 02 '10 at 17:05
Jaynes shows how to derive Shannon's entropy from basic principles in his book.
One idea is that if you approximate $n!$ by $n^n$, the entropy is what you obtain by rewriting the quantity $$\frac{1}{n}\log \frac{n!}{(n p_1)!\cdots (n p_d)!}$$
The quantity inside the log is the number of distinct length-$n$ observation sequences over $d$ outcomes whose empirical distribution is $p$, so it is a kind of measure of the explanatory power of the distribution.
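As a quick numerical sanity check of this counting interpretation (my own sketch; `log_multinomial` and `shannon_entropy` are names I made up, and the exact log-factorials come from `scipy.special.gammaln`), the normalised log of the multinomial coefficient approaches the Shannon entropy as $n$ grows:

```python
import numpy as np
from scipy.special import gammaln   # gammaln(n + 1) == log(n!)

def log_multinomial(n, p):
    """(1/n) * log( n! / ((n p_1)! ... (n p_d)!) ), assuming the n*p_i are integers."""
    counts = np.round(n * np.asarray(p))
    return (gammaln(n + 1) - gammaln(counts + 1).sum()) / n

def shannon_entropy(p):
    p = np.asarray(p)
    return -(p * np.log(p)).sum()   # in nats

p = [0.5, 0.3, 0.2]
for n in (10, 100, 10000):
    print(n, log_multinomial(n, p))   # climbs toward the Shannon entropy as n grows
print(shannon_entropy(p))             # ≈ 1.0297 nats
```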

- $n^n$ is such a crude approximation of $n!$ that one would be excused for doubting this approach. However, Stirling's (asymptotic) approximation $\log(n!) = n \log n - n + O(\log n)$ also leads to the desired result, at least for large $n$, because $p_1 + \cdots + p_d = 1$. – whuber Jul 18 '11 at 13:20
Entropy is only one concept: the amount of information needed to describe some system; there are merely many generalisations of it. Sample entropy is just an entropy-like descriptor used in heart rate analysis.
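To make the distinction concrete, here is a minimal sketch of what "sample entropy" actually computes, next to the plug-in Shannon entropy of a discrete distribution (my own simplified O(N²) implementation, not a reference one; `m` and `r` follow the usual SampEn convention, and the 0.2·std default for `r` is a common heuristic, not a universal rule):

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Naive SampEn(m, r) of a 1-D series: -log(A / B), where B counts pairs of
    length-m templates within Chebyshev tolerance r and A counts the same for
    length m + 1, self-matches excluded (after Richman & Moorman)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * x.std()
    def matches(k):
        # N - m templates of length k (k = m or m + 1)
        templates = np.array([x[i:i + k] for i in range(N - m)])
        count = 0
        for i in range(len(templates)):
            dist = np.abs(templates - templates[i]).max(axis=1)
            count += int((dist <= r).sum()) - 1   # drop the self-match
        return count
    B, A = matches(m), matches(m + 1)
    return -np.log(A / B)

def shannon_entropy(p):
    """Plug-in Shannon entropy (nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(0)
print(sample_entropy(rng.normal(size=500), m=2))   # irregularity of an ordered series
print(shannon_entropy([0.5, 0.3, 0.2]))            # uncertainty of a distribution
```

The contrast is the point made above: sample entropy is a regularity statistic of an ordered time series, while Shannon entropy is a functional of a probability distribution; they answer different questions.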
- I know, however that doesn't help me decide whether using sample entropy, Shannon entropy, or some other kind of entropy is appropriate for the data that I'm working with. – Christian Jul 20 '10 at 22:31
- What I wrote in my post is just that for a certain type of data/process/system there is only one *true* entropy definition. Sample entropy is *not* an entropy measure, it is just some statistic with a confusing name. Ask a question where you define the data for which you want to calculate the entropy, and you will get the formula. – Jul 20 '10 at 23:17
- I'm not interested in *truth* but in getting a function that works. I'm a bioinformatician and was taught not to seek dogmatic *truth* but to seek statistics that work. I don't think there is existing work, on the kind of data I want to work with, that specifies which entropy works best. That's kind of the point of why I want to work with the data. – Christian Jul 27 '10 at 12:17
- Right, but this is not a discussion about dogmatic truths but about words. You asked about entropy, so I answered about entropy. Since I now see that you actually need an answer about time-series descriptors, write a question about time-series descriptors; only then will you get a useful answer. – Jul 27 '10 at 12:32