
I am looking at the Shannon index formula for diversity, and there is a part of the formula I am having trouble following. For example, suppose there are 50 foxes at site 1, 60 foxes at site 2 and 100 foxes at site 3. Across all sites there are 210 foxes.

50/210 = 0.23809, then take log(0.23809)

60/210 = 0.28571, then take log(0.28571)

100/210 = 0.47619, then take log(0.47619)

But the formula then multiplies each proportion by its log, $p\cdot \log(p)$, viz., $0.23809\times \log(0.23809)$, and so on for the others, and adds up the results. Using the formula as context, I want to know what $p\cdot \log(p)$ does in statistics. That is, why multiply the number, e.g., 0.23809, by the log of that number? It’s not the formula as a whole that’s the problem; it’s multiplying the number by its log. Is that a usual thing to do with logs? What is the aim or reason for it? If I were to multiply it by 100 I would get the percentage. But why multiply a number by the log of the same number?
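The arithmetic described in the question can be sketched in a few lines of Python. This is only an illustration: the question does not say which log base is used, so the natural log is assumed here, and the variable names are made up for the example.

```python
import math

# Fox counts per site, taken from the question.
counts = [50, 60, 100]
total = sum(counts)  # 210 foxes across all sites

# Proportion of all foxes found at each site.
proportions = [c / total for c in counts]

# Assumption: natural log. Each term weights log(p) by p itself,
# so the sum is the average of log(p) over all individual foxes.
terms = [p * math.log(p) for p in proportions]

# The leading minus sign makes the index non-negative,
# because log(p) <= 0 whenever 0 < p <= 1.
shannon_index = -sum(terms)

for c, p, t in zip(counts, proportions, terms):
    print(f"count={c:3d}  p={p:.5f}  p*log(p)={t:.5f}")
print(f"Shannon index H = {shannon_index:.5f}")
```

With these counts the index comes out at roughly 1.05, a little below $\log(3) \approx 1.10$, which is the value three equally populated sites would give under the same log base.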

kjetil b halvorsen
cara
  • We need more context. – Tim Feb 06 '18 at 16:46
  • It sounds like you are trying to describe the (negative of the) *entropy* of a discrete distribution. See the [Wikipedia article on diversity indexes](https://en.wikipedia.org/wiki/Diversity_index), for instance. – whuber Feb 06 '18 at 17:16
  • Yes, that’s it: multiply $p_i \times \log(p_i)$. – cara Feb 06 '18 at 17:20
  • You are asking why entropy is defined the way it is. See this for one possible explanation: https://stats.stackexchange.com/questions/66186/statistical-interpretation-of-maximum-entropy-distribution/245198#245198 – kjetil b halvorsen Feb 07 '18 at 12:37
  • Shannon's paper on the topic is very readable. I'd recommend starting there. http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf – Sycorax Feb 07 '18 at 14:15

1 Answer


Your question describes Shannon entropy. It originates in C.E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October, 1948.

Don't be turned off by the date; because it was written in a time when clear communication, instead of technical obscurantism, was valued in publications, the paper is quite readable.
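As a rough sketch of the usual reading of the formula (the notation $p_i$ for the proportion in category $i$, and $X$ for a randomly drawn individual's category, is assumed here rather than taken from the question): $-\log p_i$ can be treated as a measure of how surprising it is to observe category $i$, and multiplying by $p_i$ weights that surprise by how often the category occurs. The sum is then an ordinary expected value:

$$H = -\sum_i p_i \log p_i = \sum_i p_i \,\bigl(-\log p_i\bigr) = \mathbb{E}\bigl[-\log p_X\bigr]$$

Read this way, $p\cdot\log(p)$ is not a special property of logs; it is the same "value times its probability" weighting used to compute any average, applied to the quantity $\log p$.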

Sycorax