Questions tagged [entropy]

A mathematical quantity designed to measure the amount of randomness of a random variable.

Entropy is a mathematical quantity designed to quantify the uncertainty about the occurrence of outcomes of a random variable. It is expressed as a function of the outcome probabilities of the random variable. Any measure of entropy must satisfy a few conditions:

  1. Continuity: The function must be continuous in all its arguments.
  2. Maximum: The function should attain its maximum when all outcomes are equally probable.
  3. Symmetry: The function must remain unchanged under any permutation of its arguments.

A commonly adopted measure is the Shannon entropy, $\mathrm{H}(p_1,p_2,\ldots,p_n)$ (where $p_1,p_2,\ldots,p_n$ are the $n$ outcome probabilities of a random variable $X$). This measure is defined as follows:

$$\mathrm{H}(p_1,p_2,\ldots,p_n) = -\sum_{i=1}^n p_i \log p_i$$
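
As a concrete illustration (an added sketch, not part of the tag definition), the Shannon entropy of a probability vector can be computed along these lines; the base of the logarithm only changes the unit (base 2 gives bits, base $e$ gives nats):

```python
import numpy as np

def shannon_entropy(p, base=2):
    """Shannon entropy H(p_1, ..., p_n) of a discrete probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log(0) = 0
    return -np.sum(p * np.log(p)) / np.log(base)

print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits, the maximum for n = 4
print(shannon_entropy([0.9, 0.05, 0.03, 0.02]))   # much lower: outcomes far from equiprobable
```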

627 questions
106 votes · 17 answers

What is the role of the logarithm in Shannon's entropy?

Shannon's entropy is the negative of the sum, over all outcomes, of each outcome's probability multiplied by the logarithm of that probability. What purpose does the logarithm serve in this equation? An intuitive or visual answer (as opposed to a…
75 votes · 4 answers

What is the difference between Cross-entropy and KL divergence?

Both the cross-entropy and the KL divergence are tools to measure the distance between two probability distributions, but what is the difference between them? $$ H(P,Q) = -\sum_x P(x)\log Q(x) $$ $$ KL(P \| Q) = \sum_{x} P(x)\log {\frac{P(x)}{Q(x)}}…
yoyo (979)
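
To make the two formulas quoted in the excerpt above concrete, here is a small sketch (my own illustration, using hypothetical helper names, not taken from any answer) that computes both quantities for discrete distributions and checks the identity $H(P,Q) = H(P) + KL(P \| Q)$:

```python
import numpy as np

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) log Q(x), in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def kl_divergence(p, q):
    """KL(P || Q) = sum_x P(x) log(P(x) / Q(x)), in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
entropy_p = -np.sum(p * np.log(p))

# Cross-entropy decomposes as the entropy of P plus the KL divergence from P to Q.
assert np.isclose(cross_entropy(p, q), entropy_p + kl_divergence(p, q))
```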
59 votes · 10 answers

Measuring entropy / information / patterns of a 2d binary matrix

I want to measure the entropy / information density / pattern-likeness of a two-dimensional binary matrix. Let me show some pictures for clarification: This display should have a rather high entropy: [image A] This should have medium entropy: [image B] These…
42 votes · 4 answers

Entropy of an image

What is the most information-theoretically (or physics-theoretically) correct way to compute the entropy of an image? I don't care about computational efficiency right now - I want it theoretically as correct as possible. Let's start with a gray-scale image. One…
Davor Josipovic (948)
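
One simple baseline in this setting (a hedged sketch assuming an 8-bit gray-scale array, not necessarily the "theoretically correct" answer the question asks for) is the first-order entropy of the pixel-intensity histogram, which deliberately ignores all spatial structure between pixels:

```python
import numpy as np

def histogram_entropy(img, bins=256):
    """First-order entropy of a gray-scale image: treat pixel intensities as
    i.i.d. draws and compute the Shannon entropy of their empirical histogram,
    in bits per pixel.  Spatial structure is ignored entirely."""
    counts, _ = np.histogram(img.ravel(), bins=bins, range=(0, bins))
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
noise = rng.integers(0, 256, size=(64, 64))   # close to 8 bits per pixel
flat = np.full((64, 64), 128)                 # exactly 0 bits per pixel
print(histogram_entropy(noise), histogram_entropy(flat))
```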
41 votes · 8 answers

Why is Entropy maximised when the probability distribution is uniform?

I know that entropy is the measure of randomness of a process/variable, and it can be defined as follows: for a random variable $X$ in a set $A$, $H(X)= -\sum_{x_i \in A} p(x_i) \log p(x_i)$. In the book on Entropy and Information Theory by…
user76170 (639)
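
As a purely numerical sanity check of the claim in the title (an added sketch, not a proof and not taken from any answer): sample many random distributions on $n$ outcomes and verify that none exceeds the entropy of the uniform distribution, which equals $\log n$:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, with the usual 0 * log(0) = 0 convention."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(1)
n = 5
uniform = np.full(n, 1.0 / n)

# Draw many random probability vectors on n outcomes from a flat Dirichlet prior.
samples = rng.dirichlet(np.ones(n), size=10_000)
sample_entropies = np.apply_along_axis(entropy, 1, samples)

print(entropy(uniform), np.log(n))   # both equal log(5), about 1.609 nats
print(sample_entropies.max())        # strictly below log(5)
```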
39 votes · 3 answers

What does entropy tell us?

I am reading about entropy and am having a hard time conceptualizing what it means in the continuous case. The wiki page states the following: The probability distribution of the events, coupled with the information amount of every event, forms…
RustyStatistician (1,709)
38 votes · 3 answers

What does the Akaike Information Criterion (AIC) score of a model mean?

I have seen some questions here about what it means in layman terms, but these are too informal for my purpose here. I am trying to understand mathematically what the AIC score means. At the same time, I don't want a rigorous proof that…
caveman (2,431)
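
For reference (an added sketch, not drawn from the question or its answers): the formula is $\mathrm{AIC} = 2k - 2\ln\hat L$, where $k$ is the number of fitted parameters and $\hat L$ the maximized likelihood, and it estimates the relative expected Kullback-Leibler divergence between each candidate model and the data-generating process. A toy comparison of two Gaussian models:

```python
import numpy as np

def aic(loglik, k):
    """AIC = 2k - 2 * loglik; lower values indicate a better estimated KL fit."""
    return 2 * k - 2 * loglik

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=2.0, size=500)

# Model 1: Normal with both mean and variance estimated by ML (k = 2).
mu, sigma = x.mean(), x.std()
ll1 = np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

# Model 2: Normal with mean fixed at 0, only the variance estimated (k = 1).
sigma0 = np.sqrt(np.mean(x**2))
ll2 = np.sum(-0.5 * np.log(2 * np.pi * sigma0**2) - x**2 / (2 * sigma0**2))

print(aic(ll1, 2), aic(ll2, 1))   # the correctly specified model wins despite its extra parameter
```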
31 votes · 3 answers

Entropy-based refutation of Shalizi's Bayesian backward arrow of time paradox?

In this paper, the talented researcher Cosma Shalizi argues that to fully accept a subjective Bayesian view, one must also accept an unphysical result that the arrow of time (given by the flow of entropy) should actually go backwards. This is mainly…
ely (2,272)
30 votes · 4 answers

Statistical interpretation of Maximum Entropy Distribution

I have used the principle of maximum entropy to justify the use of several distributions in various settings; however, I have yet to be able to formulate a statistical, as opposed to information-theoretic, interpretation of maximum entropy. In other…
30 votes · 2 answers

Who is Gail Gasram?

Several places (1, 2, 3) quote someone named Gail Gasram as saying "Nothing is random, only uncertain" but a Google search turns up no info, just more places with this quote! Generally, it's in the context of random number generation, such as the…
JeffThompson (467)
26 votes · 4 answers

Kullback-Leibler divergence WITHOUT information theory

After much trawling of Cross Validated, I still don't feel like I'm any closer to understanding KL divergence outside of the realm of information theory. It's rather odd, as somebody with a Math background, to find it much easier to understand the…
23 votes · 1 answer

How does entropy depend on location and scale?

The entropy of a continuous distribution with density function $f$ is defined to be the negative of the expectation of $\log(f),$ and therefore equals $$H_f = -\int_{-\infty}^{\infty} \log(f(x)) f(x)\mathrm{d}x.$$ We also say that any random…
whuber (281,159)
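
The standard fact behind this question (my own illustration, not quoted from the thread) is that differential entropy is unchanged by a location shift and increases by $\log|s|$ under scaling by $s$: $H_{sX+m} = H_X + \log|s|$. A compact numeric check against the Normal closed form:

```python
import numpy as np

def normal_entropy(sigma):
    """Differential entropy (in nats) of a Normal(mu, sigma^2):
    0.5 * log(2 * pi * e * sigma^2).  The mean mu does not appear,
    so shifting the location leaves the entropy unchanged."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

s = 3.0
# Scaling a random variable by s adds log|s| to its differential entropy.
print(normal_entropy(s) - normal_entropy(1.0))   # about 1.0986
print(np.log(s))                                 # log(3), also about 1.0986
```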
23 votes · 2 answers

What is empirical entropy?

In the definition of jointly typical sets (in "Elements of Information Theory", ch. 7.6, p. 195), we use $$-\frac{1}{n} \log{p(x^n)}$$ as the empirical entropy of an $n$-sequence with $p(x^n) = \prod_{i=1}^{n}{p(x_i)}$. I never came across this…
blubb (2,458)
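
The quantity in the excerpt, $-\frac{1}{n}\log p(x^n)$, reduces under independence to the sample average of $-\log p(x_i)$, which converges to the entropy of the source (the asymptotic equipartition property). A minimal simulation sketch (my own illustration, not from the referenced book):

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.6, 0.3, 0.1])            # true source distribution
true_H = -np.sum(p * np.log2(p))         # source entropy in bits

n = 100_000
x = rng.choice(len(p), size=n, p=p)      # an i.i.d. n-sequence x^n
# Because p(x^n) = prod_i p(x_i), the empirical entropy -(1/n) log p(x^n)
# is just the sample mean of -log p(x_i).
empirical_H = -np.mean(np.log2(p[x]))

print(true_H, empirical_H)               # close, by the law of large numbers (AEP)
```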
21 votes · 3 answers

Comparison between MaxEnt, ML, Bayes and other kind of statistical inference methods

I'm in no way a statistician (I've had a course in mathematical statistics but nothing more than that), and recently, while studying information theory and statistical mechanics, I met this thing called "uncertainty measure"/"entropy". I read…
Francesco (313)
21 votes · 5 answers

Typical set concept

I thought that the concept of typical set was pretty intuitive: a sequence of length $n$ would belong to the typical set $A_\epsilon ^{(n)}$ if the probability of that sequence occurring was high. So, any sequence that was likely would be in…
Tendero (740)