I'm currently trying to understand the meaning of negative differential entropy, given the equation: $$ H_{cont}(X) = -\int_{-\infty}^{+\infty} p(x) \ln(p(x)) \, dx $$
Here, The_Sympathizer explains the idea of differential entropy well, and I'm beginning to understand when it can be negative (still in the learning process ;) ). For example, take a Gaussian-distributed variable X with $\mu=0$ (not so important in this case, I guess) and $\sigma=3$. Plugging the Gaussian PDF p(x) into the equation above and simplifying gives: $$ H_{cont}(X) = \ln(\sigma \sqrt{2 \pi e}) $$ So we get an entropy of $H_{\sigma=3}(X) = 2.52 \ \text{nats}$. Now consider a much narrower distribution with $\sigma=0.03$. The entropy for this example is $H_{\sigma=0.03}(X) = -2.09 \ \text{nats}$.

This second distribution is really sharp; there isn't much variation in it, so it provides much less information than the first one (an outcome drawn from it isn't very surprising; there is less uncertainty; I gain little information by observing this one variable X). So far, so right? But what does -2.09 nats really mean? Can I draw any conclusion by comparing it to the entropy of yet another, even smaller standard deviation? Can I drop columns with very negative differential entropy because they don't provide much information? (I guess it's not that simple.)
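To check these numbers myself, I wrote a small Python sketch (the function names are my own) that integrates $-p(x)\ln(p(x))$ numerically with SciPy and compares the result to the closed form; it reproduces the 2.52 and -2.09 nats from above:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def gaussian_entropy_nats(sigma):
    # Closed form for a Gaussian: H(X) = ln(sigma * sqrt(2*pi*e)), in nats
    return np.log(sigma * np.sqrt(2 * np.pi * np.e))

def differential_entropy(pdf, lo, hi):
    # Numerical version of H(X) = -integral p(x) * ln(p(x)) dx
    integrand = lambda x: -pdf(x) * np.log(pdf(x)) if pdf(x) > 0 else 0.0
    value, _ = quad(integrand, lo, hi)
    return value

for sigma in (3.0, 0.03):
    pdf = norm(loc=0.0, scale=sigma).pdf
    h_num = differential_entropy(pdf, -20 * sigma, 20 * sigma)
    print(f"sigma={sigma}: numeric={h_num:.4f}, closed form={gaussian_entropy_nats(sigma):.4f} nats")
```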
As a mechanical engineer, I need a more practical example (I've started learning more about reinforcement learning). I'm not sure whether this example really applies to differential entropy, but I'll try: I have two pilots who are flying a helicopter. I'm not accounting for any environmental influences, and it's only a 2D example. The target is to fly a straight line and land the helicopter at the coordinates (0,0). The starting point of the helicopter is (0,10), i.e. 10 meters above the surface.
Looking at the picture I drew: Pilot A on the left isn't the best one, judging by his flying skills. Pilot B is a good pilot and hugs the y-axis. Applying the idea of a narrow vs. wide distribution and comparing Pilots A and B: the flight profile of Pilot A has many different possible "states" (a wider density function) and will lead to a larger differential entropy H(X), while for Pilot B it's probably small or maybe even negative, right?

So, comparing these two entropies, what can I tell? Can I only say that Pilot B can fly straight down a line without many adjustments and Pilot A needs much more practice? Will the best possible pilot have $-\infty$ entropy? Is it my decision at which entropy threshold a pilot is good enough for my task? Does it even make sense to look at only one variable in the continuous space? (Would mutual information, cross-entropy, transfer entropy, etc. be better?) There are probably much better measures for rating the pilots' skills, but I want to understand the usage of differential entropy. Maybe you have a better example? I hope my question is understandable...
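To make my pilot example more concrete, here's a small sketch with made-up flight data (all numbers are hypothetical), assuming the horizontal deviation from the y-axis during the descent is roughly Gaussian. I fit a Gaussian to each pilot's deviations and plug the estimated $\sigma$ into the closed form from above:

```python
import numpy as np

def gaussian_entropy_nats(sigma):
    # Differential entropy of a Gaussian in nats: ln(sigma * sqrt(2*pi*e))
    return np.log(sigma * np.sqrt(2 * np.pi * np.e))

rng = np.random.default_rng(seed=0)
# Hypothetical horizontal deviations (in meters) from the y-axis,
# sampled at many points along each pilot's descent
dev_a = rng.normal(0.0, 2.0, size=1000)   # Pilot A: wide scatter
dev_b = rng.normal(0.0, 0.05, size=1000)  # Pilot B: hugs the y-axis

for name, dev in (("Pilot A", dev_a), ("Pilot B", dev_b)):
    sigma_hat = dev.std(ddof=1)  # fit a Gaussian: estimate sigma from the samples
    print(f"{name}: sigma ~ {sigma_hat:.3f} m, H ~ {gaussian_entropy_nats(sigma_hat):.2f} nats")
```

With these invented numbers, Pilot A comes out around 2.1 nats and Pilot B around -1.6 nats. One thing I noticed while playing with this: the entropy crosses zero exactly at $\sigma = 1/\sqrt{2\pi e} \approx 0.242$ in whatever units x is measured in, so the sign depends on the measurement units, not only on the pilot.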
Thank you!