
I have a random process that follows a normal distribution. Its parameters are mean = 35 units, standard deviation = 8 units. I've seen from the Wikipedia entry for the normal distribution that there is a formula to calculate the entropy, so plugging in the figures as $$ \tfrac{1}{2}\log\left(2\pi e \cdot 8^2\right) $$ I get a value of 1.52, which I take to be per sample. My question is: what are these units? What thing do I have 1.52 of?
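For concreteness, here is a minimal Python sketch (not part of the original question) that evaluates $\tfrac{1}{2}\log(2\pi e\sigma^2)$ for $\sigma = 8$ in three logarithm bases; the choice of base is what determines whether the result is quoted in nats, bits, or hartleys, and the base-10 value matches the 1.52 above.

```python
import math

sigma = 8.0  # standard deviation, in the question's units

arg = 2 * math.pi * math.e * sigma**2  # 2*pi*e*sigma^2

h_nats = 0.5 * math.log(arg)      # natural log -> "nats"
h_bits = 0.5 * math.log2(arg)     # base-2 log  -> "bits"
h_hart = 0.5 * math.log10(arg)    # base-10 log -> "hartleys"

print(f"differential entropy: {h_nats:.2f} nats "
      f"= {h_bits:.2f} bits = {h_hart:.2f} hartleys")
# Roughly 3.50 nats = 5.05 bits = 1.52 hartleys, so the 1.52 in the
# question corresponds to using a base-10 logarithm.
```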

Information entropy is (typically) measured in units of bits, after Claude Shannon's definition. So can I take it that each sample generates 1.52 bits of entropy? Clearly recording those samples generates information and therefore occupies a real and discrete amount of storage space. Ergo, entropy cannot be unitless.

kjetil b halvorsen
Paul Uszak

2 Answers


@Aksakal gave the answer in the comments: entropy is unitless. Still, there are some details to elaborate. First, the units are sometimes given as "nats" or "bits", for the cases where natural logarithms or binary logarithms are used, respectively. But these are not really measurement units (like meter, kg, ...) in the physical sense; it corresponds more to writing the measurement in the decimal or binary number system. A measurement of length in meters can be written in decimal or in binary, but that does not change the unit of measurement of length used. The unit is the meter in both cases.

Some details: We treat Shannon (discrete) and differential (continuous) entropy separately. $$ \DeclareMathOperator{\E}{\mathbb{E}} H(X) = -\sum_x p(x) \log p(x) = -\E_X \log p(X) $$ where $p$ is the probability mass function of a discrete random variable, and $$ H_d(X) = -\int f(x) \log f(x) \; dx = -\E_X \log f(X) $$ where $f$ is the probability density function of a continuous random variable.

Now, from general principles, the unit of measurement of the expectation (mean, average) of a variable (random or not) is the same as the unit of measurement of the variable itself. This leaves us with the unit of measurement of $\log p(x)$ and $\log f(x)$, respectively. Again, from general principles (see lognormal distribution, standard-deviation and (physical) units for discussion and references), the arguments of transcendental functions like $\log$ must be unitless. That raises a problem: while $p(x)$ certainly is unitless, since probability is an absolute number, the density $f(x)$ measures probability per unit of $x$, so if the unit of $x$ is $\text{u}$, then the unit of $f(x)$ is $\text{u}^{-1}$. So, for the equation defining differential entropy $H_d$ to be dimensionally correct, we must assume the argument of the log contains a "hidden" constant with numerical value 1 and unit $\text{u}$.

The conclusion follows that both Shannon and differential entropy are unitless. Still, one must remember that differential entropy scales with the unit of measurement of $X$, as discussed in https://en.wikipedia.org/wiki/Differential_entropy
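To illustrate that last point, here is a small Python sketch (added for illustration, not part of the original answer): if the same quantity is expressed in a different physical unit, i.e. $Y = cX$, then $H_d(Y) = H_d(X) + \log c$, whereas the Shannon entropy of a discrete pmf does not depend on how the outcomes are labelled.

```python
import math

def normal_diff_entropy(sigma):
    """Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def shannon_entropy(pmf):
    """Shannon entropy of a discrete pmf in nats: -sum p * ln(p)."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

# Differential entropy depends on the unit of X: the same process measured
# in "units" versus "centi-units" (scale factor 100).
h_units = normal_diff_entropy(8.0)      # sigma = 8 units
h_centi = normal_diff_entropy(800.0)    # same sigma, written as 800 centi-units
print(h_centi - h_units, math.log(100)) # both ~4.605: H_d shifts by ln(scale)

# Shannon entropy of a discrete distribution has no such dependence:
# relabelling the outcomes leaves the pmf, and hence H, unchanged.
print(shannon_entropy([0.5, 0.25, 0.25]))  # ~1.040 nats, whatever the outcomes are called
```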

kjetil b halvorsen
  • But Shannon entropy is measured in bits, and absolutely not unitless. I'm not sure where you get your idea of what it is, but Shannon (who invented information entropy) says it's measured in bits, as he wrote in his _A Mathematical Theory of Communication_ paper. – Paul Uszak Oct 09 '17 at 20:40
  • Did you read my answer? Bits and nats are mentioned, as "units", but they are really only a description of the base of the logarithms used. You can compare it with "radians", used as the name of a unit of angle measure; but since an angle measure is a quotient of two lengths, it is really unitless. Bits and nats, like radians, are not true units the way the meter is. – kjetil b halvorsen Oct 09 '17 at 21:08
  • Here is a link to a good discussion of the "parallel" issue of the unit of measurement of angles, as radians: http://mathforum.org/library/drmath/view/64034.html – kjetil b halvorsen Oct 12 '17 at 15:14
  • Paul is absolutely right: the proof that entropy does have a unit lies in how it transforms when the base of the logarithm is changed. The logarithm of the base is the scale of the unit. Trying to dismiss that as "merely a description" misses the fact that this transformation rule forcibly demonstrates the entropy cannot be unitless. – whuber Sep 30 '20 at 18:21

Shannon entropy is normally given "units" of bits or nats in information theory. Information theory includes the measurable concept of compression. Define a compression ratio as (ADC sample size) / (Shannon entropy of sample set). The numerator and denominator would both be described as "number of bits". The Shannon entropy of the sample set gives the smallest average number of bits per sample which could be achieved by entropy coding the sample set, such as using Huffman's approach. This context justifies applying the term "bits" to Shannon entropy. Note that the term entropy used in thermodynamics should not be confused with Shannon entropy used in information theory.
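As a concrete illustration of this compression-ratio idea, here is a rough Python sketch (added here, not from the answer); the 8-bit ADC digitizing the questioner's N(35, 8²) process is a hypothetical setup chosen only to make the numbers concrete.

```python
import math
import random
from collections import Counter

random.seed(0)

# Hypothetical setup: an 8-bit ADC digitizing the questioner's N(mean=35, sd=8)
# process, with one integer code value per sample in 0..255.
ADC_BITS = 8
samples = [min(255, max(0, round(random.gauss(35, 8)))) for _ in range(100_000)]

# Empirical Shannon entropy of the quantized sample set, in bits per sample.
counts = Counter(samples)
n = len(samples)
entropy_bits = -sum((c / n) * math.log2(c / n) for c in counts.values())

# An entropy coder (Huffman, arithmetic coding, ...) can approach this bound,
# so the achievable compression ratio is roughly ADC_BITS / entropy_bits.
print(f"entropy ~ {entropy_bits:.2f} bits/sample, "
      f"compression ratio ~ {ADC_BITS / entropy_bits:.2f}")
```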

Bruce