Short version: How can the joint entropy of two independent random variables be less than the entropy of their sum? The joint entropy should capture all the information that any scalar function of the two variables can, right?
Long version: Let $X$ and $Y$ be two independent normal random variables, each with mean $0$ and variance $\sigma^2$.
1. We know that the entropy of $X$ and of $Y$ is $H(X) = H(Y) = \ln(2\pi e \sigma^2)/2$ (derivation).
2. The variance of the random variable $SUM = X + Y$ is $2\sigma^2$.
3. Points 1 and 2 mean that $H(SUM) = \ln(2\pi e (2\sigma^2))/2$.
4. The sum of the entropies of two independent random variables is the entropy of their joint distribution, i.e. $H(X, Y) = H(X) + H(Y)$. In this particular case that gives $$H(X, Y) = 2 \cdot \frac{\ln(2\pi e \sigma^2)}{2} = \ln(2\pi e \sigma^2).$$
5. Now note that if $\sigma^2 = (\pi e)^{-1}$, then from points 3 and 4 $$H(X, Y) = 2 \cdot \frac{\ln 2}{2} = \ln 2 = \frac{\ln(2 \cdot 2)}{2} = H(SUM)$$ (a quick numerical check of this is sketched right after this list).
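Here is a minimal sanity-check sketch in Python, using only the closed-form Gaussian entropy $\tfrac{1}{2}\ln(2\pi e \sigma^2)$ from point 1 (the helper name `gaussian_entropy` is just my own label):

```python
import math

def gaussian_entropy(var):
    """Differential entropy (in nats) of a normal distribution with variance `var`."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

sigma2 = 1 / (math.pi * math.e)       # candidate tipping-point variance
H_X = gaussian_entropy(sigma2)        # = H(Y), since X and Y are identically distributed
H_joint = 2 * H_X                     # independence: H(X, Y) = H(X) + H(Y)
H_sum = gaussian_entropy(2 * sigma2)  # X + Y ~ N(0, 2*sigma^2)

print(H_joint, H_sum)                 # both print 0.6931... = ln(2)
```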
And if you decrease $\sigma$ below this value, then $H(SUM) > H(X,Y)$: from points 3 and 4, $H(SUM) - H(X,Y) = \frac{1}{2}\ln\frac{1}{\pi e \sigma^2}$, which is positive exactly when $\sigma^2 < (\pi e)^{-1}$. It seems quite remarkable that $(\pi e)^{-1}$ is an entropy tipping point for Gaussians. Do you know of any papers or books that make this observation? Is $N(0, (\pi e)^{-1})$ discussed as an alternative to $N(0,1)$ because of its neutrality in this context? And why is this happening at all? Shouldn't the joint entropy be greater than the entropy of any scalar function of $(X, Y)$, since the joint distribution seems more general than any such function?
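For completeness, a small sweep over $\sigma^2$ (same closed-form expressions as above, again just a sketch) showing the gap $H(SUM) - H(X,Y)$ changing sign at $\sigma^2 = (\pi e)^{-1}$:

```python
import math

# Gap H(SUM) - H(X, Y) = ln(4*pi*e*s2)/2 - ln(2*pi*e*s2) = ln(1/(pi*e*s2))/2
tipping = 1 / (math.pi * math.e)
for s2 in (0.5 * tipping, tipping, 2 * tipping, 1.0):
    gap = 0.5 * math.log(4 * math.pi * math.e * s2) - math.log(2 * math.pi * math.e * s2)
    print(f"sigma^2 = {s2:.4f}   H(SUM) - H(X,Y) = {gap:+.4f}")
```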