
The max entropy philosophy states that given some constraints on the prior, we should choose the prior with maximum entropy subject to those constraints.

I know that Beta($\alpha, \beta$) is the maximum entropy distribution on $[0,1]$ subject to the constraints that $\mathbb{E}[ \ln(x) ] = \psi(\alpha) - \psi(\alpha + \beta)$ and $\mathbb{E}[ \ln(1 - x) ] = \psi(\beta) - \psi(\alpha + \beta)$, where $\psi$ is the digamma function. (Reference: https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution#Other_examples )

My question is: Why might these constraints be reasonable descriptions of a state of knowledge about the world?

The constraint on the domain being $[0,1]$ is clear from the description as a prior on a binomial or geometric parameter; is there some way to interpret the other two constraints from that angle?

Thoughts: By the law of large numbers, $\mathbb{E}[ \ln(x)]$ is approximated by $\ln ( \sqrt[n]{ \prod_{i = 1}^n X_i })$, where the $X_i$ are iid Beta($\alpha, \beta$); so constraining $\mathbb{E}[ \ln(x) ]$ is like saying that we have knowledge about the geometric mean of a large sample. (And similarly for the other constraint.)
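As a quick sanity check of this thought (a hypothetical sketch using NumPy/SciPy; the parameter values $\alpha = 2$, $\beta = 5$ are arbitrary), one can verify by Monte Carlo that the log of the geometric mean of a large Beta sample approaches $\psi(\alpha) - \psi(\alpha + \beta)$:

```python
import numpy as np
from scipy.special import digamma

a, b = 2.0, 5.0  # arbitrary example parameters
rng = np.random.default_rng(0)
x = rng.beta(a, b, size=1_000_000)

# Mean of ln(X) is exactly the log of the sample geometric mean.
empirical = np.log(x).mean()
theoretical = digamma(a) - digamma(a + b)

print(empirical, theoretical)  # both approximately -1.45
```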

However, it is unclear to me why this is a natural piece of knowledge to have -- unlike, say, the max entropy justification for the Gaussian, where it seems natural to have prior beliefs about the mean and variance. Perhaps this bias towards accepting mean and variance constraints as natural is just due to exposure to certain kinds of datasets / generative models over other ones? What would be a good example that makes the constraints behind the beta distribution's max entropy characterization feel natural?

Elle Najt
    A very suggestive mathematical reason is indicated at the end of my answer at https://stats.stackexchange.com/a/185709/919: the natural measure associated with Beta distributions is $$\frac{\mathrm{d}x}{x(1-x)}=\mathrm{d}\left(\log(x) - \log(1-x)\right).$$ – whuber Apr 30 '20 at 19:11
    @whuber: Haar, haar! – Xi'an Apr 30 '20 at 20:00
  • @whuber Thanks! I see a vague connection, just on the level of symbols that are appearing, but am not really sure why it answers the question. Also -- I see how that's a convenient measure for describing the beta distributions, but why is that *the* natural measure associated to the Betas? Is there some natural group structure I'm missing? (You can identify $(0,1)$ with $(\mathbb{R}, + , 0)$, e.g. with tan, but I couldn't get that to help.) – Elle Najt Apr 30 '20 at 20:25
    Comments aren't answers! Regardless, you can write $x=1-1/(1+\exp(y))$ for a unique real number $y.$ (This is the logit transformation.) The relevant group acting on $y$ looks like the additive real numbers :-). – whuber May 01 '20 at 17:20
  • @whuber Thanks -- I guess it's not a coincidence that the logit transformation gives a (the?) canonical link for the Bernoulli distribution, for which the beta is a conjugate prior... is that the sense in which it is *the* natural measure, or do you have another one in mind? – Elle Najt May 02 '20 at 19:22

0 Answers