
I am trying to solve for the parameters of a maximum entropy problem in R, using the nonlinear system:

$\ \int_l^u e^{a+bx+cx^2}dx=1$

$\ \int_l^u x e^{a+bx+cx^2}dx=\mu$

$\ \int_l^u x^2 e^{a+bx+cx^2}dx=\mu^2+\sigma^2$

where $l$ and $u$ represent the lower and upper bound on the support of the distribution, and $a,b,c$ are the parameters of interest.

I wrote an implementation based on minimising the sum of squared differences and then optimising with `BBoptim` (from the BB package), but this method is very hit-or-miss: it depends heavily on my supplying reasonable starting parameters and quite tight lower and upper bounds on the range over which to search. I have included the code below:

library(BB)  # for BBoptim; l, u, mu, sd are assumed to exist in the workspace

MaxEnt <- function(y) {
    G <- function(x) exp(y[1] + y[2]*x + y[3]*x^2)        # density
    H <- function(x) x * exp(y[1] + y[2]*x + y[3]*x^2)    # first-moment integrand
    I <- function(x) x^2 * exp(y[1] + y[2]*x + y[3]*x^2)  # second-moment integrand
    (integrate(G, lower = l, upper = u)$value - 1)^2 +
        (integrate(H, lower = l, upper = u)$value - mu)^2 +
        (integrate(I, lower = l, upper = u)$value - mu^2 - sd^2)^2
}
BBoptim(c(-log(sqrt(2*pi)), 0, 0), MaxEnt, method = 1)

Does anyone know of any better ways of doing what I am trying to achieve using R? Many thanks in advance.

============ EDIT JAN 17th ============

I have tried solving this problem using whuber's approach and so far this is what I get:

Let $\ y= \frac{x-l}{u-l}$, so that $\ x= (u-l)y+l$ and $\ dx = (u-l)\,dy$. Substituting,

$\ a+bx+cx^2 = q+ry+sy^2$

where $\ q=a+bl+cl^2,r=(u-l)(b+2cl),s=c(u-l)^2$. The three original equations then become:

$\ \int_0^1 e^{q+ry+sy^2}dy=\frac{1}{u-l}$

$\ \int_0^1 ((u-l)y+l) e^{q+ry+sy^2}dy=\frac{\mu}{u-l}$

$\ \int_0^1 ((u-l)y+l)^2 e^{q+ry+sy^2}dy=\frac{\mu^2+\sigma^2}{u-l}$
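
As a quick numerical sanity check of this substitution (the values of $a,b,c,l,u$ below are arbitrary, chosen only for illustration):

# Check the first transformed equation against the original integral
a <- -1; b <- 0.5; c <- -0.3; l <- 0; u <- 2
q <- a + b*l + c*l^2; r <- (u-l)*(b + 2*c*l); s <- c*(u-l)^2
lhs <- integrate(function(x) exp(a + b*x + c*x^2), l, u)$value
rhs <- (u-l) * integrate(function(y) exp(q + r*y + s*y^2), 0, 1)$value
all.equal(lhs, rhs)  # TRUE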

whuber then suggests rewriting $\ e^{q+ry+sy^2}$. Letting $\ \nu = \frac{-r}{2s}$ and $\ \tau = \sqrt{\frac{-1}{2s}}$, completing the square gives $\ e^{q+ry+sy^2} = e^{q-\frac{r^2}{4s}}e^{-\frac{1}{2}\left(\frac{y-\nu}{\tau}\right)^2}$.
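
(This identity can be verified numerically for any $\ s<0$; again the values below are arbitrary:)

# Verify the completed square for an arbitrary s < 0
q <- 0.2; r <- 1.1; s <- -0.8
nu <- -r/(2*s); tau <- sqrt(-1/(2*s))
y <- seq(0, 1, 0.25)
max(abs(exp(q + r*y + s*y^2) -
        exp(q - r^2/(4*s)) * exp(-0.5*((y - nu)/tau)^2)))  # ~ 0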

The problem I now have is that with whuber's approach we must have $\ \tau^2 >0$ for $\ \tau$ to be real, which means I need $\ s<0$. Since $\ s = c(u-l)^2$ and $\ (u-l)^2>0$, this holds only if $\ c <0$ too. However, I have an example where the density with maximum entropy has $c>0$. I am probably going wrong somewhere, and would be grateful if somebody could point out where. Thanks!

mfrmn
  • You can perform the integrals in closed form before setting up the optimization problem: this will make the search more accurate and much faster. The integration is easy when the pdf is written in the form $c(\nu, \tau)\exp(-(1/2)((x-\nu)/\tau)^2)$. The first equation determines $c$ explicitly, which further reduces the problem to finding only two parameters. – whuber Jan 16 '12 at 17:17
  • I can see how to convert the pdf into that form using $\nu=\frac{-b}{2c}$ and $\tau^2=\frac{-1}{2c}$, but I am not certain how to use this to solve for $\ c$, or indeed how this helps to make the optimisation problem faster. Thanks for your help. – mfrmn Jan 16 '12 at 20:19
  • I posted a working solution based on this idea 20 minutes ago. – whuber Jan 16 '12 at 20:22
  • mfrmn, I am eager to learn, can you explain a little bit where this problem comes from? What are you maximizing? (see also my comment below whuber answer) – Elvis Jan 16 '12 at 20:38
  • I am trying to maximise the entropy $\ = -\int f(x) \ln f(x)\, dx$ subject to constraints on the support, first moment, and second moment. I could also expand this to other constraints (e.g. quantiles, higher order moments, the median) by adding additional equations to the nonlinear system and additional parameters above. Here, I know that the solution to my problem should be of the form $\ \exp{(a + bx+cx^2)}$, and am trying to solve for $\ a,b,c$. Just going through whuber's solution to check it works for me :) – mfrmn Jan 16 '12 at 20:52

1 Answer


Making the change of variables $x = l + (u-l)y$ puts the problem into the same form with $u=1, l=0$. Rewrite the density in the form $f(x,\nu,\tau)=\exp(-\frac{1}{2}((x-\nu)/\tau)^2)/(\sqrt{2\pi}\tau)$ and rewrite the equations as

$$\eqalign{ \int_0^1 x f(x,\nu,\tau)\,dx &= \mu \int_0^1 f(x,\nu,\tau)\,dx\\ \int_0^1 x^2 f(x,\nu,\tau)\,dx &= (\mu^2+\sigma^2) \int_0^1 f(x,\nu,\tau)\,dx. }$$

All three integrals can be written explicitly in terms of the exponential `exp` and the cumulative standard normal distribution `pnorm`. This suggests the following R code for the core calculations; straightforward algebraic manipulations will produce the desired values of $a$, $b$, and $c$ (a sketch of these manipulations follows the example below):

f0 <- function(x){ # Normalization factor: mass of N(mu, sigma^2) on [0,1]
    mu <- x[1]; sigma <- x[2]
    pnorm((1-mu)/sigma) - pnorm(-mu/sigma)
}
f1 <- function(x){ # First moment (unnormalized); 0.39894... = 1/sqrt(2*pi)
    mu <- x[1]; sigma <- x[2]
    (-exp(-0.5*((1-mu)/sigma)^2)+exp(-0.5*(mu/sigma)^2))*sigma*0.39894228040143268 + mu * f0(x)
}
f2 <- function(x){ # Second moment (unnormalized)
    mu <- x[1]; sigma <- x[2]
    (-(1+mu)*exp(-0.5*((1-mu)/sigma)^2)+mu*exp(-0.5*(mu/sigma)^2))*sigma*0.39894228040143268 + (mu^2+sigma^2) * f0(x)
}
f <- function(x, y){ # Discrepancy between the first two moments determined by
                     # parameters x and the target values y
    mu <- y[1]; sigma <- y[2]
    n0 <- f0(x)  # renamed from `c` to avoid masking base::c
    (mu - f1(x)/n0)^2 + (mu^2 + sigma^2 - f2(x)/n0)^2
}
fit <- function(mu, sigma) {
    # Computes a truncated normal distribution on [0,1] with mean mu and SD sigma
    #     (the "max ent" solution; but see the comments).
    # NB: naively takes (mu, sigma) as starting values.
    optim(c(mu, sigma), function(x){ f(x, c(mu, sigma)) })
}

As an example of running and checking this code, consider the problem with $\mu=0.2$, $\sigma=0.1$:

> mu <- .2
> sigma <- .1
> z <- fit(mu, sigma)
> z$par
[1] 0.1897255 0.1097969
> f1(z$par) / f0(z$par) # Should equal mu
[1] 0.2000000
> mu^2 + sigma^2
[1] 0.05
> f2(z$par) / f0(z$par) # Should equal mu^2 + sigma^2
[1] 0.05000045
> nu <- z$par[1]
> tau <- z$par[2]
> gamma <- f0(z$par)
> curve(exp(-0.5*((x-nu)/tau)^2)/(tau*gamma) * 0.39894, 0, 1)

Plot of the fit

As required, this is a truncated normal distribution with the desired mean and variance. Shift and rescale it to arbitrary intervals $[l,u]$ as needed. This example was easy (the truncated distribution is close to normal anyway), but you will find that more extreme examples, such as $(\mu,\sigma)=(0.1,0.4)$, work just fine. (The speed is reasonable, too: on my system `fit` takes 4 to 7 milliseconds.)
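
For reference, here is one sketch of those "straightforward algebraic manipulations" (the helper `abc` and its variable names are illustrative additions, not part of the original solution). With $\nu=$ `z$par[1]`, $\tau=$ `z$par[2]`, and $\gamma=$ `f0(z$par)` from a fit on $[0,1]$, the rescaled density on $[l,u]$ is $\exp(-\frac{1}{2}((x-m)/s)^2)/(\sqrt{2\pi}\,s\,\gamma)$ with $m = l + \nu(u-l)$ and $s = \tau(u-l)$; expanding the exponent and matching it to $a+bx+cx^2$ gives the coefficients directly:

abc <- function(z, l, u) {
    # Recover (a, b, c) on [l, u] from a fit on [0, 1] -- illustrative sketch
    nu <- z$par[1]; tau <- z$par[2]
    gamma <- f0(z$par)               # normalizing mass on [0, 1]
    m <- l + nu * (u - l)            # location after rescaling
    s <- tau * (u - l)               # scale after rescaling
    c(a = -m^2/(2*s^2) - log(sqrt(2*pi) * s * gamma),
      b = m/s^2,
      c = -1/(2*s^2))
}
# Check (should return 1, up to fitting error):
# p <- abc(z, 0, 1)
# integrate(function(x) exp(p["a"] + p["b"]*x + p["c"]*x^2), 0, 1)$value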

whuber
  • (+1) Nice answer. Do you know why this is supposed to be a max entropy problem? I know that on the real line, the max entropy distribution with fixed mean and variance is normal, but is it known that on an interval it is truncated normal? I strongly doubt that this is true for the interval $[0,1]$ with $\mu = 0.5$ and $\sigma^2 = {1 \over 12}$. – Elvis Jan 16 '12 at 20:34
  • I have been wondering the same, Elvis. This is really a moment-matching estimator. For the MaxEnt solution, matching the moments is not an objective, it is a constraint. Subject to this constraint, one maximizes the entropy. The methods I suggest in this reply (rescaling to $[0,1]$ and symbolic computation of the integrals) can also be helpful for finding the MaxEnt solution. – whuber Jan 16 '12 at 23:03
  • Thanks for all your help. I have a robust optimisation problem where I am assuming that I have only been given the support, 1st and 2nd moment, and am looking to estimate the distribution which maximises the entropy subject to these constraints. The max entropy distribution will take the form $e^{a+bx+cx^2}$, so if I find the values of $a,b,c$ then this should be the corresponding function that maximises the entropy and meets the constraints. To follow up, I have been staring at this problem for ages, and can't for the life of me figure out how to rescale the initial $\ \mu, \sigma$. – mfrmn Jan 16 '12 at 23:24
  • @whuber I think I am missing something. With $\mu = 0.5$ and $\sigma = {1\over 12}$, the uniform distribution is a solution, and has –I think– maximal entropy. So the MaxEnt solution for these constraints is not a truncated gaussian! Where am I wrong? – Elvis Jan 16 '12 at 23:40
  • The uniform distribution has maximum entropy when the only information you know is the range. So in the example you give, if you know only that the distribution has support on $[0,1]$ then the uniform distribution (which will by definition have $\ \mu=0.5, \sigma=\frac{1}{12}$) will have max entropy. But if you characterise the moments then the uniform distribution is no longer the MaxEnt solution. – mfrmn Jan 16 '12 at 23:46
  • @mfrmn I don’t get it. I am asking for the MaxEnt distribution with support $[0,1]$, mean $0.5$ and variance ${1\over 12}$. Is it the uniform distribution, or is it a truncated gaussian? – Elvis Jan 17 '12 at 11:32
  • Elvis, the solution for $\mu=1/2, \sigma=1/12$ is a normal distribution with mean close to $1/2$ and huge standard deviation: in other words, within the limits of its accuracy, this procedure is indeed obtaining a uniform distribution (as the limit of truncated Gaussians). – whuber Jan 17 '12 at 14:19
  • This makes sense, actually I was suspecting it. However when I try to run your code with these parameters, it turns out that the result is (approximately) the normal $\mathcal N\left( {1\over 2}, {1\over 12} \right)$, restricted to $[0,1]$. But thinking it well it seems clear that this is a local maximum. Or is it that this region of the parameter space is very flat? When you start the optimization with $\sigma = 0.1$, the parameters hardly move at all! – Elvis Jan 17 '12 at 14:39
  • @Elvis Hmm...I think you may have misunderstood the code, because when I run it, it gives a mean near 0.5 and a standard deviation of about 54 and the plot is practically level. The correct values of the parameters are 1/2 and sqrt(1/12), respectively (*not* 1/12). Note that `sigma` is a standard deviation, not a variance. – whuber Jan 17 '12 at 15:05
  • Oh. To my greatest shame, I understood the code perfectly, I just forgot to take the square root of the variance (in fact I made everyone write $\sigma = {1\over 12}$ instead of $\sigma^2$!). I apologize for this long series of questions... – Elvis Jan 17 '12 at 15:23
  • @whuber If you get a chance, I made an edit to my original question to try and clarify a part where I am struggling to understand your method. It seems to me you are adding an additional restriction $\ c < 0$ with this approach, but I am not 100% sure that I am correct. Many thanks! – mfrmn Jan 17 '12 at 15:25
  • You are right, I did implicitly make that assumption. But the mathematics does not change if you choose a purely imaginary value of sigma :-). I don't think R can handle that, which means you might want to rewrite the functions `f0`, `f1`, and `f2` using a more general parameterization allowing $c\ge 0$. The approach will still work. Simply observe that the fundamental parameters are $b$ and $c$ (instead of $\mu$ and $\sigma$) and that $a$ is a normalizing constant, computed via `f0`. – whuber Jan 17 '12 at 15:31
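
Following up on that last comment, here is a minimal sketch of such a more general parameterization (an illustration under the stated assumptions, not whuber's code): it takes $(b,c)$ as the fundamental parameters, evaluates the moment integrals numerically so that the sign of $c$ is unrestricted, and recovers $a$ from the normalization:

fitbc <- function(mu, sigma, start = c(0, 0)) {
    # Moments of the unnormalized density exp(b*x + c*x^2) on [0, 1]
    moments <- function(p) {
        g  <- function(x) exp(p[1]*x + p[2]*x^2)
        m0 <- integrate(g, 0, 1)$value
        m1 <- integrate(function(x) x   * g(x), 0, 1)$value
        m2 <- integrate(function(x) x^2 * g(x), 0, 1)$value
        c(m1/m0, m2/m0, m0)
    }
    # Squared discrepancy from the target first and second moments
    obj <- function(p) {
        m <- moments(p)
        (m[1] - mu)^2 + (m[2] - (mu^2 + sigma^2))^2
    }
    sol <- optim(start, obj)
    c(a = -log(moments(sol$par)[3]), b = sol$par[1], c = sol$par[2])
}

For example, `fitbc(0.2, 0.1)` should reproduce the fit above on $[0,1]$, while a target standard deviation above that of the uniform distribution (e.g. `fitbc(0.5, 0.35)`) should yield $c>0$, which the truncated-normal parameterization cannot represent.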