9

Short question: Why is this true??

Long question:

Very simply, I am trying to figure out what justifies this first equation. The author of the book I am reading (context here if you want it, but it is not necessary) claims the following:

Due to the assumption of near-gaussianity, we can write:

$$ p_0(\xi) = A \, \phi(\xi) \exp\left( a_{n+1}\xi + \left(a_{n+2} + \frac{1}{2}\right)\xi^2 + \sum_{i=1}^{n} a_i G_i(\xi)\right) $$

Here $p_0(\xi)$ is the maximum-entropy PDF of your observed data, given that you have only observed a set of expectations (simple numbers) $c_i = \mathbb{E}\{G_i(\xi)\}$, $i = 1, \dots, n$, and $\phi(\xi)$ is the PDF of a standardized Gaussian variable, that is, zero mean and unit variance.

Where all this is going is that he uses the above equation as a starting point for making the PDF $p_0(\xi)$ simpler. I follow how he does that, but I do not see how he justifies the above equation, i.e., the starting point.

I have tried to keep it brief so as not to confuse anyone, but if you want additional details please let me know in the comments. Thanks!

Spacey

1 Answer

12

(Note: I've changed your $\xi$ to $x$.)

For a random variable $X$ with density $p$, if you have constraints $$ \int G_i(x)\,p(x)\,dx=c_i \, , $$ for $i=1,\dots,n$, the maximum entropy density is $$ p_0(x)=A\exp\left(\sum_{i=1}^n a_iG_i(x)\right) \, , $$ where the $a_i$'s are determined from the $c_i$'s, and $A$ is a normalization constant.
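
In case the maximum entropy form itself is the sticking point, here is a quick sketch of the standard Lagrange-multiplier argument (not specific to the book). Maximize the entropy $-\int p(x)\log p(x)\,dx$ subject to normalization and the $n$ moment constraints, with multipliers $\lambda_0,\lambda_1,\dots,\lambda_n$: $$ \mathcal{L}[p] = -\int p\log p\,dx + \lambda_0\left(\int p\,dx - 1\right) + \sum_{i=1}^n \lambda_i\left(\int G_i\,p\,dx - c_i\right) \, . $$ Setting the functional derivative with respect to $p(x)$ to zero gives $-\log p(x) - 1 + \lambda_0 + \sum_{i=1}^n \lambda_i G_i(x) = 0$, that is, $$ p_0(x) = e^{\lambda_0-1}\exp\left(\sum_{i=1}^n \lambda_i G_i(x)\right) \, , $$ which is exactly the form above with $A = e^{\lambda_0-1}$ and $a_i = \lambda_i$.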

In this context, the Gaussian approximation ("near-gaussianity") means two things:

1) You agree to introduce two new constraints: the mean of $X$ is $0$ and the variance is $1$ (say);

2) The corresponding $a_{n+2}$ (see below) is much bigger in magnitude than the other $a_i$'s.

These additional constraints are represented as $$ G_{n+1}(x)=x \, , \qquad c_{n+1}=0 \, , $$ $$ G_{n+2}(x)=x^2 \, , \qquad c_{n+2}=1 \, , $$ yielding $$ p_0(x)=A\exp\left(a_{n+2}x^2 + a_{n+1}x + \sum_{i=1}^n a_iG_i(x)\right) \, , $$ which can be rewritten as (just "add zero" to the exponent) $$ p_0(x)=A\exp\left(\frac{x^2}{2} - \frac{x^2}{2} + a_{n+2}x^2 + a_{n+1}x + \sum_{i=1}^n a_iG_i(x)\right) \, , $$ leading to what you want: $$ p_0(x)=A'\,\phi(x)\exp\left(a_{n+1}x + \left(a_{n+2}+\frac{1}{2}\right)x^2 + \sum_{i=1}^n a_iG_i(x)\right) \, , $$ where $A'=\sqrt{2\pi}\,A$ absorbs the normalizing constant of $\phi$. This expression is ready to be Taylor expanded (using the second condition of the Gaussian approximation).

Doing the approximation like a physicist (which means that we don't care about the order of the error term), using $\exp(t)\approx 1+t$, we have the approximate density $$ p_0(x) \approx A'\,\phi(x)\left(1+a_{n+1}x + \left(a_{n+2}+\frac{1}{2}\right)x^2 + \sum_{i=1}^n a_iG_i(x)\right) \, . $$ To finish, we have to determine $A'$ and the values of the $a_i$'s. This is done by imposing the conditions $$ \int p_0(x)\,dx=1 \, , \qquad \int x \,p_0(x)\,dx=0 \, , \qquad \int x^2 \,p_0(x)\,dx=1 \, , $$ $$ \int G_i(x)\, p_0(x)\,dx=c_i \, , \quad i=1,\dots,n \, , $$ to obtain a system of equations, whose solution gives $A'$ and the $a_i$'s.

Without imposing additional conditions on the $G_i$'s, I don't believe that there is a simple solution in closed form.
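
Numerically, though, the system is straightforward to attack. Below is a minimal sketch (not from the book; the single extra statistic $G_1(x)=x^3$ and the target $c_1=0.1$ are made up for illustration) that plugs the approximate density into the four conditions and solves them with SciPy:

```python
# Minimal sketch: solve the moment-matching system for the approximate density
#   p0(x) ~ A' * phi(x) * (1 + a_{n+1} x + (a_{n+2} + 1/2) x^2 + a_1 G_1(x))
# G_1 and c_1 below are made-up illustrations, not taken from the book.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import fsolve

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal pdf
G1 = lambda x: x**3   # hypothetical extra statistic (a skewness-type moment)
c1 = 0.1              # hypothetical observed expectation E[G1(X)]

def p0(x, A, a_np1, a_np2, a1):
    """Approximate (Taylor-expanded) maximum entropy density."""
    return A * phi(x) * (1 + a_np1 * x + (a_np2 + 0.5) * x**2 + a1 * G1(x))

def equations(params):
    A, a_np1, a_np2, a1 = params
    moment = lambda f: quad(lambda x: f(x) * p0(x, A, a_np1, a_np2, a1),
                            -10, 10)[0]
    return [moment(lambda x: 1.0) - 1.0,   # normalization
            moment(lambda x: x) - 0.0,     # zero mean
            moment(lambda x: x**2) - 1.0,  # unit variance
            moment(G1) - c1]               # extra constraint E[G1(X)] = c1

# Start near the pure Gaussian: A' = 1, a_{n+1} = 0, a_{n+2} = -1/2, a_1 = 0.
A, a_np1, a_np2, a1 = fsolve(equations, x0=[1.0, 0.0, -0.5, 0.0])
print(A, a_np1, a_np2, a1)
```

For this particular toy choice the system can also be solved by hand using the Gaussian moments $\mathbb{E}[X^4]=3$ and $\mathbb{E}[X^6]=15$: the solution is $A'=1$, $a_{n+1}=-\tfrac{1}{20}$, $a_{n+2}=-\tfrac{1}{2}$, $a_1=\tfrac{1}{60}$, which the solver should reproduce.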

P.S. Mohammad clarified during a chat that with additional orthogonality conditions for the $G_i$'s we can solve the system.
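
For what it's worth, here is one way such orthogonality conditions could make the system trivial (a sketch, assuming conditions in the spirit of the book rather than quoting it): suppose the $G_i$'s are orthonormal with respect to the Gaussian weight and orthogonal to the polynomials $1$, $x$ and $x^2$, i.e. $$ \int G_i(x)G_j(x)\,\phi(x)\,dx=\delta_{ij} \, , \qquad \int G_i(x)\,\phi(x)\,dx=\int x\,G_i(x)\,\phi(x)\,dx=\int x^2\,G_i(x)\,\phi(x)\,dx=0 \, . $$ Plugging the approximate density into the constraints then decouples everything: the first three conditions force $A'=1$, $a_{n+1}=0$ and $a_{n+2}=-\tfrac{1}{2}$, and the remaining ones give $a_i=c_i$ directly, so that $$ p_0(x) \approx \phi(x)\left(1+\sum_{i=1}^n c_iG_i(x)\right) \, , $$ which is the final expression Mohammad quotes in the comments (with $F_i$ in place of $G_i$).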

Zen
  • Zen, thanks very much. I (somewhat) understand now. What is not clear to me, though, is when you say _"In this context, the Gaussian approximation ("near-gaussianity") means that you accept to introduce two new constraints: that the mean of X is 0 and the variance is (say) 1."_ , I don't understand why, for something to be 'near Gaussian', it must have $\mu=0$ and $\sigma^2=1$. What if it was just another r.v. that happened to have those same values? – Spacey Aug 29 '12 at 14:39
  • Hi Mohammad. I've added more information to the answer. To get the former expression of $p_0(x)$ you use only what I've called the first condition of the Gaussian approximation. You will use the second condition when you do the Taylor expansion of this $p_0(x)$. I hope this helps. – Zen Aug 29 '12 at 19:42
  • Would you mind posting as a comment the final expression for the $p_0(x)$ after you do the remaining computations? Thanks. – Zen Aug 29 '12 at 20:07
  • yes, he is saying that the final expression is: $p_0(z) \approx \phi(z) (1 + \sum_{i=1}^{N} c_i F_i(z))$ – Spacey Aug 29 '12 at 20:11
  • I think there is a typo in the last equation?... $a_{n+1}x$ is happening twice?... – Spacey Aug 29 '12 at 20:20
  • Typo corrected! – Zen Aug 29 '12 at 22:28
  • Thank you. I guess then the question becomes, why are we allowed to make the assumption that anything can be near gaussian? – Spacey Aug 29 '12 at 22:35
  • Consider $\textbf{all}$ random variables with support on the real line that have $0$ mean and unit variance. We have seen that among those random variables, the one with density $p_0(x)=A\exp(a_2x^2+a_1x)$ has the largest entropy, and this is clearly a Gaussian density (complete the square in the exponent if you want). – Zen Aug 29 '12 at 22:54
  • That is why the Gaussian approximation makes sense: if you are looking for the maximum entropy density in a situation where you have additional constraints, it is reasonable to suppose that the maximum entropy density will be "close" to the Gaussian one described above. Does that make sense to you? – Zen Aug 29 '12 at 22:58
  • Yes... I see it... I think. – Spacey Aug 29 '12 at 23:03
  • I don't see how you can get your "final expression" in your comment above without some additional conditions imposed on the $G_i$'s. I can't... – Zen Aug 29 '12 at 23:17
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/4673/discussion-between-mohammad-and-zen) – Spacey Aug 29 '12 at 23:23