Deriving Negentropy. Getting stuck

Question

So, this question is somewhat involved, but I have painstakingly tried to make it as straightforward as possible.

Goal: Long story short, there is a derivation of negentropy that does not involve higher-order cumulants, and I am trying to understand how it was derived.

Background: (I understand all this)

I am self-studying the book 'Independent Component Analysis', found here. (This question is from section 5.6, in case you have the book - 'Approximation of Entropy by Nonpolynomial Functions').

We have $x$, which is a random variable, and whose negentropy we wish to estimate, from some observations we have. The PDF of $x$ is given by $p_x(\zeta)$. Negentropy is simply the difference between the differential entropy of a standardized Gaussian random variable, and the differential entropy of $x$. The differential entropy here is given by $H$, such that:

$$ H(x) = -\int_{-\infty}^{\infty} p_x(\zeta) \log p_x(\zeta) \, d\zeta $$

and so, the negentropy is given by

$$J(x) = H(v) - H(x)$$

where $v$ is a standardized Gaussian r.v., with PDF given by $\phi(\zeta)$.
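As a sanity check on these two definitions, here is a minimal stdlib-Python sketch that evaluates $H$ and $J$ by numerical integration. The test case (a uniform variable on $[-\sqrt{3}, \sqrt{3}]$, which has zero mean and unit variance), the integration limits and the helper names are illustrative assumptions, not taken from the book.

```python
import math

def phi(z):
    """Standard Gaussian PDF."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def entropy(pdf, a, b):
    """Differential entropy -∫ p log p over [a, b], taking p log p = 0 where p = 0."""
    def integrand(z):
        p = pdf(z)
        return -p * math.log(p) if p > 0 else 0.0
    return simpson(integrand, a, b)

# Hypothetical example: x uniform on [-sqrt(3), sqrt(3)] (zero mean, unit variance).
a = math.sqrt(3)
uniform_pdf = lambda z: 1 / (2 * a) if -a <= z <= a else 0.0

H_v = entropy(phi, -12, 12)        # standardized Gaussian; tails beyond |z| = 12 are negligible
H_x = entropy(uniform_pdf, -a, a)  # entropy of the uniform variable
J_x = H_v - H_x                    # negentropy J(x) = H(v) - H(x)
print(H_v, H_x, J_x)
```

Because the Gaussian has the largest entropy among variables with a given variance, $J$ should come out non-negative; for this uniform case it is $\tfrac12\log(2\pi e) - \log(2\sqrt{3}) \approx 0.18$.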

Now, as part of this new method, my book has derived an estimate of the PDF of $x$, given by:

$$ p_x(\zeta) = \phi(\zeta) [1 + \sum_{i} c_i \; F^{i}(\zeta)] $$

(Where $c_i = \mathbb{E}\{F^i(x)\}$. By the way, $i$ is not a power, but an index instead).

For now, I 'accept' this new PDF formula, and will ask about it another day. This is not my main issue. What he does next is plug this version of the PDF of $x$ back into the negentropy equation, ending up with:

$$ J(x) \approx \frac{1}{2}\sum_i\mathbb{E} \{F^i(x)\}^2 $$
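In practice the expectations $\mathbb{E}\{F^i(x)\}$ are replaced by sample means over the observations. Here is a minimal sketch of that estimator. It assumes the caller supplies functions $F^i$ that already satisfy the conditions listed below, and it standardizes the data first because the derivation assumes zero mean and unit variance. The function names are hypothetical, and using the population standard deviation is my own choice.

```python
import statistics

def standardize(xs):
    """Shift and scale the samples to zero mean and unit (population) variance."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

def negentropy_estimate(xs, funcs):
    """J(x) ≈ 1/2 * sum_i (sample mean of F^i(x))^2, with x standardized first."""
    zs = standardize(xs)
    return 0.5 * sum(statistics.fmean(F(z) for z in zs) ** 2 for F in funcs)
```

For example, `negentropy_estimate(samples, [F1, F2])` with two suitable functions `F1`, `F2` returns the approximation above. The quality of the estimate depends entirely on how the $F^i$ are chosen.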

Bear in mind that the sigma (here and for the rest of the post) just sums over the index $i$. For example, if we only had two functions, the sum would run over $i=1$ and $i=2$. Of course, I should tell you about those functions he is using. The functions $F^i$ are defined as follows:

The functions $F^i$ are not polynomial functions in this case. (We assume that the r.v. $x$ has zero mean and unit variance.) Now, let us impose some constraints and list some properties of those functions:

$$ F^{n+1}(\zeta) = \zeta, \: \: c_{n+1} = 0 $$

$$ F^{n+2}(\zeta) = \zeta^2, \: \: c_{n+2} = 1 $$

To simplify calculations, let us make another, purely technical assumption: The functions $F^i, i = 1, ... n$, form an orthonormal system, as such:

$$ \int \phi(\zeta) F^i(\zeta)F^j(\zeta)d\zeta= \begin{cases} 1, \quad \text{if } i = j \\ 0, \quad \text{if } i \neq j \end{cases} $$

and

$$ \int \phi(\zeta)F^i(\zeta)\zeta^k \, d\zeta = 0, \quad \text{for } k = 0,1,2 $$
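These two conditions can be checked numerically. The sketch below builds two nonpolynomial functions that satisfy them using Gram-Schmidt orthogonalization under the weight $\phi$. The polynomials $1$, $\zeta$, $(\zeta^2-1)/\sqrt{2}$ are orthonormal under $\phi$ and span the same space as $1, \zeta, \zeta^2$, so removing the projections onto them enforces the second condition. The starting functions $\zeta e^{-\zeta^2/2}$ and $e^{-\zeta^2/2}$, the integration range and all helper names are illustrative assumptions. This is not claimed to be the book's choice of $F^i$.

```python
import math

L, N = 12.0, 2000  # integration range [-L, L] and number of Simpson panels (assumed adequate)

def phi(z):
    """Standard Gaussian PDF."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def inner(f, g, n=N):
    """Simpson's-rule approximation of the weighted inner product ∫ phi f g over [-L, L]."""
    h = 2 * L / n
    total = 0.0
    for k in range(n + 1):
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        z = -L + k * h
        total += w * phi(z) * f(z) * g(z)
    return total * h / 3

def gram_schmidt(raw, basis):
    """Subtract the projections of `raw` onto an orthonormal `basis`, then normalize."""
    coefs = [inner(raw, e) for e in basis]
    def resid(z):
        return raw(z) - sum(c * e(z) for c, e in zip(coefs, basis))
    norm = math.sqrt(inner(resid, resid))
    return lambda z: resid(z) / norm

# Orthonormal (under phi) basis for span{1, z, z^2}: the normalized Hermite polynomials.
h0 = lambda z: 1.0
h1 = lambda z: z
h2 = lambda z: (z * z - 1) / math.sqrt(2)

# Hypothetical nonpolynomial starting functions: one odd, one even.
F1 = gram_schmidt(lambda z: z * math.exp(-z * z / 2), [h0, h1, h2])
F2 = gram_schmidt(lambda z: math.exp(-z * z / 2), [h0, h1, h2, F1])

# Re-check the conditions on a finer grid than the one used to build F1 and F2.
fine = 3 * N
for Fi in (F1, F2):
    # norm (should be ≈ 1) and moments against z^0, z^1, z^2 (should be ≈ 0)
    print(inner(Fi, Fi, fine), [inner(Fi, lambda z, k=k: z ** k, fine) for k in range(3)])
print(inner(F1, F2, fine))  # cross term, should be ≈ 0
```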

Almost there! OK, so all that was the background, and now for the question. The task, then, is simply to plug this new PDF into the differential entropy formula, $H(x)$. If I understand this, I will understand the rest. Now, the book gives the derivation (and I agree with it), but I get stuck towards the end, because I do not see how the terms cancel out. Also, I do not know how to interpret the small-o notation from the Taylor expansion.

This is the result:

Using the Taylor expansion $(1+\epsilon)\log(1+\epsilon) = \epsilon + \frac{\epsilon^2}{2} + o(\epsilon^2)$, for $H(x)$ we get:
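To see where this expansion and its $o(\epsilon^2)$ remainder come from, multiply the series $\log(1+\epsilon) = \epsilon - \epsilon^2/2 + \epsilon^3/3 - \dots$ (valid for $|\epsilon|<1$) by $(1+\epsilon)$, with $\epsilon = \sum_i c_i F^i(\zeta)$:

$$ (1+\epsilon)\log(1+\epsilon) = \Big(\epsilon - \frac{\epsilon^2}{2} + \frac{\epsilon^3}{3}\Big) + \Big(\epsilon^2 - \frac{\epsilon^3}{2}\Big) + O(\epsilon^4) = \epsilon + \frac{\epsilon^2}{2} - \frac{\epsilon^3}{6} + O(\epsilon^4). $$

The leftover $-\epsilon^3/6 + O(\epsilon^4)$, divided by $\epsilon^2$, tends to $0$ as $\epsilon \to 0$, and that is exactly what $o(\epsilon^2)$ records.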

$$
\begin{aligned}
H(x) &= -\int \phi(\zeta)\Big(1 + \sum_i c_i F^i(\zeta)\Big)\Big(\log\Big(1 + \sum_i c_i F^i(\zeta)\Big) + \log\phi(\zeta)\Big)\,d\zeta \\
&= -\int \phi(\zeta)\log\phi(\zeta)\,d\zeta - \int \phi(\zeta)\sum_i c_i F^i(\zeta)\log\phi(\zeta)\,d\zeta \\
&\quad - \int \phi(\zeta)\Big[\sum_i c_i F^i(\zeta) + \frac{1}{2}\Big(\sum_i c_i F^i(\zeta)\Big)^2 + o\Big(\Big(\sum_i c_i F^i(\zeta)\Big)^2\Big)\Big]\,d\zeta
\end{aligned}
$$

and so

The Question: (I don't understand this) $$ H(x) = H(v) - 0 - 0 - \frac{1}{2}\sum_i c_i^2 + o\left(\left(\sum_i c_i\right)^2\right) $$
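This result can also be checked numerically. The sketch below builds two hypothetical nonpolynomial $F^i$ that satisfy the orthonormality conditions, using Gram-Schmidt under the weight $\phi$. This construction is an assumption, not the book's choice. It then forms $p_x = \phi\,(1 + c_1F^1 + c_2F^2)$ for small equal $c_i$ and compares $H(v) - H(x)$, computed by numerical integration, with $\frac12\sum_i c_i^2$.

```python
import math

L, N = 12.0, 4000  # integration range [-L, L] and Simpson panels (assumed adequate)

def phi(z):
    """Standard Gaussian PDF."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def integrate(f):
    """Composite Simpson's rule for the integral of f over [-L, L]."""
    h = 2 * L / N
    total = sum((1 if k in (0, N) else 4 if k % 2 else 2) * f(-L + k * h)
                for k in range(N + 1))
    return total * h / 3

def inner(f, g):
    """Weighted inner product: integral of phi * f * g."""
    return integrate(lambda z: phi(z) * f(z) * g(z))

def gram_schmidt(raw, basis):
    """Remove the projections of `raw` onto an orthonormal `basis`, then normalize."""
    coefs = [inner(raw, e) for e in basis]
    resid = lambda z: raw(z) - sum(c * e(z) for c, e in zip(coefs, basis))
    norm = math.sqrt(inner(resid, resid))
    return lambda z: resid(z) / norm

def entropy(pdf):
    """Differential entropy, -integral of p log p (points with p <= 0 contribute 0)."""
    def integrand(z):
        p = pdf(z)
        return -p * math.log(p) if p > 0 else 0.0
    return integrate(integrand)

# Orthonormal basis of span{1, z, z^2} under phi, then two hypothetical F^i.
basis = [lambda z: 1.0, lambda z: z, lambda z: (z * z - 1) / math.sqrt(2)]
F1 = gram_schmidt(lambda z: z * math.exp(-z * z / 2), basis)
F2 = gram_schmidt(lambda z: math.exp(-z * z / 2), basis + [F1])

H_v = entropy(phi)
rel_errors = []
for s in (0.02, 0.01, 0.005):
    c1 = c2 = s  # hypothetical small coefficients c_1, c_2
    p_x = lambda z: phi(z) * (1 + c1 * F1(z) + c2 * F2(z))
    drop = H_v - entropy(p_x)               # H(v) - H(x) by numerical integration
    predicted = 0.5 * (c1 ** 2 + c2 ** 2)   # (1/2) * sum of c_i^2
    rel_errors.append(abs(drop - predicted) / predicted)
    print(f"c_i = {s}: H(v)-H(x) = {drop:.3e}, 0.5*sum c_i^2 = {predicted:.3e}")
```

The relative mismatch should shrink as the $c_i$ get smaller. That shrinking mismatch is the $o(\cdot)$ term becoming negligible next to $\frac12\sum_i c_i^2$.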

So, my problem: except for $H(v)$, I don't understand how he got the final four terms in the last equation (i.e., the 0, the 0, and the last two terms). I understand everything before that. He says he has exploited the orthogonality relationships given in the properties above, but I don't see how. (I also don't understand how the small-o notation is being used here.)

THANKS!!!!

EDIT:

I have gone ahead and added the images from the book I am reading. They say pretty much what I said above, but I include them in case someone needs additional context.

[Image: excerpt from the book, Section 5.6]

And here, marked in red, is the exact part that confuses me. How does he use the orthogonality properties to get that last part, where things cancel out, leaving the final summation involving $c_i^2$ and the small-o term?

**Hint**: Write out explicitly $\log \phi(x)$ and use the author's stated assumptions to get the zeros for the two middle terms. There must be several typos including in the block quote; e.g., the $\neq$ appears in the wrong place in the orthonormal basis definition you give. — cardinal, Aug 28 '12 at 15:00
@cardinal Ok, corrected the typo, thanks. That being said, I am not clear on how he is performing the cancellation. I have added the actual images btw, from the book itself. — Spacey, Aug 28 '12 at 15:05
Use my hint and the very last display equation in the block quote you give (with linearity of expectation as well). Also, two points of clarification: (a) is that sum intended to be finite or infinite and (b) it would help to cite the text that the problem and images come from. — cardinal, Aug 28 '12 at 15:09
@cardinal Ok, I edited to add info on the book (ICA, at the beginning of the post), and I also edited the summation. (Yes, it is finite: there are just as many functions $F$ as we choose to have.) About the problem: yes, I have explicitly applied the log to the standardized Gaussian PDF (and I get a $z^2$ term), but I think this is the point where I am stuck. Perhaps I am not clear on how expectations mesh with the integrals... — Spacey, Aug 28 '12 at 15:24
Honestly, I have no idea how or why this got migrated off the math site, either. At any rate, I'm happy to have it here, where it is equally at home. You've put a good deal of effort into the question. :-) — cardinal, Aug 28 '12 at 16:53
@cardinal It pleases me so much to hear you say that. :-) Yes, hopefully this self-study investment will pay off someday. ;-) — Spacey, Aug 28 '12 at 17:11
It will, @Mohammad, it will! ICA is a very interesting topic also :-). — Néstor, Aug 28 '12 at 17:22

Néstor · Accepted Answer · 2012-08-28T22:20:07.560


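First, recall that the $c_i$ are constants (they are expectation values, i.e., numbers!), so they can be taken outside the integrals. If you can't see it, note that $$c_i=\int p_0(\xi)G^i(\xi)\,d\xi.$$ If the notation bothers you, just replace $\xi$ with $\xi'$ in the $c_i$. (I am using the book's notation here: $\xi$, $\varphi$, $p_0$ and $G^i$ play the roles of $\zeta$, $\phi$, $p_x$ and $F^i$ in the question.)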

>> To obtain the zero terms:

Recall that $\varphi(\xi)=\exp(-\xi^2/2)/\sqrt{2\pi}$. As suggested by @cardinal, you have to write $\log\varphi(\xi)$ explicitly, which is equal to: $$\log\varphi(\xi)=-\xi^2/2-\log\sqrt{2\pi}.$$ With this at hand, you just have to note that: $$c_i\int\varphi(\xi)G^i(\xi)\log \varphi(\xi)\,d\xi=-\frac{1}{2}c_i\int\varphi(\xi)G^i(\xi)\xi^2\,d\xi-\log\sqrt{2\pi}\,c_i\int\varphi(\xi)G^i(\xi)\,d\xi,\ \ \ (1)$$ where I have pulled the constants outside the integrals.

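From here, note that (5.39) states that $\int \varphi(\xi)G^i(\xi)\xi^k\,d\xi = 0$ for $k=0,1,2$. The integral in the first term on the right-hand side of eq. $(1)$ is of this form (with $k=2$), and so is the integral in the second term (with $k=0$). Apply this to every term of the sum and you are done! The same $k=0$ condition also makes the linear term of the Taylor expansion vanish, $\int\varphi(\xi)\sum_i c_iG^i(\xi)\,d\xi = \sum_i c_i\int\varphi(\xi)G^i(\xi)\,d\xi = 0$, which gives the second zero.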

>> To obtain the $\sum c_i^2$ terms:

Note that the integral needed for these terms is: $$\int \varphi(\xi)\left(\sum_{i=1}^{n} c_iG^i(\xi)\right)^2d\xi.$$ We can use the multinomial theorem to expand the squared sum. This gives us: $$\int \varphi(\xi)\sum_{k_1+k_2+\dots+k_n=2} \frac{2!}{k_1! k_2!\cdots k_n!}\prod_{1\leq t \leq n}(c_tG^t(\xi))^{k_t}d\xi.$$ However, from (5.39) again, every term in this sum involves an integral of the form $$\int \varphi(\xi)G^{i}(\xi)G^{j}(\xi)d\xi,$$ which is zero for $i\neq j$ and one for $i=j$. This leaves us with the result $$\int \varphi(\xi)\left(\sum c_iG^i(\xi)\right)^2d\xi=\sum c_i^2.$$
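Equivalently, without the multinomial theorem, write the square as a double sum and use $\int\varphi(\xi)G^i(\xi)G^j(\xi)\,d\xi = \delta_{ij}$ (the Kronecker delta: $1$ if $i=j$, $0$ otherwise):

$$ \int \varphi(\xi)\Big(\sum_i c_iG^i(\xi)\Big)^2 d\xi = \sum_i\sum_j c_ic_j\int\varphi(\xi)G^i(\xi)G^j(\xi)\,d\xi = \sum_i\sum_j c_ic_j\,\delta_{ij} = \sum_i c_i^2. $$

Multiplied by the $-\frac12$ in front, this is the $-\frac{1}{2}\sum c_i^2$ term in $H(x)$.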

>> About the $o(\text{whatever})$ notation

I think this is pretty confusing on the authors' part, but I recall that they use it just to mean that there are terms of order $\text{whatever}$ every time they write $o(\text{whatever})$ (i.e., just like big-O notation). However, as @Macro commented on this answer, there is a difference between big-O and little-o notation. You may want to check for yourself which one suits the problem in this Wikipedia article.
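In the Taylor expansion used here the limit is $\epsilon \to 0$. $o(\epsilon^2)$ stands for a remainder $r(\epsilon)$ with $r(\epsilon)/\epsilon^2 \to 0$, whereas $O(\epsilon^2)$ would only require $|r(\epsilon)| \le C\epsilon^2$ for small $\epsilon$. A short Python sketch makes this visible for $(1+\epsilon)\log(1+\epsilon)$. The helper name is hypothetical.

```python
import math

def remainder(eps):
    """r(eps) = (1 + eps) log(1 + eps) - eps - eps**2 / 2, the part hidden in o(eps^2)."""
    return (1 + eps) * math.log1p(eps) - eps - eps ** 2 / 2

for eps in (1e-1, 1e-2, 1e-3):
    r = remainder(eps)
    # r / eps^2 -> 0 (the little-o property); r / eps^3 -> -1/6 (the next Taylor coefficient)
    print(eps, r / eps ** 2, r / eps ** 3)
```

The first ratio goes to zero, while the second settles near $-1/6$, the coefficient of the next Taylor term.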

PS: This is a great book by the way. The papers of the authors on the subject are also very good and are a must read if you are trying to understand and implement ICA.

edited Aug 28 '12 at 22:20

answered Aug 28 '12 at 16:18

Néstor



(+1) Good answer. If the sums are infinite, we have to be more careful about interchanging them with the integral. If they are finite (as the OP suggests, but I did not look at the images closely) then everything is straightforward, as you've shown. :-) – cardinal Aug 28 '12 at 16:31
Ah yes! Thank you Nestor, but what about the last two results, that is, the summation with the $c_i^2$, and summation with the small-o notation part? – Spacey Aug 28 '12 at 16:43

@cardinal: Oh yes! They ARE finite (I don't know why I wrote they were infinite...). I changed that in my answer. – Néstor Aug 28 '12 at 16:45
@Mohammad, I'm adding answers to your other two questions to my answer ;-). – Néstor Aug 28 '12 at 17:05
@Néstor Thanks again friend. Ok, I think now then my issue is something silly - so you mean that, let us assume there was just one function, so just $F^1$, so you mean that $\int \phi(\zeta) (\sum c_1 G^1(\zeta))^2 d\zeta = \sum c_i^2 \; \int \phi(\zeta) d\zeta$, and so, of course the latter part is going to one. So we have simply taken that $\sum c_i^2$ outside, yes? – Spacey Aug 28 '12 at 17:37
@Néstor About the small-o, what I don't understand is: the small-o of the summation, is this then just a number in the end? How do I evaluate it here? – Spacey Aug 28 '12 at 17:39
@Mohammad in that case there is no need for the summation :-). So in the case of just one function, as you imply, yes, we take the squared $c_1$ outside the integral (because it is a constant): $$\int \varphi(\xi)(c_1G^1(\xi))^2d\xi=c_1^2\int \varphi(\xi)(G^1(\xi))^2d\xi,$$ and the remaining integral equals $1$ by the orthonormality condition, not because $\int\varphi(\xi)\,d\xi = 1$. About the small-o notation in this particular book (which I interpreted as the same as big-O notation), you can see more here: http://en.wikipedia.org/wiki/Big_O_notation. – Néstor Aug 28 '12 at 17:45
@Néstor Yes, sorry, never mind. I got it now. OMG. What a ride. Now I feel silly. Anyway, thank you so much!! :-) – Spacey Aug 28 '12 at 17:51

@Néstor, +1 to this answer but re: your last comment, I think there is a distinction between big-O and [little-o](http://en.wikipedia.org/wiki/Big_O_notation#Little-o_notation) notation. – Macro Aug 28 '12 at 17:59
Yes, I read about it too, but I don't think it makes sense in this case... does it? Anyway, I'll leave that in the answer and let the OP decide (I have the book at home; I'll take a look at it when I get there). – Néstor Aug 28 '12 at 22:16
@Néstor, I haven't thought about this since grad school but I think the difference is that "$f(x)$ is 'little o' of $g(x)$" means that $f(x)$ eventually becomes negligible compared to $g(x)$, i.e. $\lim_{x \rightarrow \infty} f(x)/g(x) = 0$ while "$f(x)$ is 'big-O' of $g(x)$" just means that for a sufficiently large $x$, $f(x)$ will always be less than a fixed constant multiple of $g(x)$, which is a weaker condition. – Macro Aug 28 '12 at 22:30
Hmmm...if that's the case, then little-o notation makes more sense. Maybe you should put that as an answer? I would feel like stealing that one from you ;-). – Néstor Aug 28 '12 at 22:36
@Néstor and Macro, what I would like to understand in the end is this: when I compute $J(x) = H(v) - H(x)$, what do I do about the o-notation part? Everything else works out and gives eq. 5.42, but where does that o-notation summation go? – Spacey Aug 28 '12 at 22:44
@Mohammad, if it's genuine little-o notation, then the little-o terms are negligible compared with the other terms (they vanish faster than $(\sum c_i)^2$ as the $c_i$ go to zero), so they are simply dropped in the approximation. – Néstor Aug 28 '12 at 23:50
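Putting the pieces together as a worked equation, with the sign convention $J(x) = H(v) - H(x)$:

$$ J(x) = H(v) - H(x) = \frac{1}{2}\sum_i c_i^2 + o\Big(\Big(\sum_i c_i\Big)^2\Big) \approx \frac{1}{2}\sum_i \mathbb{E}\{F^i(x)\}^2. $$

The last step drops the little-o term and uses $c_i = \mathbb{E}\{F^i(x)\}$. A little-o term carries no fixed sign, so $-o(\cdot)$ and $+o(\cdot)$ mean the same thing here.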
