Consider two random variables $X,Y$ (possibly multi-dimensional). The mutual information is defined by:
$$ I(X,Y) = \sum_{x,y} P(x,y)\ln\left(\frac{P(x,y)}{P(x)P(y)}\right) $$
where $P(x,y)$ is the joint probability mass function of $X,Y$ and $P(x)$, $P(y)$ are the marginals (for continuous variables, replace the sum by an integral over densities).
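For concreteness, here is a quick numeric check of this definition, a minimal sketch in Python; the $2\times 2$ joint pmf is made up purely for illustration:

```python
import numpy as np

# Mutual information (in nats) for a discrete joint distribution P(x, y),
# computed directly from the definition above. The joint pmf below is an
# arbitrary example, not taken from any particular problem.
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])

Px = P.sum(axis=1, keepdims=True)  # marginal P(x), shape (2, 1)
Py = P.sum(axis=0, keepdims=True)  # marginal P(y), shape (1, 2)

# Sum P(x,y) * ln( P(x,y) / (P(x) P(y)) ) over all (x, y) with P(x,y) > 0
mask = P > 0
I = np.sum(P[mask] * np.log(P[mask] / (Px * Py)[mask]))
print(I)  # ~0.1927 nats for this joint pmf
```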
It is often said that the mutual information $I(X,Y)$ quantifies "correlations of all orders" between $X$ and $Y$. Is there a way to make this statement precise, for example via an expansion of $I(X,Y)$ in terms of the (joint) moments of $X$ and $Y$?