
Suppose $X \sim F$, where $X = (X_1, X_2, \dots, X_p)$ has support $\mathbb{R}^p$, and say I assume $X$ has $k$ finite moments. When $p = 1$, I know that this means $$\int_{\mathbb{R}} |x|^k\, f(x)\, dx < \infty, $$ where $f(x)$ is the density associated with $F$. What is the mathematical equivalent of assuming $X$ has $k$ finite moments when $p > 1$?


In this link, on page 2, the authors define the $k$th moment as $$E\|X\|^k = \int \|x\|^k f(x) \, dx, $$ where $\| \cdot\|$ is the Euclidean norm.

Glen_b's answer here suggests that the $k$th moment would be $$\int x_1^kx_2^k \dots x_p^k \, f(x) dx. $$

Does assuming one to be finite imply the other is finite?

Greenparker
  • Have you seen this language used for $p>1$ somewhere? Essentially for $p>1$ the moments will be $k$th-order tensors. So for $k=1$ you have a mean vector, for $k=2$ you have a (co-)variance matrix, for $k=3$ you would have a third-order "skewness" tensor, and so on. (Assuming moments about the mean, for $k>1$.) – GeoMatt22 Sep 08 '16 at 22:11
  • @GeoMatt22 That is correct. Yes, I have seen the language used. For example [here](http://www.sciencedirect.com/science/article/pii/0047259X88900358) they talk about $2 + \delta$ finite moments of a random vector. – Greenparker Sep 08 '16 at 22:17
  • Perhaps the meaning would be that all entries of the moment-tensor are finite? – GeoMatt22 Sep 08 '16 at 22:27
  • @Greenparker could you cite that passage in the text? Can't find it. – ekvall Sep 08 '16 at 22:59
  • @Student001 Oops sorry, wrong link. [Here](https://projecteuclid.org/euclid.aop/1176994565) is the right link. Look at the statement of say Theorem 4, page 6. – Greenparker Sep 09 '16 at 05:38
  • Here is something I have come across: Let $\left\Vert\cdot\right\Vert$ stand for the Euclidean vector norm and the corresponding induced matrix norm, $\left\Vert \cdot\right\Vert _{r}=\sqrt[r]{E\left(\left\Vert\cdot\right\Vert^r\right)}$ is the $L_r$ norm of a random variable or vector. – Christoph Hanck Sep 09 '16 at 07:36
  • Some people apparently use moments in the tensor-sense for Banach space valued r.v.s: http://arxiv.org/pdf/1208.4272v2.pdf. In the paper you cited, I interpreted $k$ finite moments as $\mathbb E \Vert X \Vert^k < \infty$. – ekvall Sep 09 '16 at 11:28
  • @Student001 Thanks for the link. Just to confirm, $\| \cdot\|$ is the Euclidean norm? – Greenparker Sep 09 '16 at 12:53
  • @ChristophHanck Could you complete your thought? I understand what the $L_r$ norm is, could you explain how this relates to the moments? – Greenparker Sep 09 '16 at 12:54
  • If I understand @Student001 correctly, I mean the same thing he proposes. – Christoph Hanck Sep 09 '16 at 13:00
  • I meant it more in general, but if you are in $\mathbb R ^k$ that would be a natural choice. I skimmed parts of the paper, and it seemed to me they were more general, e.g. the r.v.s take values in a Banach space with *some* norm that they denote $\Vert \cdot \Vert$. – ekvall Sep 09 '16 at 15:08
  • @Greenparker Maybe you want a tensor view? – Henry.L Dec 11 '16 at 17:34

2 Answers

The answer is in the negative, but the problem can be fixed up.

To see what goes wrong, let $X$ have a Student t distribution with two degrees of freedom. Its salient properties are that $\mathbb{E}(|X|)$ is finite but $\mathbb{E}(|X|^2)=\infty$. Consider the bivariate distribution of $(X,X)$. Let $f(x,y)dxdy$ be its distribution element (which is singular: it is supported only on the diagonal $x=y$). Along the diagonal, $||(x,y)||=|x|\sqrt{2}$, whence

$$\mathbb{E}\left(||(X,X)||^1\right) = \mathbb{E}\left(\sqrt{2}|X|\right) \lt \infty$$

whereas

$$\iint x^1 y^1 f(x,y) dx dy = \int x^2 f(x,x) dx = \infty.$$
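If it helps to see the divergence numerically, here is a quick simulation sketch (NumPy; the seed and sample sizes are arbitrary choices of mine): the running mean of $\sqrt{2}|X|$ settles down, while the running mean of $X^2$ keeps drifting upward.

```python
# Sketch: X ~ t(2) has E|X| finite but E(X^2) infinite.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_t(df=2, size=1_000_000)

for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9,}:",
          f"mean of sqrt(2)|X| = {np.mean(np.sqrt(2) * np.abs(x[:n])):7.3f},",  # estimates E||(X,X)||, stable
          f"mean of X^2 = {np.mean(x[:n] ** 2):10.1f}")                         # diverges as n grows
```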

Analogous computations in $p$ dimensions should make it clear that $$\int\cdots\int |x_1|^k|x_2|^k\cdots |x_p|^k f(x_1,\ldots, x_p)dx_1\cdots dx_p$$

really is a moment of order $pk$, not $k$. For more about multivariate moments, please see "Let $\mathbf{Y}$ be a random vector. Are $k$th moments of $\mathbf{Y}$ considered?".


To find out what the relationships ought to be between the multivariate moments and the moments of the norm, we will need two inequalities. Let $x=(x_1, \ldots, x_p)$ be any $p$-dimensional vector and let $k_1, k_2, \ldots, k_p$ be positive numbers. Write $k=k_1+k_2+\cdots+k_p$ for their sum (implying $k_i/k \le 1$ for all $i$). Let $q \gt 0$ be any positive number (in the application, $q=2$ for the Euclidean norm, but it turns out there's nothing special about the value $2$). As is customary, write

$$||x||_q = \left(\sum_i |x_i|^q\right)^{1/q}.$$

First, let's apply the AM-GM inequality to the non-negative numbers $|x_i|^q$ with weights $k_i$. This asserts that the weighted geometric mean cannot exceed the weighted arithmetic mean:

$$\left(\prod_i (|x_i|^q)^{k_i}\right)^{1/k} \le \frac{1}{k}\sum_i k_i|x_i|^q.$$

Overestimate the right hand side by replacing each $k_i/k$ by $1$ and take the $k/q$ power of both sides:

$$\prod_i |x_i|^{k_i} = \left(\left(\prod_i (|x_i|^q)^{k_i}\right)^{1/k}\right)^{k/q} \le \left(\sum_i |x_i|^q\right)^{k/q} = ||x||_q^k.\tag{1}$$

Now let's overestimate $||x||_q$ by replacing each term $|x_i|^q$ by the largest among them, $\max(|x_i|^q) = \max(|x_i|)^q$:

$$||x||_q \le \left(\sum_i \max(|x_i|^q)\right)^{1/q} = \left(p \max(|x_i|)^q\right)^{1/q} = p^{1/q} \max(|x_i|).$$

Taking $k^\text{th}$ powers yields

$$||x||_q^k \le p^{k/q} \max(|x_i|^k) \le p^{k/q} \sum_i |x_i|^k.\tag{2}$$
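Both $(1)$ and $(2)$ are pointwise statements about a fixed vector $x$, so they can be spot-checked directly; here is a small sketch (the random test vectors and variable names are my own choices):

```python
# Spot-check (1): prod |x_i|^{k_i} <= ||x||_q^k   and   (2): ||x||_q^k <= p^{k/q} sum |x_i|^k.
import numpy as np

rng = np.random.default_rng(1)

for _ in range(5):
    p = int(rng.integers(2, 6))
    x = rng.normal(size=p) * 10          # an arbitrary test vector
    ks = rng.integers(1, 4, size=p)      # positive exponents k_1, ..., k_p
    k = ks.sum()
    q = float(rng.uniform(0.5, 3.0))     # nothing special about q = 2
    prod_term = np.prod(np.abs(x) ** ks)            # left side of (1)
    norm_k = np.sum(np.abs(x) ** q) ** (k / q)      # ||x||_q^k
    bound = p ** (k / q) * np.sum(np.abs(x) ** k)   # right side of (2)
    assert prod_term <= norm_k <= bound
    print(f"p={p}, k={k}, q={q:.2f}: {prod_term:.4g} <= {norm_k:.4g} <= {bound:.4g}")
```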

As a matter of notation, write

$$\mu(k_1,k_2,\ldots,k_p) = \int\cdots \int |x_1|^{k_1}|x_2|^{k_2}\cdots|x_p|^{k_p} f(x)\,dx.$$

This is the moment of order $(k_1,k_2,\ldots,k_p)$ (and total order $k$). By integrating against $f$, inequality $(1)$ establishes

$$\mu(k_1,\ldots,k_p) \le \int\cdots\int ||x||_q^k f(x)\,dx = \mathbb{E}(||X||_q^{k})\tag{3}$$

and inequality $(2)$ gives $$\mathbb{E}(||X||_q^{k})\le p^{k/q}\left(\mu(k,0,\ldots,0) + \mu(0,k,0,\ldots,0) + \cdots + \mu(0,\ldots,0,k)\right).\tag{4}$$

Its right hand side is, up to a constant multiple, the sum of the univariate $k^\text{th}$ moments. Together, $(3)$ and $(4)$ show

  • Finiteness of all univariate $k^\text{th}$ moments implies finiteness of $\mathbb{E}(||X||_q^{k})$.

  • Finiteness of $\mathbb{E}(||X||_q^{k})$ implies finiteness of all $\mu(k_1,\ldots,k_p)$ for which $k_1+\cdots +k_p=k$.

Indeed, these two conclusions combine as a syllogism to show that finiteness of the univariate moments of order $k$ implies finiteness of all multivariate moments of total order $k$.

Thus,

For all $q \gt 0$, the $k^\text{th}$ moment of the $L_q$ norm $\mathbb{E}(||X||_q^{k})$ is finite if and only if all moments of total order $k$ are finite.
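As a sanity check of the sandwich $(3)$–$(4)$ in one concrete case (a standard normal vector in $\mathbb{R}^3$ with $k=4$ and $q=2$; these choices are mine), a Monte Carlo estimate of all three quantities respects both bounds:

```python
# Monte Carlo check of (3) and (4): mu(2,1,1) <= E||X||_2^4 <= p^2 * (sum of univariate 4th moments).
import numpy as np

rng = np.random.default_rng(2)
p, k, q = 3, 4, 2.0
X = rng.normal(size=(1_000_000, p))                                  # standard normal in R^3

mixed = np.mean(np.prod(np.abs(X) ** np.array([2, 1, 1]), axis=1))   # mu(2,1,1), total order k = 4
norm_moment = np.mean(np.sum(np.abs(X) ** q, axis=1) ** (k / q))     # E||X||_q^k  (exactly 15 here)
rhs = p ** (k / q) * np.mean(np.sum(np.abs(X) ** k, axis=1))         # right side of (4)  (exactly 81 here)

print(f"{mixed:.3f} <= {norm_moment:.3f} <= {rhs:.3f}")              # roughly 0.637 <= 15 <= 81
```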

whuber

@whuber's answer is correct and well-composed.

I wrote this answer only to elaborate on why such a problem is better addressed in the language of tensors. I previously thought the tensor viewpoint was widely accepted in the statistics community; now I know this is not the case.

On pp. 46–47 of [McCullagh], he explains how we can view moments as tensors; what follows is essentially his account. Let $\boldsymbol{X}=(X_{1},\cdots, X_{p})$ be a random vector, and consider its second (central) moments $\kappa^{i,j}=E\left[(X_{i}-EX_{i})(X_{j}-EX_{j})\right]$. If we apply an affine transformation $Y_{r}=\boldsymbol{A}_{r}\boldsymbol{X}+b_{r}$ (equivalently, in matrix notation, $\boldsymbol{Y=AX+b}$) on the probability space, then the resulting (central) moment of $Y_{r},Y_{s}$ is $$\kappa^{r,s}=\frac{\partial Y_{r}}{\partial X_{i}}\frac{\partial Y_{s}}{\partial X_{j}}\kappa^{i,j}$$ by the transformation formula, so the second moment behaves like a contravariant tensor of order 2.

If we accept this tensor view, then the $L^{p}$ norm, i.e. the moments, of a random variable can be treated as a tensor norm, and as a matter of fact a multi-index tensor norm of the highest order does not necessarily bound the lower-order multi-index tensor norms. Moreover, since the tensor is given by first-order differential operators, Sobolev tensor norms come into play naturally, e.g. in wavelets, and there are many counterexamples in which the highest-order norm fails to bound lower-order norms in Sobolev–Besov spaces. (MO post)
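A concrete instance of this transformation law is the familiar identity $\operatorname{Cov}(\boldsymbol{AX}+\boldsymbol{b})=\boldsymbol{A}\operatorname{Cov}(\boldsymbol{X})\boldsymbol{A}^{T}$, which a short simulation confirms (a minimal sketch; the particular $\boldsymbol{A}$, $\boldsymbol{b}$, and distribution below are arbitrary choices of mine):

```python
# Check the rank-2 law kappa^{r,s} = (dY_r/dX_i)(dY_s/dX_j) kappa^{i,j},
# i.e. kappa_Y = A kappa_X A^T for the affine map Y = AX + b.
import numpy as np

rng = np.random.default_rng(3)
p = 3
X = rng.normal(size=(500_000, p)) @ rng.normal(size=(p, p))  # a correlated random vector
A = rng.normal(size=(p, p))
b = rng.normal(size=p)
Y = X @ A.T + b                                              # affine transformation Y = AX + b

kappa_X = np.cov(X, rowvar=False)
kappa_Y = np.cov(Y, rowvar=False)
print(np.allclose(kappa_Y, A @ kappa_X @ A.T, rtol=0.05, atol=0.05))  # True, up to sampling error
```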

As for the reason why we should adopt such a view, the story is much longer, but a brief comment follows.

The classic reference establishing this view is [McCullagh], along with later scattered works in the "machine learning" literature. But the origin of such a view was actually pursued much earlier in Bayesian work [Jeffreys]. Such a view definitely helps visualization, and it probably motivated some research in statistical shape analysis, like the early works by Mardia.

$\blacksquare$ References

[McCullagh] McCullagh, Peter. Tensor Methods in Statistics. Chapman and Hall, 1987. http://www.stat.uchicago.edu/~pmcc/tensorbook/ch1.pdf

[Jeffreys] Jeffreys, Harold. Cartesian Tensors. Cambridge University Press, 1931.

Henry.L