
Suppose I have a blackbox generating values $x_n$. If I take mean($x_n^2$), it flies all over the place as new values arrive -- a supermassive 48-digit value can suddenly blow up a previously stable-looking mean by a factor of 100. However, mean(log($x_n$)) converges to some limit $L$. How can I estimate mean($x_n^2$) from observing log($x_n$)?

I imagine I need to extract more properties of the distribution of log($x_n$), maybe the variance?

A simple analogy might be squares of numbers: if I take the numbers [1,3,4,2,2,12,5,1,8,2], the mean is 4. But if I now square each number, giving [1,9,16,4,4,144,25,1,64,4], the mean is now 27.2. So what transformation on 4 was required to achieve 27.2?
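A quick sketch checking this arithmetic, and the identity mean($x^2$) = var($x$) + mean($x$)$^2$, which shows that the variance is exactly the extra piece of distributional information needed to go from the mean to the mean of squares:

```python
xs = [1, 3, 4, 2, 2, 12, 5, 1, 8, 2]
n = len(xs)

mean = sum(xs) / n                            # 4.0
mean_sq = sum(x * x for x in xs) / n          # 27.2
var = sum((x - mean) ** 2 for x in xs) / n    # population variance

# mean of squares = variance + mean^2
print(mean, mean_sq, var + mean ** 2)  # → 4.0 27.2 27.2
```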

Can anyone recommend a direction of research?

PS The blackbox generator is as follows: take a random permutation of 1000 elements and return its order (i.e. the number of times the permutation must be applied before the original ordering is restored).
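For concreteness, here is one way to sketch that blackbox in Python (a language choice of mine, not specified in the post). The order of a permutation is the LCM of its cycle lengths, which is why it can occasionally be astronomically large while its logarithm stays well behaved:

```python
import math
import random

def permutation_order(n, rng):
    """Order of a uniformly random permutation of n elements:
    the LCM of its cycle lengths."""
    perm = list(range(n))
    rng.shuffle(perm)
    seen = [False] * n
    order = 1
    for start in range(n):
        if seen[start]:
            continue
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        order = math.lcm(order, length)  # requires Python 3.9+
    return order

rng = random.Random(0)
samples = [permutation_order(1000, rng) for _ in range(200)]
logs = [math.log(x) for x in samples]

print("mean(log x):", sum(logs) / len(logs))                      # comparatively stable
print("mean(x^2):", sum(x * x for x in samples) / len(samples))   # dominated by rare huge orders
```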

P i
  • It sounds like you are limited by the number of values of $x_n$ you can observe and that your distribution has a very fat tail. However, no amount of knowledge about $\log(x_n)$ will beat the mean estimate of $\frac{1}{n}\sum_{i=1}^nx_i^2$. Your only option is to increase $n$. – Alex R. Aug 10 '17 at 17:26
  • If one indeed needs to operate on so widely dispersed numbers, log-transform might actually be the best starting step, purely because of computational reasons - i.e. precision loss when adding floating-point numbers. Maybe it could be avoided by assuming some distribution for $x$, i.e. lognormal or exponential? – juod Aug 10 '17 at 17:44
  • What else do you know about the distribution? Consider the following simple example- Suppose $Z\sim N(0,1)$ is a standard normal so $X := e^{\sigma Z}$ is log normal. Then $\mathbb{E}[\ln(X)] = 0$ but $\mathbb{E}[X^2] = \mathbb{E}[e^{2\sigma Z}] = e^{2\sigma^2}$ can be any positive number you like by varying $\sigma$. This shows there is no function such that $\mathbb{E}[X^2] = f(\mathbb{E}[\ln(X)])$ in general. So you need some extra information to have any hope .. – P.Windridge Aug 10 '17 at 17:56
  • Thanks for the feedback. I've added the generating function at the bottom of the post. – P i Aug 10 '17 at 18:06
  • You need more information to pin down the precise mean. But if it helps, Jensen's Inequality tells us that $e^{2L}$ is a lower bound for $E(x^2)$. – knrumsey Aug 10 '17 at 18:07
  • This would be the same as the number of swaps in an insertion sort? – jbowman Aug 10 '17 at 19:27

1 Answer


Just an idea. There is no simple transformation from $\DeclareMathOperator{\E}{\mathbb{E}}\E \log X$ to the expectation or variance of $X$. There cannot be, because these depend heavily on the distribution far out in the tails. If your distribution is lognormal, however, there is a relation; you can find it on Wikipedia.

But you can try to use empirical transforms, see for instance How does saddlepoint approximation work? for examples.

The moment generating function (mgf) of $X$ is $M(t)=\E e^{t X}$. The Mellin transform of an (almost surely positive) $X$ is $m(s)=\E X^s$, which equals the mgf of $\log X$ evaluated at $s$, since $\E X^s = \E e^{s \log X}$; see The distribution of the product of a multivariate normal and a lognormal distribution. So you can estimate the mgf of $\log X$ and observe that $$ \E X = m(1), \quad \E X^2 = m(2), \quad \mathbb{V}\text{ar}\, X = m(2) - m(1)^2, $$ and you have your estimates.
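A minimal sketch of that estimator. The data here are illustrative lognormal draws (not the permutation-order blackbox), chosen because the true answers are known in closed form: for $\log X \sim N(0,1)$, $\E X = e^{1/2}$ and $\E X^2 = e^2$. Note the honest caveat from the comments: the empirical mgf at $s=2$ is algebraically the same as averaging $x_i^2$, so heavy tails still hurt; the gain comes from modelling the whole estimated mgf (e.g. via a saddlepoint approximation) rather than evaluating it pointwise.

```python
import math
import random

def empirical_mellin(log_xs, s):
    """Estimate m(s) = E[X^s] from observed values y_i = log(x_i),
    via the empirical mgf of log X: (1/n) * sum(exp(s * y_i))."""
    return sum(math.exp(s * y) for y in log_xs) / len(log_xs)

rng = random.Random(42)
# Illustrative data: log X ~ N(0,1), so true E[X] = e^(1/2), E[X^2] = e^2
log_xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

m1 = empirical_mellin(log_xs, 1)   # estimate of E[X]
m2 = empirical_mellin(log_xs, 2)   # estimate of E[X^2]

print("E[X]   ~", m1, "  (true:", math.exp(0.5), ")")
print("Var X  ~", m2 - m1 ** 2, "  (true:", math.e ** 2 - math.e, ")")
```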

kjetil b halvorsen