
The $n^{th}$ raw moment of a distribution can be estimated from a vector of samples $(x_1, x_2, \ldots, x_k)$ by: $$ \frac{1}{k} \sum_{i=1}^{k} x_i^n $$ Now, let's say I've calculated the first $m$ moments of my distribution. How do I then go about doing the normal things I would do with my distribution, like finding $PDF(x)$ or $CDF(x)$? If $m=2$, this is easy because it's just a Gaussian. But for any other value of $m$ I'm pretty much lost.
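For concreteness, here is a minimal sketch of the estimator above (the function name `raw_moments` is my own, and I've used the conventional $1/k$ normalisation for the sample estimate of $E[X^n]$):

```python
# Sketch: estimate the first m raw moments of a sample.
# The name `raw_moments` is illustrative; dividing by k gives the
# usual sample estimate of E[X^n] for n = 1, ..., m.
def raw_moments(samples, m):
    k = len(samples)
    return [sum(x**n for x in samples) / k for n in range(1, m + 1)]

# Example: for the sample (1, 2, 3, 4) the first raw moment is the
# mean, 2.5, and the second raw moment is (1 + 4 + 9 + 16)/4 = 7.5.
print(raw_moments([1, 2, 3, 4], 2))  # [2.5, 7.5]
```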

Mike Izbicki
  • If you assume your distribution is Gaussian, it doesn't matter how many moments you have calculated: the first two moments provide good estimates of its parameters and they are the most stable of the whole collection of moments. So what specifically do you want to assume about your distribution (and why are you bothering to compute high moments in the first place)? – whuber Nov 30 '12 at 17:30
  • I don't want to assume that it is Gaussian. All I want to assume is that it is accurately described by $m$ moments. – Mike Izbicki Nov 30 '12 at 17:41
  • Do you want to estimate the parameters of a parametric model (PDF, CDF) using the moments or are you looking for a nonparametric density estimator based on moments? What is the motivation of your problem? –  Nov 30 '12 at 17:57
  • @Procrastinator I'm not 100% sure what that means, but I think it's the nonparametric density estimator based on the moments. I basically want to be able to model the input vector as accurately as possible using moments. So if I have 100,000,000 data points, that's definitely enough to get an accurate estimate of more than just the 1st and 2nd moment, so I should be able to more accurately describe the distribution with a 3rd and 4th moment as well. So a pdf generated using this extra information should be more accurate than just the Gaussian pdf. Right? – Mike Izbicki Nov 30 '12 at 18:04
  • Your approach sounds interesting. However, with this huge sample a [kernel density estimator](http://en.wikipedia.org/wiki/Kernel_density_estimation) is likely to perform well. It is implemented in R in the command `density()`. Another kind of nonparametric estimator is implemented in the package [logcondens](http://cran.r-project.org/web/packages/logcondens/logcondens.pdf). –  Nov 30 '12 at 18:07
  • You might be interested in [this](http://www.personal.psu.edu/users/f/k/fkv/2000-06-moment-as.pdf). – tchakravarty Nov 30 '12 at 18:48
  • @Procrastinator I'm not so much interested in a specific solution to that problem, as generally understanding the importance of estimated moments on a distribution. – Mike Izbicki Nov 30 '12 at 19:50
  • There are a number of moment-based families of distributions, including the Pearson family, the Johnson family, Gram-Charlier methods, etc. See, for instance: https://stats.stackexchange.com/questions/175323/how-to-fit-an-approximate-pdf-i-e-density-estimation-using-the-first-k-empi and https://stats.stackexchange.com/questions/189941/skewness-kurtosis-plot-for-different-distribution/ – wolfies Feb 17 '18 at 14:05

1 Answer


The problem you are dealing with is known as the Hamburger moment problem, which seeks to characterise and recover a distribution from a known series of moments.

It is only possible to characterise the distribution if the entire series of moments is specified, not just the first $m$ moments. In view of this, perhaps you could augment your problem with a rule specifying what the remaining moments in the series are taken to be (e.g., you could assume that all central moments beyond the $m$th match those of the standard normal distribution). Also, just because you only estimate the first two moments does not mean you are dealing with a normal distribution: you might assume that it is a normal distribution, but that would be an assumption, not a logical implication of your estimation problem.

In any case, once you have specified the entire series of moments (e.g., by estimating the first $m$ and assuming the later ones) you can form the moment generating function:

$$m(t) = \sum_{n=0}^\infty \frac{t^n}{n!} m_n.$$

If this function exists (i.e., if the sum converges) then you have a well-defined moment generating function, and the corresponding distribution can be recovered by inverse Laplace transformation. In most cases there will not be a closed-form solution to this problem, and so the distribution will have to be approximated numerically.
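As a small illustration of the series construction (not the inversion step): if we take the full moment sequence of the standard normal, whose odd raw moments vanish and whose even raw moments are $m_{2j} = (2j-1)!!$, the truncated series converges quickly to the known MGF $e^{t^2/2}$. All function names below are my own:

```python
import math

# Raw moments of the standard normal: odd moments are 0,
# even moments m_{2j} equal the double factorial (2j - 1)!!.
def normal_moment(n):
    if n % 2 == 1:
        return 0
    result = 1
    for i in range(1, n, 2):  # (n - 1)!! for even n; n = 0 gives 1
        result *= i
    return result

# Truncated series m(t) = sum_{n=0}^{terms} t^n / n! * m_n.
def truncated_mgf(t, terms):
    return sum(t**n / math.factorial(n) * normal_moment(n)
               for n in range(terms + 1))

# The standard normal MGF is exp(t^2 / 2); with only 8 terms the
# truncated series already agrees closely at t = 0.5.
t = 0.5
print(abs(truncated_mgf(t, 8) - math.exp(t * t / 2)) < 1e-4)  # True
```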

Ben