23

Why in "Method of Moments", we equate sample moments to population moments for finding point estimator?

What is the logic behind this?

kjetil b halvorsen
user 31466
    It'd be nice if we had a physicist in our community to tackle this one. – mugen Dec 15 '14 at 15:10
  • 5
    @mugen, I see no relation to physics whatsoever. – Aksakal Dec 15 '14 at 15:52
  • 2
    @Aksakal they use moments of functions in physics too, and it's always nice when somebody makes a parallel for better interpretation. – mugen Dec 15 '14 at 16:30
  • 2
    As mentioned in [this answer](http://stats.stackexchange.com/questions/122430/whats-the-difference-between-estimating-equations-and-method-of-moments-estimat/122440#122440), the [law of large numbers](http://en.wikipedia.org/wiki/Law_of_large_numbers) provides a justification (albeit asymptotic) for estimating a population moment by a sample moment, resulting in (often) simple, [consistent estimators](http://en.wikipedia.org/wiki/Method_of_moments_%28statistics%29#Advantages_and_disadvantages_of_this_method) – Glen_b Dec 16 '14 at 00:31
  • 1
    Isn't the whole idea to represent the parameters using moments? For example, if you try to estimate the parameter of a Poisson distribution, you can use the mean (first moment) as an estimator for the parameter lambda. – denis631 Aug 09 '17 at 16:08
  • The comment by @denis gets close to the heart of the matter: MoM can be viewed as closely related to the "plug-in principle." Expanding the concept to the [generalized MoM](https://en.wikipedia.org/wiki/Generalized_method_of_moments) is particularly revealing. – whuber Mar 03 '22 at 17:14

3 Answers

16

A sample consisting of $n$ realizations of independently and identically distributed random variables is ergodic. In such a case, sample moments are consistent estimators of the theoretical moments of the common distribution, provided the theoretical moments exist and are finite.

This means that

$$\hat \mu_k(n) = \mu_k(\theta) + e_k(n), \;\;\; e_k(n) \xrightarrow{p} 0 \tag{1}$$

So by equating the theoretical moment, viewed as a function of the parameter, with the corresponding sample moment and solving for the parameter, we obtain the estimator

$$\mu_k(\hat \theta(n)) = \hat \mu_k(n) \Rightarrow \hat \theta(n) = \mu_k^{-1}(\hat \mu_k(n)) = \mu_k^{-1}[\mu_k(\theta) + e_k(n)]$$

So, since $\mu_k$ does not depend on $n$, and assuming $\mu_k^{-1}$ is continuous (so that the probability limit can be passed inside it),

$$\text{plim} \hat \theta(n) = \text{plim}\big[\mu_k^{-1}(\mu_k(\theta) + e_k)\big] = \mu_k^{-1}\big(\mu_k(\theta) + \text{plim}e_k(n)\big)$$

$$=\mu_k^{-1}\big(\mu_k(\theta) + 0\big) = \mu_k^{-1}\mu_k(\theta) = \theta$$

So we equate sample moments with theoretical moments because, in this way, we obtain consistent estimators of the unknown parameters.
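
As an illustration of this argument (not part of the original answer), here is a minimal simulation sketch for an assumed Exponential-rate population, where $\mu_1(\theta) = 1/\theta$, so the method-of-moments estimator is $\hat\theta(n) = \mu_1^{-1}(\hat\mu_1(n)) = 1/\bar X_n$; the variable names are only for the example.

```python
# Minimal sketch: consistency of a method-of-moments estimator,
# assuming an Exponential(rate) population with mu_1(rate) = 1/rate.
import numpy as np

rng = np.random.default_rng(0)
true_rate = 2.0

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.exponential(scale=1 / true_rate, size=n)  # i.i.d. sample
    sample_moment = x.mean()                           # hat{mu}_1(n)
    mom_estimate = 1 / sample_moment                   # hat{theta}(n) = mu_1^{-1}(hat{mu}_1(n))
    print(f"n = {n:>9}: MoM estimate of rate = {mom_estimate:.4f}")

# As n grows, the estimates concentrate around true_rate = 2.0,
# illustrating plim hat{theta}(n) = theta.
```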

Alecos Papadopoulos
  • what does "plim" mean ? I am not familiar with "p" in $e_k(n) \xrightarrow{p} 0$ – user 31466 Dec 16 '14 at 08:59
  • @leaf probability limit – Alecos Papadopoulos Dec 16 '14 at 11:00
  • What would happen if it were a regular limit instead of a probability limit? – user 31466 Dec 16 '14 at 12:05
  • 1
    It would tell us that the estimator _becomes_ a constant, not that it tends probabilistically to one. Perhaps you should look up modes of convergence of random variables, wikipedia has a decent introduction, http://en.wikipedia.org/wiki/Convergence_of_random_variables – Alecos Papadopoulos Dec 16 '14 at 13:28
  • This of course assumes additional conditions on $\mu_k$. I'm wondering if it's appropriate to mention this, or if it only obscures the point made. – Jerome Baum Dec 19 '14 at 19:14
  • @JeromeBaum The only conditions are that the moment in question exists and is finite. I added it to the answer for completeness. – Alecos Papadopoulos Dec 19 '14 at 19:45
  • @AlecosPapadopoulos I was referring to details such as identifiability, and continuity of $\mu_k^{-1}$ (can't tell off the top of my head whether a weaker condition is sufficient?). But that's what I meant about clouding things up rather than helping -- these are technical details. – Jerome Baum Dec 19 '14 at 20:20
  • @JeromeBaum Sorry, you're right, there are more of these. Well, I call "technical" only these conditions that are almost always satisfied (or assumed to be satisfied) in practice and/or in theory. Otherwise they are important. – Alecos Papadopoulos Dec 19 '14 at 20:23
  • 1
    @AlecosPapadopoulos Agreed. I'm wondering then whether it makes sense to put something simple like "... and under certain conditions on $\mu_k$"? – Jerome Baum Dec 20 '14 at 04:26
13

Econometricians call this "the analogy principle". You compute the population mean as the expected value with respect to the population distribution; you compute the estimator as the expected value with respect to the sample distribution, and it turns out to be the sample mean. You have a unified expression $$ T(F) = \int t(x) \, {\rm d}F(x) $$ into which you plug either the population $F(x)$, say $F(x) = \int_{-\infty}^x \frac1{\sqrt{2\pi\sigma^2}} \exp\bigl[ - \frac{(u-\mu)^2}{2\sigma^2} \bigr] \, {\rm d}u$, or the sample $F_n(x) = \frac 1n \sum_{i=1}^n 1\{ x_i \le x \}$, so that ${\rm d}F_n(x)$ is a bunch of delta functions and the (Lebesgue) integral with respect to ${\rm d}F_n(x)$ is the sample average $\frac1n \sum_{i=1}^n t(x_i)$. If the functional $T(\cdot)$ is (weakly) differentiable and $F_n(x)$ converges in the appropriate sense to $F(x)$, then it is easy to establish that the estimate is consistent, although of course more hoopla is needed to obtain, say, asymptotic normality.
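
To make the plug-in idea concrete, here is a small sketch (my own illustration, assuming a normal population and the functional $t(x) = x^2$, none of which comes from the answer) that evaluates the same $T(\cdot)$ once at the population $F$ and once at the empirical $F_n$:

```python
# Sketch of the analogy principle: the functional T(F) = integral t(x) dF(x)
# evaluated under the population F (in closed form) and under the empirical
# CDF F_n, where the integral collapses to a sample average.
import numpy as np

def t(u):
    return u**2            # example functional: second raw moment

def T_empirical(t_fn, sample):
    # integral of t(x) dF_n(x) = (1/n) * sum_i t(x_i)
    return np.mean(t_fn(sample))

rng = np.random.default_rng(1)
mu, sigma = 1.5, 2.0
x = rng.normal(mu, sigma, size=100_000)   # draws from the assumed population F

population_value = mu**2 + sigma**2       # T(F) for N(mu, sigma^2), known in closed form
plug_in_estimate = T_empirical(t, x)      # T(F_n), the "analog" estimator

print(f"T(F)   = {population_value:.4f}")
print(f"T(F_n) = {plug_in_estimate:.4f}")  # close to T(F) for large n
```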

StasK
  • 1
    I haven't heard this called "analogy principle", but it is an often used econometric analysis pattern indeed: plug the sample estimator whenever the population parameter is needed but unknown. – Aksakal Dec 15 '14 at 15:56
  • @Aksakal:"plug the sample estimator whenever the population parameter is needed but unknown." isn't this approach simply called statistics? – user603 Dec 16 '14 at 00:36
  • @user603: No, not really. There are other approaches, and plug-in estimators can be bad. – kjetil b halvorsen Dec 25 '16 at 17:55
0

I might be wrong, but the way I think about it is as follows:

Let's say you have a sample $X_1, X_2, \dotsc, X_n$. The method of moments suggests equating the $m$th sample moment with the $m$th population moment. For the first moment,

$$(X_1 + X_2 + \dotsm + X_n) / n = \mu;$$

here we are averaging the observations, which is a natural estimate of the population mean. For the second moment,

$$(X_1^2 + X_2^2 + \dotsm + X_n^2) / n = \operatorname{E}[X^2] = \mu^2 + \sigma^2;$$

here we are averaging the squared observations, which estimates the population's second raw moment; together with the first equation this yields an estimate of the population variance $\sigma^2$.

And so on.

That is how I think about why the method of moments works.
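
As a concrete sketch of matching two moments at once (my own example, assuming a Gamma(shape, scale) population, which is not taken from the answer), the first two moment equations can be solved in closed form:

```python
# Sketch: method of moments for a Gamma(shape, scale) population, where
# E[X] = shape*scale and Var(X) = shape*scale**2, so matching the first two
# sample moments yields closed-form estimates of both parameters.
import numpy as np

rng = np.random.default_rng(2)
shape_true, scale_true = 3.0, 1.5
x = rng.gamma(shape_true, scale_true, size=200_000)

m1 = np.mean(x)          # first sample moment
m2 = np.mean(x**2)       # second sample moment
var_hat = m2 - m1**2     # implied estimate of the variance

scale_hat = var_hat / m1      # Var(X) / E[X] = scale
shape_hat = m1**2 / var_hat   # E[X]^2 / Var(X) = shape

print(f"shape: true {shape_true}, MoM estimate {shape_hat:.3f}")
print(f"scale: true {scale_true}, MoM estimate {scale_hat:.3f}")
```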

kjetil b halvorsen
kanjurer
  • In the first equation you compare *first* moments, not $m^\text{th}$ moments. In the last equation you aren't averaging variances and they are not sample moments, either. – whuber Mar 03 '22 at 17:10