
What is the point of using the identity matrix as weighting matrix in GMM?

The GMM estimator minimizes the distance $g_n(\delta)'\hat{W}g_n(\delta)$, where $g_n(\delta) = \frac{1}{n}\sum_ix_i\epsilon_i(\delta)$. If we set $\hat{W}=I$, the distance becomes $g_n(\delta)'g_n(\delta)$, i.e. the sum of squared coordinates of $g_n$.

The result of the minimization is still a GMM estimator, but it is clearly not efficient (for efficiency we should have set $\hat{W}=S^{-1}$, where $S = \frac{1}{n}\sum_i\epsilon_i^2x_ix_i'$).
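For concreteness, here is a minimal sketch of the one-step objective with $\hat W=I$ in a toy linear model (the data-generating process and all variable names are illustrative assumptions only, and endogeneity is omitted for brevity):

```python
# Minimal sketch: one-step GMM for y_i = z_i' delta + eps_i with instruments x_i
# and identity weighting. Toy data; names (n, X, Z, delta0) are illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                    # instruments x_i
Z = X[:, :2] + 0.1 * rng.normal(size=(n, 2))   # regressors z_i
delta0 = np.array([1.0, -0.5])                 # true coefficients
y = Z @ delta0 + rng.normal(size=n)

def gbar(delta):
    """Sample moments g_n(delta) = (1/n) * sum_i x_i * (y_i - z_i' delta)."""
    return X.T @ (y - Z @ delta) / n

def objective(delta, W):
    g = gbar(delta)
    return g @ W @ g                           # g_n(delta)' W g_n(delta)

res = minimize(objective, x0=np.zeros(2), args=(np.eye(X.shape[1]),))
print("one-step GMM estimate with W = I:", res.x)
```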

So why would we proceed in this direction? Is it common in practice as a first step towards the efficient GMM estimator, or are there other reasons?

PhDing

2 Answers


Yes, obtaining a first-step estimator is the canonical use. Of course, the error terms in $$S = \frac{1}{n}\sum_i\epsilon_i^2x_ix_i'$$ are not observable, so you need to replace them with something feasible. Since the efficient GMM estimator depends on $\hat S$, you first need some feasible preliminary estimator, such as the one using $I$ as the weighting matrix.
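A hedged sketch of this two-step recipe for a linear model, using the closed-form solution $\hat\delta(W)=(S_{xz}'WS_{xz})^{-1}S_{xz}'Ws_{xy}$ of the quadratic GMM objective (the simulated data and variable names are my own illustrative assumptions, not part of the answer):

```python
# Two-step (feasible efficient) linear GMM: step 1 uses W = I, step 2 uses
# W = S_hat^{-1} built from the step-1 residuals. Toy heteroskedastic data.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                      # instruments x_i
Z = X[:, :2] + 0.1 * rng.normal(size=(n, 2))     # regressors z_i
delta0 = np.array([1.0, -0.5])
eps = rng.normal(size=n) * (1 + np.abs(X[:, 0])) # heteroskedastic errors
y = Z @ delta0 + eps

Sxz = X.T @ Z / n                                # sample cross moments
sxy = X.T @ y / n

def gmm(W):
    """Closed-form linear GMM estimator for a given weighting matrix W."""
    return np.linalg.solve(Sxz.T @ W @ Sxz, Sxz.T @ W @ sxy)

# Step 1: preliminary (consistent but inefficient) estimator with W = I.
delta_1 = gmm(np.eye(X.shape[1]))

# Step 2: S_hat = (1/n) sum_i e_i^2 x_i x_i' from step-1 residuals,
# then re-estimate with the feasible efficient weight W = S_hat^{-1}.
e = y - Z @ delta_1
S_hat = (X * e[:, None] ** 2).T @ X / n
delta_2 = gmm(np.linalg.inv(S_hat))

print("first-step  (W = I):     ", delta_1)
print("second-step (W = S^-1):  ", delta_2)
```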

There may be some further interesting considerations in a multiple equation setup, in which misspecification in one equation can "pollute" the entire system. You can avoid that risk through a less efficient, but more robust block-diagonal weighting matrix, of which $I$ would be an example.
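As a small illustration of the block-diagonal idea (the number of equations, block sizes, and per-equation blocks below are made up purely for the example):

```python
# In a two-equation system with stacked moments (g1, g2), a block-diagonal
# weighting matrix ignores cross-equation covariances, so misspecification in
# one equation cannot contaminate the weights applied to the other.
import numpy as np
from scipy.linalg import block_diag

W1 = np.eye(3)          # weight block for the moments of equation 1
W2 = np.eye(2)          # weight block for the moments of equation 2
W = block_diag(W1, W2)  # the identity matrix is the simplest such choice
print(W)
```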

Christoph Hanck
  • May I ask a follow-up question here? When we set $W=I$, why is GMM inefficient? How can I prove this? Could you outline the proof? Thank you. – 1190 Jun 25 '21 at 14:23
  • I posted a second answer addressing this point (for any $W$, not just $I$). – Christoph Hanck Jun 25 '21 at 15:00

This second answer addresses the question posed in the comment on the first answer: why the specific choice $W=S^{-1}$ yields the efficient GMM estimator, so that any other $W$ (including $I$) is weakly less efficient.

The efficient weighting matrix results from the general one by setting $W=S^{-1}$, which gives the asymptotic variance
\begin{eqnarray*}
\mathrm{Avar}(\widehat{\delta}(\widehat{S}))&=&(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\Sigma_{xz}'S^{-1}SS^{-1}\Sigma_{xz}(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\\
&=&(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\Sigma_{xz}'S^{-1}\Sigma_{xz}(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\\
&=&(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}
\end{eqnarray*}

We therefore need to show that the difference between the general asymptotic variance and the one with the (to be shown) efficient weighting matrix is positive semi-definite:
$$(\Sigma_{xz}'W\Sigma_{xz})^{-1}\Sigma_{xz}'WSW\Sigma_{xz}(\Sigma_{xz}'W\Sigma_{xz})^{-1}-(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\geqslant0$$

Linear algebra (Thm. 1.24 in Magnus/Neudecker 1988: for p.d. matrices, $A-B\geqslant0\Leftrightarrow B^{-1}-A^{-1}\geqslant0$, much like $3>2$ but $1/2>1/3$) tells us that this condition is equivalent to
$$Q:=\Sigma_{xz}'S^{-1}\Sigma_{xz}-\Sigma_{xz}'W\Sigma_{xz}(\Sigma_{xz}'WSW\Sigma_{xz})^{-1}\Sigma_{xz}'W\Sigma_{xz}\geqslant 0$$

As $S$ is p.d., $S^{-1}$ can be decomposed as $S^{-1}=C'C$. Further define $H=C\Sigma_{xz}$ and $G=C'^{-1}W\Sigma_{xz}$. Then,
\begin{eqnarray*}
Q&=&\Sigma_{xz}'C'C\Sigma_{xz}-\Sigma_{xz}'W\Sigma_{xz}(\Sigma_{xz}'WC^{-1}C'^{-1}W\Sigma_{xz})^{-1}\Sigma_{xz}'W\Sigma_{xz}\\
&=&H'H-\Sigma_{xz}'W\Sigma_{xz}(G'G)^{-1}\Sigma_{xz}'W\Sigma_{xz}\\
&=&H'H-\Sigma_{xz}'C'C'^{-1}W\Sigma_{xz}(G'G)^{-1}\Sigma_{xz}'WC^{-1}C\Sigma_{xz}\\
&=&H'H-H'G(G'G)^{-1}G'H\\
&=&H'(I-G(G'G)^{-1}G')H
\end{eqnarray*}

The matrix $I-G(G'G)^{-1}G'$ is, as usual, symmetric and idempotent and therefore p.s.d. Thus, for an arbitrary vector $a$ and $c:=Ha$,
\begin{eqnarray*}
a'Qa&=&a'H'(I-G(G'G)^{-1}G')Ha\\
&=&c'(I-G(G'G)^{-1}G')c\geqslant0
\end{eqnarray*}
so that $Q$ is indeed p.s.d.
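A quick numerical sanity check of this result (my own sketch; the random $\Sigma_{xz}$, $S$, and $W$ below are arbitrary placeholders): for any p.d. $W$, the eigenvalues of $\mathrm{Avar}(W)-\mathrm{Avar}(S^{-1})$ should be nonnegative up to rounding error.

```python
# Verify numerically that the general sandwich variance minus the efficient
# one is positive semi-definite for random Sigma_xz, S, and W.
import numpy as np

rng = np.random.default_rng(42)
k, p = 5, 3                                   # moments k >= parameters p
Sigma_xz = rng.normal(size=(k, p))

A = rng.normal(size=(k, k))
S = A @ A.T + k * np.eye(k)                   # a p.d. long-run variance matrix
B = rng.normal(size=(k, k))
W = B @ B.T + k * np.eye(k)                   # an arbitrary p.d. weighting matrix

def avar(W):
    """(Sigma'W Sigma)^{-1} Sigma'W S W Sigma (Sigma'W Sigma)^{-1}."""
    M = np.linalg.inv(Sigma_xz.T @ W @ Sigma_xz)
    return M @ Sigma_xz.T @ W @ S @ W @ Sigma_xz @ M

diff = avar(W) - avar(np.linalg.inv(S))
# all eigenvalues should be >= 0 (up to numerical error)
print("eigenvalues of Avar(W) - Avar(S^{-1}):", np.linalg.eigvalsh(diff))
```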

Christoph Hanck