This second answer addresses the question posed in the comment to the first answer as to why the specific choice of $W$ results in an efficient GMM estimator.
Setting $W=S^{-1}$ in the general expression for the asymptotic variance gives
\begin{eqnarray}
\mathrm{Avar}(\widehat{\delta}(\widehat{S}^{-1}))&=&(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\Sigma_{xz}'S^{-1}SS^{-1}\Sigma_{xz}(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\notag\\
&=&(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\Sigma_{xz}'S^{-1}\Sigma_{xz}(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\notag\\
&=&(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\label{avareffgmm}
\end{eqnarray}
We therefore need to show that the difference between the general asymptotic variance and the one obtained with the candidate efficient weighting matrix $W=S^{-1}$ is positive semidefinite (p.s.d.):
$$
(\Sigma_{xz}'W\Sigma_{xz})^{-1}\Sigma_{xz}'WSW\Sigma_{xz}(\Sigma_{xz}'W\Sigma_{xz})^{-1}-(\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}\geqslant0
$$
Linear algebra (see Thm. 1.24, Magnus/Neudecker 1988, i.e., for p.d. $A$ and $B$, $A-B\geqslant0\Leftrightarrow B^{-1}-A^{-1}\geqslant0$, much like $3>2$, but $1/2>1/3$) tells us that this condition is equivalent to
$$
Q:=\Sigma_{xz}'S^{-1}\Sigma_{xz}-\Sigma_{xz}'W\Sigma_{xz}(\Sigma_{xz}'WSW\Sigma_{xz})^{-1}\Sigma_{xz}'W\Sigma_{xz}\geqslant0
$$
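The order reversal under inversion can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated matrices; the names `A`, `B`, `min_eig` and the tolerance used when reading the results are illustrative choices, not part of the original argument.

```python
import numpy as np


def min_eig(M):
    """Smallest eigenvalue of M after symmetrizing away rounding noise."""
    return np.linalg.eigvalsh((M + M.T) / 2).min()


rng = np.random.default_rng(0)
n = 4

# B is p.d. by construction (Gram matrix plus a multiple of the identity).
X = rng.standard_normal((n, n))
B = X @ X.T + n * np.eye(n)

# A - B = YY' is p.s.d. (rank 2), so A >= B in the matrix sense.
Y = rng.standard_normal((n, 2))
A = B + Y @ Y.T

gap_direct = min_eig(A - B)                                  # should be >= 0 up to rounding
gap_inverse = min_eig(np.linalg.inv(B) - np.linalg.inv(A))   # should be >= 0 up to rounding

print(gap_direct, gap_inverse)
```

Both printed values should be nonnegative up to floating-point error, which is the matrix analogue of $3>2$ but $1/2>1/3$. A single random draw illustrates the theorem; it does not prove it.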
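As $S$ is p.d., so is $S^{-1}$, which can therefore be decomposed as $S^{-1}=C'C$ with $C$ nonsingular (e.g., via a Cholesky factorization). Further define $H=C\Sigma_{xz}$ and $G=C'^{-1}W\Sigma_{xz}$, so that $G'G=\Sigma_{xz}'WSW\Sigma_{xz}$ and $H'G=\Sigma_{xz}'W\Sigma_{xz}$.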
Then,
\begin{eqnarray*}
Q&=&\Sigma_{xz}'C'C\Sigma_{xz}-\Sigma_{xz}'W\Sigma_{xz}(\Sigma_{xz}'WC^{-1}C'^{-1}W\Sigma_{xz})^{-1}\Sigma_{xz}'W\Sigma_{xz}\\
&=&H'H-\Sigma_{xz}'W\Sigma_{xz}(G'G)^{-1}\Sigma_{xz}'W\Sigma_{xz}\\
&=&H'H-\Sigma_{xz}'C'C'^{-1}W\Sigma_{xz}(G'G)^{-1}\Sigma_{xz}'WC^{-1}C\Sigma_{xz}\\
&=&H'H-H'G(G'G)^{-1}G'H\\
&=&H'(I-G(G'G)^{-1}G')H
\end{eqnarray*}
The matrix in parentheses, $I-G(G'G)^{-1}G'$, is the usual residual-maker matrix of $G$: it is symmetric and idempotent and therefore p.s.d. Thus, for an arbitrary $a$, and writing $c:=Ha$,
\begin{eqnarray*}
a'Qa&=&a'H'(I-G(G'G)^{-1}G')Ha\\
&=:&c'(I-G(G'G)^{-1}G')c\geqslant0
\end{eqnarray*}
Hence $Q$ is p.s.d., so no choice of $W$ yields a smaller asymptotic variance than $W=S^{-1}$.
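As a sanity check on the algebra, the whole argument can be replayed numerically. This is a sketch under stated assumptions: NumPy, randomly drawn $\Sigma_{xz}$, $S$ and $W$, and illustrative dimensions (`n_instr`, `n_reg`); the helper names `random_pd`, `avar` and `min_eig` are hypothetical and only serve this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_instr, n_reg = 5, 3  # illustrative sizes: Sigma_xz is n_instr x n_reg


def random_pd(n):
    """Random positive definite matrix (Gram matrix plus n * identity)."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)


def min_eig(M):
    """Smallest eigenvalue of M after symmetrizing away rounding noise."""
    return np.linalg.eigvalsh((M + M.T) / 2).min()


Sigma_xz = rng.standard_normal((n_instr, n_reg))  # full column rank with probability 1
S = random_pd(n_instr)
W = random_pd(n_instr)  # an arbitrary p.d. weighting matrix
S_inv = np.linalg.inv(S)


def avar(weight):
    """General sandwich formula for the asymptotic variance under `weight`."""
    bread = np.linalg.inv(Sigma_xz.T @ weight @ Sigma_xz)
    meat = Sigma_xz.T @ weight @ S @ weight @ Sigma_xz
    return bread @ meat @ bread


avar_general = avar(W)
avar_efficient = np.linalg.inv(Sigma_xz.T @ S_inv @ Sigma_xz)

# S^{-1} = C'C: np.linalg.cholesky returns lower-triangular F with S^{-1} = F F',
# so C = F' satisfies C'C = S^{-1}.
C = np.linalg.cholesky(S_inv).T
H = C @ Sigma_xz
G = np.linalg.inv(C.T) @ W @ Sigma_xz

# Q computed from its definition and from the projection form H'(I - P_G)H.
SWS = Sigma_xz.T @ W @ Sigma_xz
Q_direct = Sigma_xz.T @ S_inv @ Sigma_xz - SWS @ np.linalg.inv(Sigma_xz.T @ W @ S @ W @ Sigma_xz) @ SWS
M_G = np.eye(n_instr) - G @ np.linalg.inv(G.T @ G) @ G.T
Q_projection = H.T @ M_G @ H

min_eig_Q = min_eig(Q_direct)
min_eig_diff = min_eig(avar_general - avar_efficient)

print(np.allclose(Q_direct, Q_projection), min_eig_Q, min_eig_diff)
```

The two expressions for $Q$ should agree, and the smallest eigenvalues of $Q$ and of the variance difference should be nonnegative up to floating-point error. Plugging `S_inv` into `avar` should also reproduce `avar_efficient`, mirroring the simplification of the sandwich formula at the start of the answer.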