The label 2SLS itself provides the most intuitive explanation I am aware of: two-stage least squares. The name comes from the following way of computing $\widehat{\delta}_{\text{2SLS}}$:
- Regress every regressor $z_i$ on all the instruments $x_i$ and compute the fitted values $\widehat{Z}=P_{X}Z$.
- Regress $y_{i}$ on $\widehat{z}_{i}$. The resulting estimator is $\widehat{\delta}_{\text{2SLS}}$.
Step 1 can be interpreted as extracting the part of the variation in the regressors that is uncorrelated with the errors of the regression model, $\epsilon_i$: by assumption, the instruments are uncorrelated with the error, whereas the regressors themselves may be correlated with it because of endogeneity. In step 2, we then use that exogenous part of the variation in the regressors to estimate our parameter of interest $\delta$.
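To make this interpretation concrete, here is a small simulation sketch. The data-generating process (one instrument, the coefficients 0.8 and 0.6, the sample size and the seed) is purely an assumption chosen for illustration. It compares the sample correlation between the error and the raw regressor with the correlation between the error and the first-stage fitted values.

```python
import numpy as np

# Hypothetical data-generating process, chosen only for illustration.
rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)      # instrument, drawn independently of the error
eps = rng.normal(size=n)    # structural error of the regression model
# Endogenous regressor: depends on the instrument AND on the error.
z = 0.8 * x + 0.6 * eps + rng.normal(size=n)

# Step 1: regress z on a constant and the instrument, keep the fitted values.
X = np.column_stack([np.ones(n), x])
pi_hat, *_ = np.linalg.lstsq(X, z, rcond=None)
z_hat = X @ pi_hat

# The raw regressor is correlated with the error; the fitted values,
# being a linear function of the instrument only, are approximately not.
corr_z_eps = np.corrcoef(z, eps)[0, 1]
corr_zhat_eps = np.corrcoef(z_hat, eps)[0, 1]
print(f"corr(z, eps)     = {corr_z_eps:.3f}")
print(f"corr(z_hat, eps) = {corr_zhat_eps:.3f}")
```

In a real application $\epsilon_i$ is of course unobserved, so this check is only possible in a simulation. In a finite sample the second correlation is close to zero rather than exactly zero.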
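This procedure indeed yields 2SLS: the fitted values from step 1 are $\widehat{Z}=P_{X}Z$, where the projection matrix $P_{X}=X(X'X)^{-1}X'$ is symmetric and idempotent ($P_{X}'=P_{X}$ and $P_{X}P_{X}=P_{X}$), and hence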
\begin{eqnarray*}
(\widehat{Z}'\widehat{Z})^{-1}\widehat{Z}'y&=&(Z'P_{X}'P_{X}Z)^{-1}Z'P_{X}'y\\
&=&(Z'P_{X}Z)^{-1}Z'P_{X}y\\
&=&\widehat{\delta}_{\text{2SLS}}
\end{eqnarray*}
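The equivalence can also be checked numerically. The following is a minimal sketch on simulated data, assuming a hypothetical setup with one endogenous regressor, two instruments and a constant that serves as its own instrument; all coefficient values, the sample size and the seed are illustrative assumptions. It computes the estimator once via the two regressions and once via the formula $(Z'P_{X}Z)^{-1}Z'P_{X}y$.

```python
import numpy as np

# Hypothetical simulated data, chosen only for illustration.
rng = np.random.default_rng(1)
n = 500
delta = np.array([1.0, 2.0])    # assumed true parameters (intercept, slope)

x1, x2 = rng.normal(size=(2, n))    # two instruments
eps = rng.normal(size=n)            # structural error
z1 = 0.7 * x1 + 0.5 * x2 + 0.6 * eps + rng.normal(size=n)  # endogenous regressor

X = np.column_stack([np.ones(n), x1, x2])   # instruments (incl. constant)
Z = np.column_stack([np.ones(n), z1])       # regressors (incl. constant)
y = Z @ delta + eps

# Two-step route.
# Step 1: regress every column of Z on X and form the fitted values.
Z_hat = X @ np.linalg.lstsq(X, Z, rcond=None)[0]
# Step 2: regress y on the fitted values.
delta_two_step = np.linalg.lstsq(Z_hat, y, rcond=None)[0]

# Direct route: (Z' P_X Z)^{-1} Z' P_X y with the explicit projection matrix.
P_X = X @ np.linalg.solve(X.T @ X, X.T)
delta_closed = np.linalg.solve(Z.T @ P_X @ Z, Z.T @ P_X @ y)

print("two-step   :", delta_two_step)
print("closed form:", delta_closed)
print("identical  :", np.allclose(delta_two_step, delta_closed))
```

Forming the $n\times n$ matrix $P_{X}$ explicitly is only sensible for small $n$; it is done here to mirror the algebra. Note also that the standard errors reported by a naive second-stage OLS are not the correct 2SLS standard errors, because they are based on the residuals $y-\widehat{Z}\widehat{\delta}$ rather than $y-Z\widehat{\delta}$.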