Any GLM (whether canonical or not) has an estimating function of the form:
$$ U(\beta; X,y) = D^{T} V^{-1} \left( y - g^{-1}(X\beta)\right)$$
where $X$ is the $n \times p$ design matrix of covariates or predictors (possibly including an intercept), $y$ is the vector of responses, $V$ is the variance-covariance matrix of the conditional response, $g$ is the link function, and $D$ is the $n \times p$ Jacobian of the mean vector $\mu = g^{-1}(X\beta)$ with respect to $\beta$.
Since a GLM is not, in general, a maximum likelihood estimator, the GLM solution is a method of moments estimator solving $\sum_{i=1}^n U_{i} (\beta; X_i, y_i) = 0$, where $X_i$ is the $i$-th row of the design matrix.
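To make the pieces concrete, here is a minimal numpy sketch of $U$ for the common case of independent observations, so that $V$ is diagonal. The names `inv_link`, `d_inv_link`, and `var_fun` are placeholders for the inverse link, its derivative, and the variance function; they are not from any particular library.

```python
import numpy as np

def estimating_function(beta, X, y, inv_link, d_inv_link, var_fun):
    """GLM estimating function U(beta) = D^T V^{-1} (y - mu),
    assuming independent observations (diagonal V)."""
    eta = X @ beta                     # linear predictor, length n
    mu = inv_link(eta)                 # fitted mean vector
    D = d_inv_link(eta)[:, None] * X   # n x p Jacobian of mu w.r.t. beta
    v = var_fun(mu)                    # diagonal of V
    return D.T @ ((y - mu) / v)        # p estimating equations
```

The GLM fit is the root of this function, typically found by iteratively reweighted least squares.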
However, when you use a canonical link, the GLM is a maximum likelihood procedure: the estimating function $U$ is actually a score function $S$ (the gradient of the log-likelihood). The canonical link has the special property that $\partial \mu_i / \partial \eta_i = v(\mu_i)$, the variance function, so that $D = VX$ (with $V = \operatorname{diag}(v(\mu_i))$ for independent observations). The $V$ in $D^{T} = X^{T}V$ then cancels against the $V^{-1}$ in the first display, and the canonical GLM has the score function
$$ S(\beta; X,y) = X^{T}\left(y - g^{-1}(X\beta)\right).$$
As a result, any solution to a canonical GLM that includes an intercept necessarily has residuals summing to zero: the intercept contributes a column of ones to $X$, so the corresponding component of the score equation $X^{T}(y - \hat\mu) = 0$ is exactly $\sum_{i=1}^n (y_i - \hat\mu_i) = 0$.
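A quick numerical check of this claim, assuming `statsmodels` is available (the data below are simulated purely for illustration): fit a logit GLM, which uses the canonical link, and sum the response residuals $y_i - \hat\mu_i$.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))   # intercept + 2 covariates
beta_true = np.array([0.5, 1.0, -1.0])           # arbitrary true coefficients
p = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# The Binomial family defaults to the canonical (logit) link
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.resid_response.sum())                  # ~0, up to solver tolerance
```

Dropping the intercept column breaks the guarantee, since no score component then corresponds to a column of ones.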
For example, consider logistic regression, where $$g(\mu) = \log\left(\frac{\mu}{1-\mu}\right).$$ Then $g^{-1}(\nu) = \exp(\nu)/(1+\exp(\nu))$ and
$$\begin{eqnarray}
\frac{\partial} {\partial \nu} g^{-1}(\nu) &=& \exp(\nu)/(1+\exp(\nu))^2 \\
&=& g^{-1}(\nu) ( 1- g^{-1}(\nu)) \\
&=& \mu(1-\mu).
\end{eqnarray}$$
This is readily recognizable as the mean-variance relationship of a Bernoulli random variable, and hence as the structure of the diagonal of $V$.
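As a sanity check on the derivative calculation, a finite-difference comparison (step size chosen arbitrarily) confirms the $\mu(1-\mu)$ identity:

```python
import numpy as np

def sigmoid(nu):
    return np.exp(nu) / (1 + np.exp(nu))   # g^{-1} for the logit link

nu = np.linspace(-4.0, 4.0, 9)
h = 1e-6
fd = (sigmoid(nu + h) - sigmoid(nu - h)) / (2 * h)   # numerical derivative
mu = sigmoid(nu)
print(np.max(np.abs(fd - mu * (1 - mu))))            # ~1e-10: matches mu(1 - mu)
```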