Any GLM (whether canonical or not) has an estimating function of the form:
$$ U(\beta; X,y) = D^{T} V^{-1} \left( y - g^{-1}(X\beta)\right)$$
where $X$ is the $n \times p$ design matrix of covariates or predictors (possibly including an intercept), $y$ is the vector of responses, $V$ is the variance-covariance matrix of the conditional response, $g$ is the link function, and $D$ is the $n \times p$ Jacobian of the mean vector $\mu = g^{-1}(X\beta)$ with respect to $\beta$.
Since a GLM is not, in general, a maximum likelihood estimator, the GLM solution is a method of moments estimator solving $\sum_{i=1}^n U_{i} (\beta; X_i, y_i) = 0$, where $X_i$ is the $i$-th row of the design matrix.
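To make the pieces concrete, here is a minimal numpy sketch of $U$ for the common case of independent observations, so that $V$ is diagonal. The names `inv_link`, `d_inv_link`, and `var_fun` are placeholders for the inverse link, its derivative, and the variance function; they are not from any particular library.

```python
import numpy as np

def estimating_function(beta, X, y, inv_link, d_inv_link, var_fun):
    """GLM estimating function U(beta) = D^T V^{-1} (y - mu),
    assuming independent observations (diagonal V)."""
    eta = X @ beta                     # linear predictor, length n
    mu = inv_link(eta)                 # fitted mean vector
    D = d_inv_link(eta)[:, None] * X   # n x p Jacobian of mu w.r.t. beta
    v = var_fun(mu)                    # diagonal of V
    return D.T @ ((y - mu) / v)        # p estimating equations
```

The GLM fit is the root of this function, typically found by iteratively reweighted least squares.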
However, when you use a canonical link, the GLM is a maximum likelihood procedure: the estimating function $U$ is actually a score function $S$ (the gradient of the log-likelihood). The canonical link has the special property that $\partial \mu_i / \partial \eta_i = v(\mu_i)$, the variance function, so that $D = VX$ (with $V = \operatorname{diag}(v(\mu_i))$ for independent observations). The $V$ in $D^{T} = X^{T}V$ then cancels against the $V^{-1}$ in the first display, and the canonical GLM has the score function
$$ S(\beta; X,y) = X^{T}\left(y - g^{-1}(X\beta)\right).$$
As a result, any solution to a canonical GLM that includes an intercept necessarily has residuals summing to zero: the intercept contributes a column of ones to $X$, so the corresponding component of the score equation $X^{T}(y - \hat\mu) = 0$ is exactly $\sum_{i=1}^n (y_i - \hat\mu_i) = 0$.
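A quick numerical check of this claim, assuming `statsmodels` is available (the data below are simulated purely for illustration): fit a logit GLM, which uses the canonical link, and sum the response residuals $y_i - \hat\mu_i$.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))   # intercept + 2 covariates
beta_true = np.array([0.5, 1.0, -1.0])           # arbitrary true coefficients
p = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)

# The Binomial family defaults to the canonical (logit) link
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.resid_response.sum())                  # ~0, up to solver tolerance
```

Dropping the intercept column breaks the guarantee, since no score component then corresponds to a column of ones.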
For example, consider logistic regression, where $$g(\mu) = \log\left(\frac{\mu}{1-\mu}\right).$$ Then $g^{-1}(\nu) = \exp(\nu)/(1+\exp(\nu))$ and
$$\begin{eqnarray}
\frac{\partial} {\partial \nu} g^{-1}(\nu) &=& \exp(\nu)/(1+\exp(\nu))^2 \\
&=& g^{-1}(\nu) ( 1- g^{-1}(\nu)) \\
&=& \mu(1-\mu).
\end{eqnarray}$$
This is readily recognizable as the mean-variance relationship of a Bernoulli random variable, and hence as the structure of the diagonal of $V$.
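As a sanity check on the derivative calculation, a finite-difference comparison (step size chosen arbitrarily) confirms the $\mu(1-\mu)$ identity:

```python
import numpy as np

def sigmoid(nu):
    return np.exp(nu) / (1 + np.exp(nu))   # g^{-1} for the logit link

nu = np.linspace(-4.0, 4.0, 9)
h = 1e-6
fd = (sigmoid(nu + h) - sigmoid(nu - h)) / (2 * h)   # numerical derivative
mu = sigmoid(nu)
print(np.max(np.abs(fd - mu * (1 - mu))))            # ~1e-10: matches mu(1 - mu)
```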