Iteratively Reweighted Least Squares - Weights Confusion

Question

In performing Iteratively Reweighted Least Squares (IRLS) to derive $\hat{\beta}$ estimates for logistic regression, all the resources I've read online say to use weights inversely proportional to the variance of each $Y_i$. For example, see step 4, below (taken from here):

However, in R code implementations of IRLS I've found online, the weights are set as the variance, not the inverse of the variance (e.g. here or here).

For example, see the image below. It sets s = mu * (1 - mu), which is the variance, given that $Y$ in logistic regression has a Bernoulli distribution. So when the weights matrix, S, is calculated, shouldn't it be S = diag(1/s) instead of S = diag(s)? What am I missing here?

Having this clarified would be very helpful. I know that for weighted least squares, $w_i = \dfrac{1}{\sigma_i^2}$ is standard, so why does the weight matrix in IRLS implementations not invert the variance?

Thomas Lumley · Accepted Answer · 2021-04-23T08:07:32.150

The IWLS algorithm for generalised linear models is different from that for a heteroscedastic linear model because it accounts for two things:

the non-linear link function
the variance-mean relationship

The likelihood score equations look like $$\frac{d\mu}{d\beta}\frac{1}{V(\mu)}(Y-\mu)=0$$ so the variance is in the denominator, as you expect. We can expand ${d\mu}/{d\beta}$: $$\frac{d\eta}{d\beta}\frac{d\mu}{d\eta}\frac{1}{V(\mu)}(Y-\mu)=0$$ and $d\eta/d\beta$ is just $X^T$, so $$X^T\frac{d\mu}{d\eta}\frac{1}{V(\mu)}(Y-\mu)=0$$

We want to define a new response variable $Z$ and weight variable $W$ so that the WLS equations $$X^TW(Z-X\beta)=0$$ match the likelihood equations. This is done with

working response $Z=X\beta+(Y-\mu)\frac{d\eta}{d\mu}$, which is a first-order approximation to transforming $Y$ with the link function
working weights $W=(\frac{d\mu}{d\eta})^2\frac{1}{V(\mu)}$
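As a rough illustration of these two definitions, here is a minimal pure-Python sketch of the iteration for a logistic regression with an intercept and one covariate. The function name `irls_logistic`, the toy data, and the fixed iteration count are all assumptions made for the example and are not taken from the implementations linked in the question. The weight is written as $(d\mu/d\eta)^2/V(\mu)$ on purpose, so the variance visibly sits in the denominator even though the result equals $\mu(1-\mu)$ for the logit link.

```python
import math

def irls_logistic(x, y, n_iter=25):
    """Fit P(y=1) = 1/(1+exp(-(b0 + b1*x))) by IRLS; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the 2x2 matrix X^T W X and the vector X^T W z.
        a00 = a01 = a11 = c0 = c1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            var = mu * (1.0 - mu)            # V(mu): Bernoulli variance
            dmu_deta = mu * (1.0 - mu)       # derivative of the inverse logit
            w = dmu_deta ** 2 / var          # working weight (equals var here)
            z = eta + (yi - mu) / dmu_deta   # working response
            a00 += w
            a01 += w * xi
            a11 += w * xi * xi
            c0 += w * z
            c1 += w * xi * z
        # Solve the 2x2 weighted normal equations directly.
        det = a00 * a11 - a01 * a01
        b0, b1 = (a11 * c0 - a01 * c1) / det, (a00 * c1 - a01 * c0) / det
    return b0, b1

# Hypothetical toy data (not perfectly separable, so the MLE is finite).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0, b1 = irls_logistic(x, y)
```

At convergence the fitted means should satisfy the score equations $X^T(Y-\mu)=0$, which is a convenient way to check a fit like this one.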

Note that the variance is still in the denominator. However, for the so-called canonical link function for each distribution, it so happens that $d\mu/d\eta=V(\mu)$ and the working weights are equal to $V(\mu)^2V(\mu)^{-1}=V(\mu)$. That is, it looks as though the variance has been put in the numerator instead.
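For the logistic regression in the question, this can be checked directly. With the logit link, $$\mu=\frac{1}{1+e^{-\eta}},\qquad \frac{d\mu}{d\eta}=\mu(1-\mu)=V(\mu),$$ so the working weights are $$W=\frac{\left[\mu(1-\mu)\right]^2}{\mu(1-\mu)}=\mu(1-\mu),$$ which is exactly the s = mu * (1 - mu) and S = diag(s) that the R code uses.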

You can see the variance is really going in the denominator by looking at the IWLS algorithm for non-canonical links, such as the identity link for a binomial or Poisson model, where $d\mu/d\eta=1$.
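To make the contrast concrete, the small sketch below evaluates the working weight for one Poisson observation under both links. The helper `working_weight` and the fitted mean of 4 are hypothetical, chosen only for illustration.

```python
def working_weight(dmu_deta, variance):
    """Working weight W = (dmu/deta)^2 / V(mu) for a single observation."""
    return dmu_deta ** 2 / variance

mu = 4.0  # hypothetical fitted Poisson mean, so V(mu) = mu = 4

# Canonical log link: mu = exp(eta), so dmu/deta = mu and W = mu.
w_log = working_weight(dmu_deta=mu, variance=mu)

# Identity link: mu = eta, so dmu/deta = 1 and W = 1 / mu.
w_identity = working_weight(dmu_deta=1.0, variance=mu)

print(w_log, w_identity)
```

With the identity link the weight is the plain inverse variance, as in ordinary weighted least squares. With the log link the squared derivative in the numerator outweighs the variance in the denominator.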

This is a very straightforward and clear answer, thank you! – bob, Apr 23 '21 at 18:45
