
I am attempting to use the IRLS (iteratively reweighted least squares) algorithm to estimate the regression parameters of a logistic regression model.
This is the algorithm I am following:

  1. Select initial values for the regression parameters $\boldsymbol{\beta}^{\text {old }}$
  2. Calculate $p\left(\boldsymbol{x}_{i}, \boldsymbol{\beta}^{\text {old }}\right)=\frac{1}{1+e^{-\boldsymbol{x}_{i}^{T} \boldsymbol{\beta}^{\text {old }}}}, i=1, \ldots, n$
  3. Calculate the diagonal weight matrix $W$ with elements $p\left(x_{i}, \beta^{o l d}\right)\left(1-p\left(x_{i}, \beta^{\text {old }}\right)\right)$.
  4. Calculate the gradient vector and Hessian matrix: (a) $\frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=\boldsymbol{X}^{T}(\boldsymbol{y}-\boldsymbol{p})$ (b) $\frac{\partial^{2} l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{T}}=-\mathbf{X}^{T} \mathbf{W} \mathbf{X}$
  5. Calculate $\boldsymbol{\beta}^{\text {new }}=\boldsymbol{\beta}^{\text {old }}+\left(\mathbf{X}^{T} \mathbf{W} \mathbf{X}\right)^{-1} \boldsymbol{X}^{T}(\boldsymbol{y}-\boldsymbol{p})$ or $\boldsymbol{\beta}^{\text {new }}=\left(X^{T} W X\right)^{-1} X^{T} W z$, with adjusted response $\mathbf{z}=\left(\mathbf{X} \boldsymbol{\beta}^{\text {old }}+\mathbf{W}^{-1}(\mathbf{y}-\mathbf{p})\right)$
  6. Set $\beta^{\text {old }}=\beta^{\text {new }}$
  7. Repeat steps (2) to (6) until convergence.
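
To make sure I understand the plain (non-distributed) version, here is a minimal, self-contained IRLS sketch of the steps above in R. The data-generating values (intercept -9, slope 0.2 on age) mirror the simulation further down, and `solve()` is used for the matrix inverse:

```r
# Minimal centralized IRLS for logistic regression (steps 1-7 above)
set.seed(1)
n   <- 300
age <- round(runif(n, 18, 80))
X   <- cbind(1, age)                              # n x 2 design matrix
y   <- rbinom(n, 1, 1 / (1 + exp(-(-9 + 0.2 * age))))

beta <- c(0, 0)                                   # step 1: initial values
for (iter in 1:50) {
  p    <- as.vector(1 / (1 + exp(-X %*% beta)))   # step 2: fitted probabilities
  W    <- diag(p * (1 - p))                       # step 3: diagonal weight matrix
  grad <- t(X) %*% (y - p)                        # step 4(a): gradient
  H    <- t(X) %*% W %*% X                        # step 4(b): negative Hessian
  beta_new <- beta + solve(H, grad)               # step 5: Newton update
  done <- max(abs(beta_new - beta)) < 1e-8        # step 7: convergence check
  beta <- beta_new                                # step 6: set old = new
  if (done) break
}
beta
```

This converges to the same coefficients as `glm(y ~ age, family = binomial)`, since `glm` itself fits via Fisher scoring.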

The catch is that I have to do a sort of fake distributed-computation scenario. My main data set $X$ has 300 observations, which I split into 3 smaller data sets of 100 observations each: $$\mathbf{X}=\left[\begin{array}{c}X_{1} \\ X_{2}\\ X_{3}\end{array}\right]$$

Now, from what I understand, the idea behind secure summation is as follows: $$\left(\mathbf{X}^{T} \mathbf{W} \mathbf{X}\right)^{-1} = \sum_{i=1}^3 \left(\mathbf{X_i}^{T} \mathbf{W_i} \mathbf{X_i}\right)^{-1}$$ and $$\boldsymbol{X}^{T}(\boldsymbol{y}-\boldsymbol{p}) = \sum_{i=1}^3 \boldsymbol{X_i}^{T}(\boldsymbol{y_i}-\boldsymbol{p_i}) $$

So what I am doing looks a bit like this

  1. Select initial values for the regression parameters $\boldsymbol{\beta}^{\text {old }}$
  2. Calculate $p_{i}=\frac{1}{1+e^{-X_{i} \boldsymbol{\beta}^{\text {old }}}}, i=1, \ldots, 3$
  3. Calculate the diagonal weight matrices $W_i$, $i=1, \ldots, 3$, with elements $p\left(x_{i}, \boldsymbol{\beta}^{\text {old }}\right)\left(1-p\left(x_{i}, \boldsymbol{\beta}^{\text {old }}\right)\right)$.
  4. Calculate the gradient vector and (inverted) Hessian matrix for $i=1, \ldots, 3$: (a) $\boldsymbol{X_i}^{T}(\boldsymbol{y_i}-\boldsymbol{p_i})$ (b) $\left(\mathbf{X_i}^{T} \mathbf{W_i} \mathbf{X_i}\right)^{-1}$
  5. $$\left(\mathbf{X}^{T} \mathbf{W} \mathbf{X}\right)^{-1} = \sum_{i=1}^3 \left(\mathbf{X_i}^{T} \mathbf{W_i} \mathbf{X_i}\right)^{-1}$$ and $$\boldsymbol{X}^{T}(\boldsymbol{y}-\boldsymbol{p}) = \sum_{i=1}^3 \boldsymbol{X_i}^{T}(\boldsymbol{y_i}-\boldsymbol{p_i}) $$
  6. Calculate $\boldsymbol{\beta}^{\text {new }}=\boldsymbol{\beta}^{\text {old }}+\left(\mathbf{X}^{T} \mathbf{W} \mathbf{X}\right)^{-1} \boldsymbol{X}^{T}(\boldsymbol{y}-\boldsymbol{p})$ or $\boldsymbol{\beta}^{\text {new }}=\left(X^{T} W X\right)^{-1} X^{T} W z$, with adjusted response $\mathbf{z}=\left(\mathbf{X} \boldsymbol{\beta}^{\text {old }}+\mathbf{W}^{-1}(\mathbf{y}-\mathbf{p})\right)$
  7. Set $\beta^{\text {old }}=\beta^{\text {new }}$
  8. Repeat steps (2) to (7) until convergence.
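
For compactness, the per-site work in steps 2-4 above could be wrapped in a small helper (hypothetical name `site_pieces`); each site would return only its local gradient and the inverse of its local Hessian, which the coordinator then sums as in step 5:

```r
# Hypothetical helper sketching what each site computes locally (steps 2-4 above).
# Arguments: X_i is the site's design matrix, y_i its responses, beta the current estimate.
site_pieces <- function(X_i, y_i, beta) {
  p_i <- as.vector(1 / (1 + exp(-X_i %*% beta)))   # step 2: local fitted probabilities
  W_i <- diag(p_i * (1 - p_i))                     # step 3: local weight matrix
  list(grad = t(X_i) %*% (y_i - p_i),              # step 4(a): local gradient
       hinv = solve(t(X_i) %*% W_i %*% X_i))       # step 4(b): inverted local Hessian
}
```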

Here is my R code, along with some generated data:

set.seed(1)
n <- 300  # 300 observations, split into 3 blocks of 100 below
age <- round(runif(n, 18, 80))
xb <- -9 + 0.2*age
p <- 1/(1 + exp(-xb))
y <- rbinom(n = n, size = 1, prob = p)
df <- data.frame(age = age, y = y)

lst <- split(df, (0:(nrow(df) - 1)) %/% 100)
df_1 <- lst$`0`
df_2 <- lst$`1`
df_3 <- lst$`2`

#########IRLS##########
library(matlib)  # for inv(); the Inverse() error below comes from this package
bho <- matrix(0, 2, 1)
#DF 1
x_1 <- cbind(matrix(1,100,1),df_1$age)

#DF 2
x_2 <- cbind(matrix(1,100,1),df_2$age)

#DF 3
x_3 <- cbind(matrix(1,100,1),df_3$age)

for (i in 1:30) {
  
  #DF 1
  p_fit_1 <- 1/(1 + exp(-(x_1%*%bho)))
  wp_1 <- as.vector(p_fit_1*(1-p_fit_1))
  w_1 <- diag(wp_1)
  
  hes_1 <- inv(t(x_1)%*%w_1%*%x_1)
  grad_1 <- (t(x_1)%*%(df_1$y-p_fit_1))
  
  #DF 2
  p_fit_2 <- 1/(1 + exp(-(x_2%*%bho)))
  wp_2 <- as.vector(p_fit_2*(1-p_fit_2))
  w_2 <- diag(wp_2)

  hes_2 <- inv(t(x_2)%*%w_2%*%x_2)
  grad_2 <- (t(x_2)%*%(df_2$y-p_fit_2))

  #DF 3
  p_fit_3 <- 1/(1 + exp(-(x_3%*%bho)))
  wp_3 <- as.vector(p_fit_3*(1-p_fit_3))
  w_3 <- diag(wp_3)

  hes_3 <- inv(t(x_3)%*%w_3%*%x_3)
  grad_3 <- (t(x_3)%*%(df_3$y-p_fit_3))

  #Put it together now
  hes <- hes_1+hes_2+hes_3
  grad <- grad_1+grad_2+grad_3
  
  bhn <- bho + hes%*%grad
  
  chnge = max(abs(bhn-bho))
  if (chnge<1E-3) break
  bho <- bhn
  
}

x <- rbind(x_1, x_2, x_3)
xbn <- x%*%bhn
pn <- 1/(1 + exp(-xbn))
######################

But my estimates simply do not converge, and I sometimes get an error saying `Error in Inverse(X, tol = sqrt(.Machine$double.eps), ...) : X is numerically singular`. Am I implementing the algorithm wrong? Would anyone be able to assist, please?
