14

I want to estimate the coefficients of a regression that is very similar to logistic regression (actually, logistic regression with an extra scaling coefficient: $$ \frac{A}{1 + e^{- (b_0 + b_1 x_1 + b_2 x_2 + \ldots)}},$$ where $A$ may be given). I thought of using GMM to estimate the coefficients, but I'm not sure which moment conditions I should use.
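
For concreteness, here is the response function I have in mind, written as a small R sketch (with $A$ fixed at, say, 0.25):

scaled.logistic <- function(b, X, A = 0.25) {
  # probabilities lie between 0 and A rather than between 0 and 1
  A / (1 + exp(-(X %*% b)))
}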

Can anyone help me with that?

Thanks!

StasK
user5497
  • When you say "$A$ could be given", do you mean it is specified by the user or it is estimated by the model? – Macro Jun 14 '12 at 11:18
  • Either way. I can put it in as an input (e.g. A = 0.25) or it can be one of the coefficients to be found. – user5497 Jun 14 '12 at 11:22
  • Does it vary from subject to subject (i.e. is it data) or is it a fixed constant across all observations? – Macro Jun 14 '12 at 12:10
  • fixed on all observations (like b0, b1, ...) – user5497 Jun 14 '12 at 12:11
  • In the specific case where A=1 we get regular logistic regression – user5497 Jun 14 '12 at 12:14
  • 2
    Why not use maximum likelihood instead of GMM? – Macro Jun 14 '12 at 12:14
  • Are you sure your model is correct? Your probabilities are not scaled from $0$ to $1$ if $A \neq 1$. – Dmitry Laptev Jun 14 '12 at 13:56
  • We are also trying to use maximum likelihood, but would like to compare the results with GMM. – user5497 Jun 14 '12 at 14:56
  • Yes, we're looking for something with probabilities between 0 and 0.25 rather than between 0 and 1. – user5497 Jun 14 '12 at 14:57
  • @user5497, the MLE is a more efficient estimator than the GMM estimator. It seems you'd only consider using the GMM estimator if calculating the MLE was intractable for some reason (which it isn't in this case). – Macro Jun 15 '12 at 17:11
  • @Macro, maximum likelihood requires assuming a distribution while GMM does not. ML is more efficient if the distributional assumption happens to be correct but not otherwise. – Richard Hardy Jul 23 '16 at 15:10

1 Answer

7

Assuming $A\leq 1$, this model has a Bernoulli response variable $Y_i$ with

$$ Pr(Y_i = 1) = \frac{A}{1+e^{-X_i'b}}, $$

where $b$ (and possibly $A$, depending on whether it is treated as a constant or a parameter) are the fitted coefficients and $X_i$ is the data for observation $i$. I assume the intercept term is handled by adding a variable with constant value 1 to the data matrix.

The moment conditions are:

\begin{align*} \mathbb{E}\bigg[\bigg(Y_i-\frac{A}{1+e^{-X_i'b}}\bigg)X_i\bigg] &= 0. \end{align*}

We replace this with the sample counterpart of the condition, assuming $N$ observations:

$$ m = \frac{1}{N}\sum_{i=1}^N \bigg[\bigg(Y_i-\frac{A}{1+e^{-X_i'b}}\bigg)X_i\bigg] = 0 $$

In practice, this is solved by minimizing $m'm$ over all possible coefficient values $b$ (below we will use the Nelder-Mead simplex to perform this optimization).

Borrowing from an excellent R-bloggers tutorial on the topic, it is pretty straightforward to implement this in R with the gmm package. As an example, let's work with the iris dataset, predicting if an iris is versicolor based on its sepal length and width and petal length and width. I'll assume $A$ is constant and equal to 1 in this case:

dat <- as.matrix(cbind(data.frame(IsVersicolor = as.numeric(iris$Species == "versicolor"), Intercept=1), iris[,1:4]))
head(dat)
#      IsVersicolor Intercept Sepal.Length Sepal.Width Petal.Length Petal.Width
# [1,]            0         1          5.1         3.5          1.4         0.2
# [2,]            0         1          4.9         3.0          1.4         0.2
# [3,]            0         1          4.7         3.2          1.3         0.2
# [4,]            0         1          4.6         3.1          1.5         0.2
# [5,]            0         1          5.0         3.6          1.4         0.2
# [6,]            0         1          5.4         3.9          1.7         0.4

Here are the coefficients fitted using logistic regression:

summary(glm(IsVersicolor~., data=as.data.frame(dat[,-2]), family="binomial"))
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept)    7.3785     2.4993   2.952 0.003155 ** 
# Sepal.Length  -0.2454     0.6496  -0.378 0.705634    
# Sepal.Width   -2.7966     0.7835  -3.569 0.000358 ***
# Petal.Length   1.3136     0.6838   1.921 0.054713 .  
# Petal.Width   -2.7783     1.1731  -2.368 0.017868 *  

The main piece we need to use gmm is a function that returns the moment conditions, namely rows $(Y_i-\frac{A}{1+e^{-X_i'b}})X_i$ for each observation $i$:

moments <- function(b, X) {
  A <- 1  # A treated as a known constant; A = 1 gives standard logistic regression
  # One row per observation: (Y_i - A/(1 + exp(-X_i'b))) * X_i, where column 1
  # of X holds the response Y and the remaining columns hold the regressors
  as.vector(X[,1] - A / (1 + exp(-(X[,-1] %*% cbind(b))))) * X[,-1]
}
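
As a quick sanity check (purely illustrative), evaluating this function at an arbitrary coefficient vector returns one row of moment contributions per observation:

dim(moments(rep(0, 5), dat))
# [1] 150   5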

We can now numerically fit coefficients $b$, using the linear regression coefficients as a convenient initial point (as suggested in the tutorial linked above):

init.coef <- lm(IsVersicolor~., data=as.data.frame(dat[,-2]))$coefficients
library(gmm)
fitted <- gmm(moments, x = dat, t0 = init.coef, type = "iterative", crit = 1e-19,
              wmatrix = "optimal", method = "Nelder-Mead",
              control = list(reltol = 1e-19, maxit = 20000))
fitted
#  (Intercept)  Sepal.Length   Sepal.Width  Petal.Length   Petal.Width  
#      7.37849      -0.24536      -2.79657       1.31364      -2.77834  
# 
# Convergence code =  0 

The convergence code of 0 indicates the procedure converged, and the parameters are identical to those returned by logistic regression.
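
For a side-by-side check, we can refit the glm and bind the two coefficient vectors together (here I assume coef() extracts the estimates from the gmm object; fitted$coefficients would do the same):

glm.fit <- glm(IsVersicolor ~ ., data = as.data.frame(dat[,-2]), family = "binomial")
cbind(GLM = coef(glm.fit), GMM = coef(fitted))

The two columns agree to the precision shown above.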

A quick look at the gmm package source (functions momentEstim.baseGmm.iterative and gmm:::.obj1 for the parameters provided) shows that the gmm package is minimizing $m'm$ as indicated above. The following equivalent code calls the R optim function directly, performing the same optimization we achieved above with the call to gmm:

gmm.objective <- function(theta, x, momentFun) {
  avg.moment <- colMeans(momentFun(theta, x))
  sum(avg.moment^2)
}
optim(init.coef, gmm.objective, x=dat, momentFun=moments,
      control = list(reltol = 1e-19, maxit = 20000))$par
#  (Intercept) Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#    7.3784866   -0.2453567   -2.7965681    1.3136433   -2.7783439 
josliber
  • How did you arrive at that moment condition? @josliber – sjw Mar 06 '21 at 02:44
  • Is it a Taylor expansion? – sjw Mar 06 '21 at 02:49
  • 1
    @user0 I'm not sure I can give a comprehensive derivation; I just followed the well-known moment condition for logistic regression here. Perhaps you could ask a new question to get a canonical answer. – josliber Mar 06 '21 at 04:41
  • 1
    Ok, thanks. I looked a bit, and interestingly I think it is actually the expectation of the derivative of the log-likelihood (derivation: https://czep.net/stat/mlelr.pdf). I guess scores/derivatives of the likelihood need to be zero to maximize the likelihood, so this sort of makes sense. I wonder, though, how to find these moment conditions for functions that are not easily differentiable. – sjw Mar 07 '21 at 00:05
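
For the $A=1$ case, the connection noted in the last comment is easy to verify: writing $p_i = (1+e^{-X_i'b})^{-1}$, the Bernoulli log-likelihood for observation $i$ and its derivative are

$$ \ell_i(b) = Y_i \log p_i + (1-Y_i)\log(1-p_i), \qquad \frac{\partial \ell_i}{\partial b} = (Y_i - p_i)X_i, $$

so setting the average score to zero reproduces exactly the sample moment condition $m = 0$ used in the answer.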