Error in fitting negative binomial regression model in R when replicating published results (works in Stata)

Question

I'm trying to replicate the results of the first model of this article:

Hultman, Lisa, Jacob Kathman, and Megan Shannon. 2013. “United Nations Peacekeeping and Civilian Protection in Civil War.” American Journal of Political Science 57(4): 875–91.

Replication material can be found here: http://thedata.harvard.edu/dvn/dv/ajps/faces/study/StudyPage.xhtml?studyId=87987&tab=files

The Stata code provided in the do file runs without problems. However, when I try to replicate the model in R. The error strongly resembles the one in this CV question. Here's the R code I've written to replicate the results:

library(MASS)
library(foreign)

pko <- read.dta("HKS_AJPS_2013.dta")

pko_model1 <- glm.nb(osvAll ~ troopLag + 
                       policeLag + 
                       militaryobserversLag + 
                       brv_AllLag + 
                       osvAllLagDum  + 
                       incomp + 
                       epduration + 
                       lntpop, data = pko, link = log)

This produces the following error:

  Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset,  : 
  NA/NaN/Inf in 'x'

If I include the control=glm.control(trace=10,maxit=100) option in glm.nb it produces the following output:

Deviance = 75787029 Iterations - 1
Deviance = 28247900 Iterations - 2
Deviance = 11043902 Iterations - 3
Deviance = 4952253 Iterations - 4
Deviance = 2896062 Iterations - 5
Deviance = 2286069 Iterations - 6
Deviance = 2152722 Iterations - 7
Deviance = 2135621 Iterations - 8
Deviance = 2134804 Iterations - 9
Deviance = 2134801 Iterations - 10
Deviance = 2134801 Iterations - 11
theta.ml: iter 0 'theta = 0.000609'
theta.ml: iter1 theta =0.00120256
theta.ml: iter2 theta =0.00234778
theta.ml: iter3 theta =0.00449211
theta.ml: iter4 theta =0.0082914
theta.ml: iter5 theta =0.0143798
theta.ml: iter6 theta =0.0225089
theta.ml: iter7 theta =0.0302781
theta.ml: iter8 theta =0.0342821
theta.ml: iter9 theta =0.0349533
theta.ml: iter10 theta =0.0349683
Initial value for 'theta': 0.034968
Deviance = 3634.951 Iterations - 1
Deviance = 1160161 Iterations - 2
Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset,  : 
  NA/NaN/Inf in 'x'

If I exclude the epduration variable or the incomp variable, the error disappears and I can roughly replicate the results from the article, but the parameter estimates of course vary because I don't include all variables in the model (and I don't use robust, clustered standard errors in R).

Two questions:

Why does this run in Stata without any complaint, but not in R?
How can I make this work in R? The answers to this question suggest a possible solution by first estimating a Poisson regression and then feeding the results as starting parameter values into an MLE estimation. Yet, I haven't been able to make this work in R.

I realize that this might be a duplicate to the existing CrossValidated question but since I'm having a similar problem with different data, this might be a more general problem.

What happens in `R` when you initialize the solver with the solution found by Stata? — whuber, Apr 10 '14 at 17:19
If use the inverse of Stata's `alpha` as `init.theta` for `glm.n`, I'm getting the following: `Deviance = 69533.02 Iterations - 1 Deviance = NaN Iterations - 2 Step halved: new deviance = 2458521 Error in glm.fitter(x = X, y = Y, w = w, start = start, etastart = etastart, : NA/NaN/Inf in 'x' In addition: Warning message: step size truncated due to divergence` — Felix, Apr 11 '14 at 08:17
I was able to fit this model as well as the other model referenced above using AD Model Builder. If there is any interest I can post the techniques used etc. It appears that the data contain a few really bad points. Also the paper refers to the log pseudo likelihood which makes me wonder if they really fit a negative binomial model via maximum likelihood. — dave fournier, Oct 29 '15 at 14:21
@davefournier Thanks for your comment. Yes, I'm still interested in this. I've seen the model replicated in R [here](https://github.com/carlislerainey/meaningful-inferences/blob/master/hks-replication/R/hks-replication.R). The author provides a different `init.theta` and provides some arguments to `glm.control` which I don't quite understand. Any comments would be much appreciated. — Felix, Oct 30 '15 at 10:13
Might some of the calculated values be zero? Perhaps Stata uses the equivalent of R's `log1p` rather than `log` as the link function in this situation? — Wayne, Nov 29 '15 at 18:54

score 3 · Accepted Answer · edited Dec 06 '15 at 13:41

This answer did not seem to be too popular since it was not be done in R. For better or worse a lot of people seem to be restricted to R. I recall that I helped produce a package, glmmADMB based on AD Model builder, to do among other things negative binomial mixed models in R. I was curious to see if the software could handle the special case of zero random effects i.e not a mixed model. The results look pretty good so I think that one could use glmmADMB to fit negative binomial (or ZINB) models in R even when there are no random effects. Of source this does not change the fact that the model does not appear to fit the data very well. The R script is

 # load packages
  library(foreign)
  library(MASS)
  library(sandwich)
  library(compactr)
  library(glmmADMB)



 # read data
  d <- read.dta("HKS_AJPS_2013.dta")
  keep <- c("osvAll", "troopLag", "policeLag", "militaryobserversLag",
            "brv_AllLag", "osvAllLagDum", "incomp", "epduration",      
  "lntpop",          "conflict_id")
  d <- na.omit(d[, keep])

  # estimate model
  mm <- glmmadmb(osvAll ~ troopLag + policeLag + militaryobserversLag +
         brv_AllLag + osvAllLagDum + incomp + epduration +
         lntpop,
       data = d,family="nbinom",save.dir="/home/xdave/tmp")

It appears to get the solution without any problem. The output is

GLMM's in R powered by AD Model Builder:

 Family: nbinom 
alpha = 0.059194 
link = log 

Fixed effects:
 Log-likelihood: -6260.15 
AIC: 12540.3 
Formula: osvAll ~ troopLag + policeLag + militaryobserversLag +     
brv_AllLag +      osvAllLagDum + incomp + epduration + lntpop 
     (Intercept)             troopLag            policeLag 
   -9.2367811129        -0.5304458388        -9.9025712540 
militaryobserversLag           brv_AllLag         osvAllLagDum 
   21.7617746170         0.0007062001         2.1773566196 
          incomp           epduration               lntpop 
    2.3793932712        -0.0005591779         0.7031090086 

Number of observations: total=3746

I also fit the alternative variance model nbinom1

GLMM's in R powered by AD Model Builder:

Family: nbinom 
alpha = 403.43 
link = log 

Fixed effects:
Log-likelihood: -6154.7 
AIC: 12329.4 
Formula: osvAll ~ troopLag + policeLag + militaryobserversLag + 
brv_AllLag +      osvAllLagDum + incomp + epduration + lntpop 
     (Intercept)             troopLag            policeLag 
   -3.3301969278        -0.0907711533        -0.0114299626 
militaryobserversLag           brv_AllLag         osvAllLagDum 
    1.3788240938         0.0002516249         2.1600171443 
          incomp           epduration               lntpop 
    1.0211055520         0.0017457421         0.3582247455 

Number of observations: total=3746

The AD Model Builder solution for the standard parameterization of the overdispersion is

# Number of parameters = 10  Objective function value = 6260.19  Maximum   gradient component = 9.67761e-05


index   name    value      std.dev
10   log_b   2.8269e+00 3.8542e-02
11   true_a -9.2363e+00 8.9969e-01
12   true_a -5.3036e-01 6.6560e-02
13   true_a -9.9021e+00 1.0027e+00
14   true_a  2.1760e+01 1.3643e+00
15   true_a  7.0615e-04 7.8071e-04
16   true_a  2.1772e+00 1.7710e-01
17   true_a  2.3793e+00 1.8294e-01
18   true_a -5.5966e-04 1.5160e-03
19   true_a  7.0307e-01 7.8810e-02

For theta in glm.nb the estimate is 1/exp(log_b)=0.059604. So that explains how the r code got the answer. It was started at the correct value of theta. For ADMB I did not use any special value for the equivalent parameter.

Now glm.nb uses the paramaterization 1+mu/theta for the overdispersion. This is done mostly because it fits in with the glm approach, not because it is the only way to parameterize the overdispersion. Another way is to use tau for the overdispersion ie the variance is tau * mu rather than mu * (1+mu/theta). I also fit this model. The parameter estimates are

# Number of parameters = 10  Objective function value = 5946.81  Maximum gradient component = 0.000213673

10   log_tau  7.3118e+00 8.4787e-02
11   true_a  -1.7793e+00 5.1516e-01
12   true_a  -8.5957e-02 3.1555e-02
13   true_a   1.0275e-01 4.0541e-01
14   true_a   1.1965e+00 4.2732e-01
15   true_a   2.2390e-04 1.4529e-04
16   true_a   2.0814e+00 8.2490e-02
17   true_a   9.2989e-01 1.0800e-01
18   true_a   1.8261e-03 5.8188e-04
19   true_a   3.2689e-01 4.1092e-02

The objective function is the negative log-likelihood so that the log-likelihood difference is over 300 better.

The parameter estimates appear quite different. At this point I think one should investigate the scaled squared residuals. I think it is remarkable that this paper was accepted for publication without any discussion over the lack of fit when the model appears to be so inadequate. These are the worst fitting observations. Note that they depend somehwat on how the overdispersion is parameterized

 index  obs  squared r    mean      tau
 1408    494 7.35736e+00  20.34180 1499.06
  651    1981 8.09487e+00 247.60800 1499.06
  87     2349 8.83926e+00 312.87700 1499.06
2094   2320 1.80316e+01 170.87200 1499.06
2060   4250 4.79849e+01 225.19800 1499.06
  81     7971 1.31481e+02 298.65700 1499.06
  80     8586 1.53705e+02 298.11300 1499.06
  79    14904 4.78280e+02 297.56900 1499.06
 2069   4016 6.12226e+02  17.42130 1499.06
2057 145844 5.95569e+05  23.81690 1499.06

For the R parameterization of the overdispersion the biggest residuals are

2042    303 2.17909e+02 4.88395e+00 8.35072e+01
3035    200 2.69942e+02 2.88942e+00 4.98125e+01
1694    200 2.78641e+02 2.84416e+00 4.90479e+01
1928     52 2.90467e+02 7.03287e-01 1.28810e+01
3021    226 3.47482e+02 2.88266e+00 4.96982e+01
  79    14904 3.49259e+02 1.91507e+02 3.23623e+03
 238     413 3.87830e+02 5.01092e+00 8.56521e+01
1934     64 4.45157e+02 7.00931e-01 1.28412e+01
2057 145844 9.31029e+02 1.15368e+03 1.94908e+04
1686    370 9.89133e+02 2.81110e+00 4.84895e+01
1408    494 2.25980e+03 2.48617e+00 4.30002e+01
3348    494 2.35199e+03 2.43663e+00 4.21632e+01
2918    147 2.56461e+03 6.74018e-01 1.23865e+01

Feedback welcome.

Thanks, Dave! That's certainly helpful. If understand you correctly, you modeled the overdispersion parameter differently than `glm.nb`which gave you different parameter estimates from the ones reported in the article, right? I'm assuming that this is also partially the answer to my original question: Stata's `nbreg` is modeling the overdispersion parameter differently than `glm.nb`, right? — Felix, Nov 05 '15 at 08:57
But it turns out that how the overdispersion is modelled is not the real issue. there are some huge outliers which probably dominate the fit and these should be investigated in context and something done about them. — dave fournier, Nov 05 '15 at 17:30
For future reference there is now an updated R package on CRAN called `glmmTMB` that I think supersedes `glmmADMB` — Will Townes, Aug 25 '19 at 02:32

Error in fitting negative binomial regression model in R when replicating published results (works in Stata)

1 Answers1