I'm trying to replicate the results of the first model of this article:
Hultman, Lisa, Jacob Kathman, and Megan Shannon. 2013. “United Nations Peacekeeping and Civilian Protection in Civil War.” American Journal of Political Science 57(4): 875–91.
Replication material can be found here: http://thedata.harvard.edu/dvn/dv/ajps/faces/study/StudyPage.xhtml?studyId=87987&tab=files
The Stata code provided in the do file runs without problems. However, when I try to replicate the model in R. The error strongly resembles the one in this CV question. Here's the R code I've written to replicate the results:
library(MASS)
library(foreign)
pko <- read.dta("HKS_AJPS_2013.dta")
pko_model1 <- glm.nb(osvAll ~ troopLag +
policeLag +
militaryobserversLag +
brv_AllLag +
osvAllLagDum +
incomp +
epduration +
lntpop, data = pko, link = log)
This produces the following error:
Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
NA/NaN/Inf in 'x'
If I include the control=glm.control(trace=10,maxit=100)
option in glm.nb
it produces the following output:
Deviance = 75787029 Iterations - 1
Deviance = 28247900 Iterations - 2
Deviance = 11043902 Iterations - 3
Deviance = 4952253 Iterations - 4
Deviance = 2896062 Iterations - 5
Deviance = 2286069 Iterations - 6
Deviance = 2152722 Iterations - 7
Deviance = 2135621 Iterations - 8
Deviance = 2134804 Iterations - 9
Deviance = 2134801 Iterations - 10
Deviance = 2134801 Iterations - 11
theta.ml: iter 0 'theta = 0.000609'
theta.ml: iter1 theta =0.00120256
theta.ml: iter2 theta =0.00234778
theta.ml: iter3 theta =0.00449211
theta.ml: iter4 theta =0.0082914
theta.ml: iter5 theta =0.0143798
theta.ml: iter6 theta =0.0225089
theta.ml: iter7 theta =0.0302781
theta.ml: iter8 theta =0.0342821
theta.ml: iter9 theta =0.0349533
theta.ml: iter10 theta =0.0349683
Initial value for 'theta': 0.034968
Deviance = 3634.951 Iterations - 1
Deviance = 1160161 Iterations - 2
Error in glm.fitter(x = X, y = Y, w = w, etastart = eta, offset = offset, :
NA/NaN/Inf in 'x'
If I exclude the epduration
variable or the incomp
variable, the error disappears and I can roughly replicate the results from the article, but the parameter estimates of course vary because I don't include all variables in the model (and I don't use robust, clustered standard errors in R).
Two questions:
- Why does this run in Stata without any complaint, but not in R?
- How can I make this work in R? The answers to this question suggest a possible solution by first estimating a Poisson regression and then feeding the results as starting parameter values into an MLE estimation. Yet, I haven't been able to make this work in R.
I realize that this might be a duplicate to the existing CrossValidated question but since I'm having a similar problem with different data, this might be a more general problem.