I want to compare different Count Data Models. How does one compute NEGBIN type I and NEGBIN type II in R?
Which model is estimated by glm.nb()?
Thanks in advance
Are you referring to regression models for count data? If yes, I think your best bet is to use the R packages gamlss and gamlss.dist.
The gamlss package contains a gamlss() function which you can use to fit your regression models.
The gamlss.dist package contains two functions, NBI() and NBII(), which implement the distributions you need.
The help files for the NBI() and NBII() functions provide details on the actual parametrizations they use. According to these help files, the corresponding parametrizations are such that the mean and variance of a variable having a negative binomial distribution are given by:
NBI(): mean = $\mu$ and variance = $\mu + \sigma\mu^2$
NBII(): mean = $\mu$ and variance = $\mu + \sigma\mu$
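If I'm reading the help files correctly, the NBI() parametrization corresponds to base R's dnbinom() with size = 1/sigma and the same mu, so that the variance works out to mu + sigma*mu^2. Here is a quick numerical check of that assumption (the parameter values are arbitrary):
library(gamlss.dist)

# Arbitrary parameter values for the check
mu <- 2; sigma <- 0.5; y <- 0:5

# NBI density from gamlss.dist
dNBI(y, mu = mu, sigma = sigma)

# Base R negative binomial with size = 1/sigma and mean mu
# (assumption: this reproduces the NBI parametrization)
dnbinom(y, size = 1/sigma, mu = mu)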
In contrast, the glm.nb() function in the MASS package uses a parametrization for which the mean and variance of a variable having a negative binomial distribution are given by:
mean = $\mu$ and variance = $\mu + \mu^2/\theta$.
If we denote $1/\theta$ by $\sigma$ in the variance parametrization used by glm.nb(), we recover the variance parametrization used by the NBI() function: $\mu + \mu^2/\theta = \mu + \sigma\mu^2$.
Note that, by default, glm.nb() models $\log(\mu)$ and reports the estimated value of $\theta$, whereas gamlss() with the option family=NBI models $\log(\mu)$ and reports the estimated value of $\log(\sigma)$. If you compute the estimated value of $\log(1/\theta)$ from the summary of the glm.nb model, you should recover the estimated value of $\log(\sigma)$ reported in the summary of the model fitted with gamlss() and family=NBI.
To fit a negative binomial regression model within the gamlss framework, you would use R code like the following:
library(gamlss)  # also loads gamlss.dist, which provides the NBI and NBII families

# NBI: variance = mu + sigma * mu^2
model.NBI <- gamlss(y ~ x1 + x2, data = data, family = NBI)

# NBII: variance = mu + sigma * mu
model.NBII <- gamlss(y ~ x1 + x2, data = data, family = NBII)
For glm.nb(), which lives in the MASS package, the model would look like this:
library(MASS)

model.nb <- glm.nb(y ~ x1 + x2, data = data)
There's a comment elsewhere on this forum (see "Why do different negative binomial regression functions produce different coefficients, p-values") which states:
"From memory, for some reason gamlss uses NBI to refer to the distribution commonly called NB2 (quadratic mean-variance relationship), and NBII for the distribution commonly called NB1 (linear mean-variance relationship). glm.nb fits NB2 only. Try using family = NBI in gamlss. – Mark May 11 '17 at 23:50"
This comment is a reminder to read the gamlss documentation carefully, so that you understand exactly which parametrization is used for each of the two versions of the negative binomial distribution (the NBI/NBII naming does not line up with the NB1/NB2 naming that is more common in the literature).
Here's an example using data pulled from https://stats.idre.ucla.edu/r/dae/negative-binomial-regression/ which shows the connection between glm.nb and gamlss with NBI parametrization:
require(foreign)
dat <- read.dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3,
                 labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
#--- glm.nb (from the MASS package)
require(MASS)
m1 <- glm.nb(daysabs ~ math + prog, data = dat)
summary(m1)
#--- gamlss with NBI family
require(gamlss)
require(gamlss.dist)
m1nb1 <- gamlss(daysabs ~ math + prog, data = dat, family=NBI)
summary(m1nb1)
The summary of the glm.nb model for these data is as follows:
> summary(m1)
Call:
glm.nb(formula = daysabs ~ math + prog,
data = dat,
init.theta = 1.032713156,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1547 -1.0192 -0.3694 0.2285 2.5273
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.615265 0.197460 13.245 < 0.0000000000000002 ***
math -0.005993 0.002505 -2.392 0.0167 *
progAcademic -0.440760 0.182610 -2.414 0.0158 *
progVocational -1.278651 0.200720 -6.370 0.000000000189 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(1.0327) family taken to be 1)
Null deviance: 427.54 on 313 degrees of freedom
Residual deviance: 358.52 on 310 degrees of freedom
AIC: 1741.3
Number of Fisher Scoring iterations: 1
Theta: 1.033
Std. Err.: 0.106
2 x log-likelihood: -1731.258
The summary of the gamlss model with the NBI family option is:
> summary(m1nb1)
******************************************************************
Family: c("NBI", "Negative Binomial type I")
Call: gamlss(formula = daysabs ~ math + prog, family = NBI, data = dat)
Fitting method: RS()
------------------------------------------------------------------
Mu link function: log
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.615322 0.196364 13.319 < 0.0000000000000002 ***
math -0.005994 0.002507 -2.391 0.0174 *
progAcademic -0.440756 0.182579 -2.414 0.0164 *
progVocational -1.278743 0.201968 -6.331 0.000000000855 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
------------------------------------------------------------------
Sigma link function: log
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03219 0.10279 -0.313 0.754
------------------------------------------------------------------
No. of observations in the fit: 314
Degrees of Freedom for the fit: 5
Residual Deg. of Freedom: 309
at cycle: 2
Global Deviance: 1731.258
AIC: 1741.258
SBC: 1760.005
******************************************************************
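The mu coefficient estimates in the two summaries are essentially identical. To put them side by side programmatically (a quick sketch; coef() with what = "mu" extracts the mu coefficients from the gamlss fit):
# Compare the mu coefficients from the two fits; they should agree closely
cbind(glm.nb = coef(m1), gamlss.NBI = coef(m1nb1, what = "mu"))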
If you take the estimated theta of 1.0327 reported by glm.nb and compute log(1/theta) = -log(1.0327) = -0.0322, you recover (up to rounding) the log(sigma) intercept of -0.03219 reported by gamlss.
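You can do this comparison directly from the fitted objects; a minimal sketch (m1$theta holds the estimated theta from glm.nb):
# log(1/theta) from glm.nb should match the gamlss sigma intercept (log link)
log(1 / m1$theta)
coef(m1nb1, what = "sigma")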
This Cross Validated answer by Hilbe is also illuminating: "What is theta in a negative binomial regression fitted with R?"
In it, Hilbe explains that glm.nb uses the indirect parametrization for expressing the variance of a variable having a negative binomial distribution as a quadratic function of the mean value of that variable:
variance = $\mu + \mu^2/\theta$.
The direct parametrization corresponding to the same setting would be:
variance = $\mu + \alpha\mu^2$.
Note that Hilbe uses the notation $\alpha$ instead of the notation $\sigma$ used in gamlss. In his notation, $\alpha = 1/\theta$.
In his book Negative Binomial Regression, Joseph Hilbe states that "NB2 is the standard form of negative binomial used to estimate data that are Poisson-overdispersed, and is the form of the model which most statisticians understand by negative binomial. NB2 is typically the first model we turn to when we discover that a Poisson model is overdispersed." (You will need to read Hilbe's book to understand which parametrization he had in mind when making this statement.)