> I'm working with a large data set (confidential, so I can't share too much),

It might be possible to create a small data set that shares some of the general characteristics of the real data without either the variable names or any of the actual values.

> and came to the conclusion a negative binomial regression would be necessary. I've never done a GLM regression before, and I can't find any clear information about what the assumptions are. Are they the same as for MLR?
Clearly not! You already know you're assuming the response is conditionally negative binomial, not conditionally normal. (Some assumptions are shared; independence, for example.)
Let me talk about GLMs more generally first.
GLMs include multiple regression but generalize in several ways:
1) the conditional distribution of the response (dependent variable) is from the exponential family, which includes the Poisson, binomial, gamma, normal and numerous other distributions.
2) the mean response is related to the predictors (independent variables) through a link function. Each family of distributions has an associated canonical link function; for example, in the case of the Poisson, the canonical link is the log. The canonical links are almost always the default, but in most software you generally have several link choices within each choice of distribution. For the binomial the canonical link is the logit (the linear predictor models $\log(\frac{p}{1-p})$, the log-odds of a success, or a "1"), and for the gamma the canonical link is the inverse; but in both cases other link functions are often used.
So if your response was $Y$ and your predictors were $X_1$ and $X_2$, with a Poisson regression with the log link you might have the following description of how the mean of $Y$ is related to the $X$'s (there's a fitted sketch of this model just after this list):
$\text{E}(Y_i) = \mu_i$
$\log\mu_i= \eta_i$ ($\eta$ is called the 'linear predictor'; here the link function is $\log$, and the symbol $g$ is often used to represent the link function)
$\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$
3) the variance of the response is not constant, but operates through a variance function (a function of the mean, possibly times a scaling parameter). For example, the variance of a Poisson is equal to its mean, while for a gamma it's proportional to the square of the mean. (The quasi-distributions allow some degree of decoupling of the variance function from the assumed distribution.)
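For concreteness, here's a minimal sketch of the Poisson model above on simulated data. You don't say what software you're using, so I'm assuming Python with statsmodels purely for illustration; the same ideas carry over to any GLM routine.

```python
import numpy as np
import statsmodels.api as sm

# Simulate data from the model E(Y) = exp(beta0 + beta1*x1 + beta2*x2)
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
eta = 0.5 + 0.8 * x1 - 0.4 * x2          # the linear predictor
y = rng.poisson(np.exp(eta))             # log link: mu = exp(eta)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # canonical log link by default
print(fit.params)                        # estimates of beta0, beta1, beta2

# The variance function travels with the family: Var(Y) = mu for the
# Poisson, and proportional to mu^2 for the gamma.
mu = np.array([1.0, 5.0, 10.0])
print(sm.families.Poisson().variance(mu))  # -> [  1.   5.  10.]
print(sm.families.Gamma().variance(mu))    # -> [  1.  25. 100.]
```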
--
So what assumptions are in common with what you remember from MLR?
* Independence is still assumed.

* Homoskedasticity is no longer assumed; the variance is explicitly a function of the mean, and so in general varies with the predictors (so while the model is generally heteroskedastic, the heteroskedasticity takes a specific form).

* Linearity: the model is still linear in the parameters (i.e. the linear predictor is $X\beta$), but the expected response is no longer a linear function of them (unless you use the identity link function!).

* The distribution of the response is substantially more general.
The interpretation of the output is in many ways quite similar; you can still look at estimated coefficients divided by their standard errors, for example, and interpret them similarly (they're asymptotically normal, i.e. a Wald z-test, but people often still call them t-ratios even though there's no theory that makes them $t$-distributed in general).
Comparisons between nested models (via 'anova-table'-like setups) are a bit different, but similar in spirit (they involve asymptotic chi-square tests). If you're comfortable with AIC and BIC, these can be calculated as well; there's a small sketch of this below.
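A short sketch of that output side, continuing with simulated Poisson data (again, statsmodels is just an assumption for illustration):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
y = rng.poisson(np.exp(0.5 + 0.8 * x1 - 0.4 * x2))
X = sm.add_constant(np.column_stack([x1, x2]))

full = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(full.tvalues)   # coefficient / std. error: asymptotically normal (Wald z)
print(full.pvalues)   # two-sided p-values from the normal reference distribution

# Nested-model comparison: the drop in deviance is asymptotically
# chi-square on the difference in degrees of freedom.
reduced = sm.GLM(y, X[:, :2], family=sm.families.Poisson()).fit()  # drop x2
print(stats.chi2.sf(reduced.deviance - full.deviance, df=1))  # LR test of beta2 = 0

print(full.aic, reduced.aic)  # AIC (and BIC) are available too
```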
Similar kinds of diagnostic displays are generally used, but they can be harder to interpret.
Much of your multiple linear regression intuition will carry over if you keep the differences in mind.
Here's an example of something you can do with a GLM that you can't really do with linear regression (indeed, most people would use nonlinear regression for this, but a GLM is easier and nicer for it), even in the normal case: $Y$ is normal, modelled as a function of $x$:
$\text{E}(Y) = \exp(\eta) = \exp(X\beta) = \exp(\beta_0+\beta_1 x)$ (that is, a log-link)
$\text{Var}(Y) = \sigma^2$
That is, a least-squares fit of an exponential relationship between $Y$ and $x$.
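If it helps, here's what that looks like as code; a sketch on simulated data, again assuming statsmodels:

```python
import numpy as np
import statsmodels.api as sm

# Normal response with constant variance, but an exponential mean function
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=150)
y = np.exp(1.0 + 0.7 * x) + rng.normal(scale=0.5, size=150)

# Gaussian family + log link = least-squares fit of an exponential curve
# (older statsmodels versions spell the link class links.log())
X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Gaussian(link=sm.families.links.Log())).fit()
print(fit.params)   # estimates of beta0 and beta1 inside the exponential
```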
> Can I transform the variables the same way (I've already discovered transforming the dependent variable is a bad call since it needs to be a natural number)?
You (usually) don't want to transform the response (DV). You sometimes may want to transform predictors (IVs) in order to achieve linearity of the linear predictor; a small sketch follows.
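For instance (an illustrative statsmodels sketch where the log-scale mean happens to be linear in $\log(x)$):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 50, size=200)
y = rng.poisson(np.exp(0.2 + 0.6 * np.log(x)))   # count response, left untransformed

X = sm.add_constant(np.log(x))   # transform the predictor, never the counts
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)
```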
> I already determined that the negative binomial distribution would help with the over-dispersion in my data (variance is around 2000, the mean is 48).
Yes, it can deal with overdispersion. But take care not to confuse the conditional dispersion with the unconditional dispersion: your mean of 48 and variance of around 2000 are unconditional (marginal) quantities, while the model's dispersion assumption concerns the variance after conditioning on the predictors.
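To see the distinction, here's a small simulation: the response below is conditionally Poisson, so it's equidispersed given $x$, yet marginally the variance dwarfs the mean simply because the mean varies with $x$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
mu = np.exp(1.0 + 1.5 * x)     # conditional mean varies strongly with x
y = rng.poisson(mu)            # conditionally Poisson: Var(Y|x) = E(Y|x)

print(y.mean(), y.var())       # marginal variance is many times the marginal mean
```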
Another common approach - if a bit more kludgy and so somewhat less satisfying to my mind - is quasi-Poisson regression (overdispersed Poisson regression).
The negative binomial is in the exponential family if you fix a particular one of its parameters (at least in the way it's usually reparameterized for GLMs). Some packages will fit it if you specify that parameter; others will wrap ML estimation of the parameter (say, via profile likelihood) around a GLM routine, automating the process. Some will restrict you to a smaller set of distributions; you don't say what software you might use, so it's difficult to say much more there. Both the quasi-Poisson and the negative binomial options are sketched below.
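Here's a sketch of those options side by side, assuming statsmodels for concreteness (simulated negative binomial data; treat the exact argument names as package-specific details):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=500)
X = sm.add_constant(x)
mu = np.exp(1.0 + 0.5 * x)
y = rng.negative_binomial(n=2.0, p=2.0 / (2.0 + mu))  # NB with mean mu, shape 2

# Quasi-Poisson: a Poisson GLM whose standard errors are inflated by a
# dispersion estimate (Pearson chi-square / residual df).
qp = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")

# Negative binomial as a GLM: exponential family only once the extra
# parameter (alpha, here 1/shape) is fixed in advance.
nb_fixed = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()

# Negative binomial with alpha estimated by ML, wrapped around the fit.
nb_ml = sm.NegativeBinomial(y, X).fit()
print(qp.params, nb_fixed.params, nb_ml.params)  # nb_ml.params also includes alpha
```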
I think the log link tends to be used most often with negative binomial regression.
There are a number of introductory-level documents (readily found via Google) that walk through some basic Poisson GLM and then negative binomial GLM analyses of data, but you may prefer to look at a book on GLMs, and maybe do a little Poisson regression first just to get used to it.