This is probably trivial, but I couldn't figure it out. I want to fit a logistic regression model where my dependent variable is not a Bernoulli variable but a binomial count. Namely, for each $X_i$, I have $s_i$, the number of successes, and $n_i$, the number of trials. This is equivalent to the Bernoulli case, as if we had observed each of the $n_i$ trials individually, so in principle I can use, e.g., statsmodels logistic regression after expanding my data into Bernoulli observations. Is there a simpler way?
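
To make the equivalence concrete, here is a minimal sketch using only the standard library. The values `s = 3` and `n = 10` are made up for illustration. It checks that the binomial log-likelihood and the log-likelihood of the expanded Bernoulli trials differ only by the constant $\log\binom{n}{s}$, which does not depend on the success probability, so both are maximized by the same parameters.

```python
import math

def binomial_loglik(p, s, n):
    """Log-likelihood of s successes in n trials with success probability p."""
    return math.log(math.comb(n, s)) + s * math.log(p) + (n - s) * math.log(1 - p)

def bernoulli_loglik(p, outcomes):
    """Log-likelihood of a list of 0/1 outcomes with success probability p."""
    return sum(math.log(p) if y == 1 else math.log(1 - p) for y in outcomes)

# Hypothetical single observation: s = 3 successes out of n = 10 trials.
s, n = 3, 10
outcomes = [1] * s + [0] * (n - s)  # the same data expanded into Bernoulli trials

# The gap between the two log-likelihoods is log C(n, s) for every p,
# so it drops out when maximizing over the model parameters.
gaps = [binomial_loglik(p, s, n) - bernoulli_loglik(p, outcomes) for p in (0.1, 0.3, 0.7)]
print(gaps)
```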

R S
GLM with family=Binomial estimates the count model where the dependent variable is the number of successes and failures. – Josef Apr 19 '16 at 18:32

2 Answers

The statsmodels package provides a GLM class (sm.GLM) that can be used for such problems. See the example below:

import statsmodels.api as sm

# Star98 dataset, as used in the linked statsmodels GLM example
data = sm.datasets.star98.load()
glm_binom = sm.GLM(data.endog, data.exog, family=sm.families.Binomial())

More details can be found at the link below. Note that the Binomial family accepts a 2-D response array with two columns, where each row is [successes, failures]. In the example above, taken from the linked documentation, data.endog is such a two-column array (successes: NABOVE, failures: NBELOW).

Relevant documentation: https://www.statsmodels.org/stable/examples/notebooks/generated/glm.html

Vishal

Alternatively, using an R-style formula:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

mod = smf.glm('successes + failures ~ X1 + X2', family=sm.families.Binomial(), data=df).fit()
mod.summary()
```
Rems