This is probably trivial but I couldn't figure it out. I want to fit a logistic regression model, where my dependent variable is not a Bernoulli variable, but a binomial count. Namely, for each $X_i$, I have $s_i$, the number of successes, and $n_i$, the number of trials. This is completely equivalent to the Bernoulli case, as if we observed these $n_i$ trials, so in principle I can use, e.g., statsmodels logistic regression after I unravel my data to be Bernoulli observations. Is there a simpler way?
GLM with family=Binomial estimates the count model where the dependent variable is the number of successes and failures. – Josef Apr 19 '16 at 18:32
2 Answers
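The statsmodels package has a GLM class that can be used for such problems. See the example below, which uses the star98 dataset that ships with statsmodels:
```python
import statsmodels.api as sm
data = sm.datasets.star98.load()  # data.endog has two columns: NABOVE, NBELOW
data.exog = sm.add_constant(data.exog, prepend=False)  # add an intercept column
glm_binom = sm.GLM(data.endog, data.exog, family=sm.families.Binomial())
```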
Note that the Binomial family accepts a 2D array with two columns as the dependent variable, where each row holds the counts [success, failure]. In the example above, which is taken from the documentation linked below, data.endog is such a two-column array (successes: NABOVE, failures: NBELOW).
Relevant documentation: https://www.statsmodels.org/stable/examples/notebooks/generated/glm.html
Vishal, I think you should put in your answer that you provide an Nx2 matrix for the dependent variable with the counts – seanv507 Apr 21 '16 at 05:53
Alternatively, using an R-style formula:
```python
import statsmodels.api as sm
import statsmodels.formula.api as smf
# df is assumed to be a DataFrame with columns successes, failures, X1 and X2
mod = smf.glm('successes + failures ~ X1 + X2', family=sm.families.Binomial(), data=df).fit()
print(mod.summary())
```

Rems