I'm writing a subclass of scipy.stats._distn_infrastructure.rv_discrete
for the beta binomial distribution whose PMF is
$$P(X=k \mid N, \alpha, \beta){N \choose k} \frac{\mathrm{B}(k+\alpha,N-k+\beta)}{\mathrm{B}(\alpha,\beta)},$$
where $\mathrm{B}$ is the Beta function. My current implementation of the CDF and SF (survival function, equivalent to 1 - CDF) are imprecise; the strategy I employed computes the expected value of the binomial cdf with respect to the beta component:
$$P_{BB}(X \le k \mid N, \alpha, \beta) = E_p\left[P_{Binom}(X \le k \mid N, p)\right],$$
where $p \sim \mathrm{Beta}(\alpha, \beta)$. I achieve this using the scipy.stats.beta.expect
method, which is not innately vectorized (it will crash on anything other than a float or 0d array).
The PPF is even worse - it's a brute force loop over the integers $k=0, \ldots, N$ such that
$$P(X\le k \mid N, \alpha, \beta) \le q.$$
According to Wikipedia, the survival function for the beta-binomial distribution is
$$P(X > k \mid N, \alpha, \beta) = \frac{\mathrm{B}(\beta+n-k-1,\alpha+k+1)_3F_2(\boldsymbol{a},\boldsymbol{b};k)} {\mathrm{B}(\alpha,\beta)\mathrm{B}(n-k,k+2) (n+1)},$$
where ${}_3F_2$ is the generalized hypergeometric function. Is there an efficient way to compute this in Python, so I can remove the reference to beta.expect
? Also, how would I invert this function to solve for $k$ given $q=P(X \le k\mid N, \alpha, \beta)$?