How to implement generalized hypergeometric function to use in beta-binomial cdf, sf, ppf?

Question

I'm writing a subclass of scipy.stats._distn_infrastructure.rv_discrete for the beta binomial distribution whose PMF is

$$P(X=k \mid N, \alpha, \beta) = {N \choose k} \frac{\mathrm{B}(k+\alpha,N-k+\beta)}{\mathrm{B}(\alpha,\beta)},$$
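For reference, here is a minimal stdlib-only sketch of this PMF. It works in log space with `math.lgamma`, so large $N$ does not overflow. The function names and the parameter names `n`, `a`, `b` (standing for $N$, $\alpha$, $\beta$) are hypothetical and are not part of the scipy API.

```python
import math


def betabinom_logpmf(k, n, a, b):
    """log P(X = k) for the beta-binomial with n trials and shapes a, b (hypothetical helper)."""
    if k < 0 or k > n:
        return -math.inf
    # log C(n, k) via log-gamma, so large n does not overflow
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    # log B(x, y) = lgamma(x) + lgamma(y) - lgamma(x + y)
    log_beta_num = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
    log_beta_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_beta_num - log_beta_den


def betabinom_pmf(k, n, a, b):
    """P(X = k); exp(-inf) gives 0.0 outside the support."""
    return math.exp(betabinom_logpmf(k, n, a, b))
```

For example, `sum(betabinom_pmf(k, 10, 2.5, 1.5) for k in range(11))` should equal 1 up to rounding. With `a = b = 1` every value is $1/(N+1)$.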

where $\mathrm{B}$ is the Beta function. My current implementations of the CDF and SF (survival function, equivalent to 1 - CDF) are imprecise. The strategy I used computes the expected value of the binomial CDF with respect to the beta component:

$$P_{BB}(X \le k \mid N, \alpha, \beta) = E_p\left[P_{Binom}(X \le k \mid N, p)\right],$$ where $p \sim \mathrm{Beta}(\alpha, \beta)$. I do this with the scipy.stats.beta.expect method, which is not natively vectorized (it crashes on anything other than a float or a 0-d array).
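The expectation identity can be checked with a short stdlib-only sketch. It evaluates $E_p[P_{Binom}(X\le k\mid N,p)]$ with a plain composite midpoint rule, which was chosen for simplicity and is not necessarily what `beta.expect` does. The result is then compared against the direct sum of PMF values. The sketch assumes $\alpha, \beta \ge 1$ so that the integrand is bounded, and all function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def beta_pdf(p, a, b):
    """Beta(a, b) density at 0 < p < 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p) - log_beta)


def betabinom_cdf_by_expectation(k, n, a, b, m=10_000):
    """E_p[Binomial(n, p) CDF at k] with p ~ Beta(a, b), composite midpoint rule on m panels."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        p = (i + 0.5) * h  # panel midpoint, so log(p) and log1p(-p) stay finite
        total += beta_pdf(p, a, b) * binom_cdf(k, n, p)
    return total * h


def betabinom_cdf_direct(k, n, a, b):
    """Reference value: sum of the beta-binomial PMF for j = 0..k."""
    log_beta_ab = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    total = 0.0
    for j in range(k + 1):
        log_beta_j = math.lgamma(j + a) + math.lgamma(n - j + b) - math.lgamma(n + a + b)
        total += math.comb(n, j) * math.exp(log_beta_j - log_beta_ab)
    return total
```

When $\alpha$ or $\beta$ is below 1, the density is unbounded at an endpoint and a simple rule like this converges slowly. The direct sum has no such problem.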

The PPF is even worse: it's a brute-force loop over the integers $k=0, \ldots, N$ such that

$$P(X\le k \mid N, \alpha, \beta) \le q.$$
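A loop over $k$ per quantile can be replaced by one pass that builds the whole CDF table, followed by a binary search for each $q$. The sketch below uses the usual convention for a discrete quantile: the smallest $k$ with $P(X\le k)\ge q$. It relies on the standard library's `bisect.bisect_left`. The function names are hypothetical.

```python
import bisect
import itertools
import math


def betabinom_pmf(k, n, a, b):
    """P(X = k) for 0 <= k <= n, via log-gamma."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    log_ratio = (math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
                 - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))
    return math.exp(log_choose + log_ratio)


def betabinom_ppf(qs, n, a, b):
    """For each q in qs, the smallest k in 0..n with CDF(k) >= q."""
    # One O(n) pass builds the CDF table; each quantile is then an O(log n) search.
    cdf = list(itertools.accumulate(betabinom_pmf(k, n, a, b) for k in range(n + 1)))
    # bisect_left returns the first index i with cdf[i] >= q; clamp to n in case
    # rounding leaves cdf[n] a hair below 1.
    return [min(bisect.bisect_left(cdf, q), n) for q in qs]
```

For example, with `a = b = 1` and `n = 10` (a uniform distribution on 0..10), `betabinom_ppf([0.5], 10, 1.0, 1.0)` gives `[5]`.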

According to Wikipedia, the survival function of the beta-binomial distribution (with $n = N$) is

$$P(X > k \mid N, \alpha, \beta) = \frac{\mathrm{B}(\beta+n-k-1,\alpha+k+1)\,{}_3F_2(\boldsymbol{a},\boldsymbol{b};k)}{\mathrm{B}(\alpha,\beta)\,\mathrm{B}(n-k,k+2)\,(n+1)},$$

where ${}_3F_2$ is the generalized hypergeometric function. Is there an efficient way to compute this in Python, so I can remove the reference to beta.expect? Also, how would I invert this function to solve for $k$ given $q=P(X \le k\mid N, \alpha, \beta)$?
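Sum the PMF downward from $k$ using the ratio $f(j)/f(j+1) = \frac{(j+1)(\beta+n-j-1)}{(n-j)(\alpha+j)}$. This shows that the CDF is itself a terminating hypergeometric series:

$$P(X\le k) = f(k)\,{}_3F_2(1,\,-k,\,n-k+\beta;\; n-k+1,\; 1-k-\alpha;\; 1).$$

This parametrization is my own derivation and may not match the $\boldsymbol{a},\boldsymbol{b}$ on Wikipedia term for term. The $-k$ parameter stops the series after $k+1$ terms, and every term is positive, so a plain term-ratio loop evaluates it accurately without any special-function library. The survival function then follows from the mirror identity: $N-X$ is beta-binomial with $\alpha$ and $\beta$ swapped. A minimal sketch with hypothetical names:

```python
import math


def hyp_pfq_terminating(a_params, b_params, z, max_terms=10_000):
    """Sum pFq(a; b; z) for a series that terminates.

    Assumes some numerator parameter is a non-positive integer, so the term
    ratio eventually hits exactly zero; raises if that does not happen.
    """
    term, total = 1.0, 1.0
    for m in range(max_terms):
        # t_{m+1} / t_m = prod(a_i + m) / prod(b_j + m) * z / (m + 1)
        num = math.prod(p + m for p in a_params)
        if num == 0:
            return total
        den = math.prod(q + m for q in b_params) * (m + 1)
        term *= num * z / den
        total += term
    raise ValueError("series did not terminate; check the parameters")


def betabinom_logpmf(k, n, a, b):
    """log P(X = k) for 0 <= k <= n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
            - math.lgamma(a) - math.lgamma(b) + math.lgamma(a + b))


def betabinom_cdf_3f2(k, n, a, b):
    """P(X <= k) = f(k) * 3F2(1, -k, n-k+b; n-k+1, 1-k-a; 1), for 0 <= k <= n."""
    series = hyp_pfq_terminating([1, -k, n - k + b], [n - k + 1, 1 - k - a], 1.0)
    return math.exp(betabinom_logpmf(k, n, a, b)) * series


def betabinom_sf_3f2(k, n, a, b):
    """P(X > k) = P(n - X <= n - k - 1), where n - X ~ BB(n, b, a); avoids 1 - CDF."""
    if k >= n:
        return 0.0
    return betabinom_cdf_3f2(n - k - 1, n, b, a)
```

Evaluating every $k$ this way costs $O(N^2)$ operations in total. For a full table, a running sum of the PMF is cheaper. The series form is mainly convenient when only one $k$ is needed.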

It might help to know that for the values of $\boldsymbol{a},\boldsymbol{b}$ that (implicitly) appear here, $_3F_2(;;z)$ is a *polynomial* in $z$ (of degree $n-k-1$, $-1\le k \le n-1$). It does not simplify in general. — whuber, Aug 24 '16 at 16:14
Did you find a solution to your question? If so, would you like to share it as an answer? — Tim, Feb 20 '17 at 10:23

Tim · Answer 1 · 2016-11-06T10:14:27.380

This does not answer your question directly, but if you want to compute the cumulative distribution function of the beta-binomial more efficiently, you can use a recursive algorithm that is somewhat faster than the naive implementation.

Notice that the probability mass function of the beta-binomial distribution

$$ f(x) = {n \choose x} \frac{\mathrm{B}(x+\alpha, n-x+\beta)}{\mathrm{B}(\alpha, \beta)} $$

can be rewritten if you recall that $\mathrm{B}(x,y)=\tfrac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)}$, that $\Gamma(x) = (x-1)!$ (for non-integer arguments, read the factorials below as shorthand for Gamma functions), and that ${n \choose k} = \prod_{i=1}^k \tfrac{n+1-i}{i}$. It then becomes

$$ f(x) = \left( \prod_{i=1}^x \frac{n+1-i}{i} \right) \frac{\frac{(\alpha+x-1)!\,(\beta+n-x-1)!}{(\alpha+\beta+n-1)!}}{\mathrm{B}(\alpha,\beta)} $$

This makes updating from $x$ to $x+1$ easy. The red factors are the only ones that change, and the denominator $(\alpha+\beta+n-1)!$ does not depend on $x$:

$$ f(x\color{red}{+1}) = \left( \prod_{i=1}^x \frac{n+1-i}{i} \right) \color{red}{\frac{n-x}{x+1}} \frac{\frac{(\alpha+x-1)!\,\color{red}{(\alpha+x)}\,(\beta+n-x-1)!\,\color{red}{(\beta+n-x-1)^{-1}}}{(\alpha+\beta+n-1)!}}{\mathrm{B}(\alpha,\beta)} $$

that is, $f(x+1) = f(x)\,\frac{(n-x)(\alpha+x)}{(x+1)(\beta+n-x-1)}$.
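The recurrence translates directly into code. This minimal sketch uses a hypothetical function name, with `n`, `a`, `b` standing for $n$, $\alpha$, $\beta$. It computes $f(0)=\mathrm{B}(\alpha, n+\beta)/\mathrm{B}(\alpha,\beta)$ once with `math.lgamma`, then fills in $f(1), \ldots, f(n)$ with one multiply and one divide per step:

```python
import math


def betabinom_pmf_table(n, a, b):
    """All PMF values f(0), ..., f(n) via f(x+1) = f(x) * (n-x)(a+x) / ((x+1)(b+n-x-1))."""
    # Starting value f(0) = Gamma(n+b) Gamma(a+b) / (Gamma(n+a+b) Gamma(b)), in log space
    log_f0 = math.lgamma(n + b) + math.lgamma(a + b) - math.lgamma(n + a + b) - math.lgamma(b)
    f = [math.exp(log_f0)]
    for x in range(n):
        # b + n - x - 1 >= b > 0 for x <= n - 1, so the divisor never vanishes
        f.append(f[-1] * (n - x) * (a + x) / ((x + 1) * (b + n - x - 1)))
    return f
```

Only $f(0)$ needs special functions. Every later value costs a handful of arithmetic operations.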

Using this, you can calculate the cumulative distribution function as

$$ F(x) = \sum_{k=0}^x f(k) $$

using only simple arithmetic operations rather than repeatedly evaluating more computationally expensive special functions.
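Accumulating those values gives the CDF for every $k$ in one pass. The sketch below uses a hypothetical name and the same recurrence. It also builds the survival function by summing from the top, $P(X>k)=\sum_{j=k+1}^{n} f(j)$, rather than as $1-F(k)$. That choice is mine, made so that small upper-tail probabilities are not lost to cancellation.

```python
import math
from itertools import accumulate


def betabinom_cdf_sf(n, a, b):
    """Return (cdf, sf) lists for k = 0..n using only the PMF recurrence."""
    log_beta_ab = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    # f(0) = B(a, n+b) / B(a, b)
    f = [math.exp(math.lgamma(a) + math.lgamma(n + b) - math.lgamma(n + a + b) - log_beta_ab)]
    for x in range(n):
        f.append(f[-1] * (n - x) * (a + x) / ((x + 1) * (b + n - x - 1)))
    cdf = list(accumulate(f))  # cdf[k] = f(0) + ... + f(k)
    tail = list(accumulate(reversed(f)))  # tail[i] = f(n-i) + ... + f(n)
    # sf[k] = f(k+1) + ... + f(n) = tail[n-k-1]; the upper tail is empty at k = n
    sf = [tail[n - k - 1] if k < n else 0.0 for k in range(n + 1)]
    return cdf, sf
```

Both lists come out of a single $O(n)$ pass, so evaluating the CDF or SF at many $k$ is no more expensive than at one.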

Side note: with large numbers you will run into numerical precision issues, so more robust code would need to work with logarithms. Even so, you can expect an improvement in efficiency: in a few benchmarks I ran, a C++ implementation of this approach was up to 2 to 3 times faster than the naive implementation.
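Following the side note, here is a sketch of the same recurrence carried out on the log scale. The names are hypothetical, and the log-sum-exp accumulation is a standard trick rather than something taken from the answer's C++ code. It keeps working even when $f(0)$ itself would underflow to zero in double precision.

```python
import math


def log_add(x, y):
    """log(exp(x) + exp(y)) without overflow or underflow."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = max(x, y), min(x, y)
    return hi + math.log1p(math.exp(lo - hi))


def betabinom_log_cdf(n, a, b):
    """log CDF at k = 0..n, running the PMF recurrence on the log scale."""
    # log f(0) = log Gamma(n+b) + log Gamma(a+b) - log Gamma(n+a+b) - log Gamma(b)
    log_f = (math.lgamma(n + b) + math.lgamma(a + b)
             - math.lgamma(n + a + b) - math.lgamma(b))
    log_cdf = [log_f]
    for x in range(n):
        # add log of the ratio f(x+1)/f(x) = (n-x)(a+x) / ((x+1)(b+n-x-1))
        log_f += (math.log(n - x) + math.log(a + x)
                  - math.log(x + 1) - math.log(b + n - x - 1))
        log_cdf.append(log_add(log_cdf[-1], log_f))
    return log_cdf
```

For instance, with `n = 5000`, `a = 300`, `b = 2` the value $f(0)$ is around $e^{-1143}$, far below the smallest double. The log-scale CDF still climbs smoothly to $\log 1 = 0$.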

Another note: the ratio of beta integrals for the first term is another simple product, $f(0)=\frac{\mathrm{B}(a,n+b)}{\mathrm{B}(a,b)}=\frac{\Gamma(n+b)\,\Gamma(a+b)}{\Gamma(n+a+b)\,\Gamma(b)}$, which simplifies to $\prod_{j=1}^n\frac{n+b-j}{n+a+b-j}$. — probabilityislogic, Nov 06 '16 at 10:13
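The product in this comment is easy to check numerically. The small sketch below uses hypothetical names, and `a`, `b` are the comment's $a$, $b$ (that is, $\alpha$, $\beta$). It compares the product against the log-gamma form of the same ratio:

```python
import math


def betabinom_f0_product(n, a, b):
    """f(0) = B(a, n+b) / B(a, b) as the product prod_{j=1}^n (n+b-j) / (n+a+b-j)."""
    result = 1.0
    for j in range(1, n + 1):
        result *= (n + b - j) / (n + a + b - j)
    return result


def betabinom_f0_lgamma(n, a, b):
    """Same ratio via Gamma(n+b) Gamma(a+b) / (Gamma(n+a+b) Gamma(b)), in log space."""
    return math.exp(math.lgamma(n + b) + math.lgamma(a + b)
                    - math.lgamma(n + a + b) - math.lgamma(b))
```

With `a = b = 1` the product telescopes to $1/(n+1)$. The product needs no special functions, but for large $n$ and $a$ it can underflow, and there the log-gamma form is the safer choice.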
