1

What is the best way to approximate a binomial distribution, given that I have the following functions available:

  • normal_cdf
  • beta_cdf

See the Presto docs for the full list.

Is there any other good way to approximate a binomial distribution, given that I am limited to SQL?

kjetil b halvorsen
PalimPalim

1 Answer

2

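You can use the normal approximation: $\mathcal{B}(n, p)$ can be approximated by $\mathcal{N}(np, np(1-p))$, i.e. a normal distribution with mean $np$ and variance $np(1-p)$. So you can use the normal CDF to approximate the binomial CDF. The approximation is reasonable when $np$ and $n(1-p)$ are both fairly large.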
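To get a feel for how close the normal approximation is, here is a minimal Python sketch that compares it with the exact binomial CDF. The helper names (`normal_cdf`, `binom_cdf_exact`, `binom_cdf_normal`) and the argument order are my own choices for illustration, not Presto's API, and the continuity correction (evaluating at $k + 0.5$) is a standard refinement that I am adding here as an assumption.

```python
import math

def normal_cdf(mean, sd, x):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def binom_cdf_exact(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p), by summing the PMF."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binom_cdf_normal(k, n, p, continuity=True):
    """Normal approximation to P(X <= k) for X ~ B(n, p).

    Note that the normal CDF takes the standard deviation, sqrt(n p (1 - p)),
    not the variance. With continuity=True the CDF is evaluated at k + 0.5.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k
    return normal_cdf(mean, sd, x)

if __name__ == "__main__":
    n, p, k = 100, 0.5, 50
    print("exact:              ", binom_cdf_exact(k, n, p))
    print("normal (corrected): ", binom_cdf_normal(k, n, p))
    print("normal (plain):     ", binom_cdf_normal(k, n, p, continuity=False))
```

In SQL the same idea would come down to one call to the available normal CDF function with mean $np$ and standard deviation $\sqrt{np(1-p)}$; check the argument order of your engine's function before relying on it.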

The binomial probability mass function is ${n \choose k} p^k (1-p)^{n-k}$. The $p^k (1-p)^{n-k}$ part is easy to compute; the problem is the binomial coefficient. Recall that ${n \choose k} = \frac{n!}{k! (n - k)!}$, so you can use Stirling's approximation of the factorials to approximate it. That said, you probably want to go with the normal approximation rather than bothering with this in SQL.
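For reference, the Stirling route can be sketched as follows, using $\ln n! \approx n \ln n - n + \tfrac{1}{2} \ln(2 \pi n)$ and working on the log scale to avoid overflow. The function names are hypothetical, and the sketch assumes $0 < p < 1$; it only needs logarithms, exponentials and arithmetic, so it could in principle be translated to SQL.

```python
import math

def log_factorial_stirling(n):
    """Stirling's approximation to ln(n!); returns 0 for n = 0, where ln(0!) = 0."""
    if n == 0:
        return 0.0
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)

def binom_pmf_stirling(k, n, p):
    """Approximate P(X = k) for X ~ B(n, p), assuming 0 < p < 1 and 0 <= k <= n."""
    # ln C(n, k) = ln n! - ln k! - ln (n - k)!
    log_coef = (log_factorial_stirling(n)
                - log_factorial_stirling(k)
                - log_factorial_stirling(n - k))
    log_pmf = log_coef + k * math.log(p) + (n - k) * math.log(1.0 - p)
    return math.exp(log_pmf)

if __name__ == "__main__":
    n, k, p = 100, 50, 0.5
    exact = math.comb(n, k) * p**k * (1 - p)**(n - k)
    print("exact:   ", exact)
    print("stirling:", binom_pmf_stirling(k, n, p))
```

The approximation is least accurate when $k$ or $n - k$ is small, since Stirling's formula is an asymptotic result for large arguments.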

Moreover, if I remember correctly, you can define custom functions for Presto as plugins, which sounds much better than doing it in SQL.

Tim