2

I wrote some code in SAS that among other things, used the BETAINV function (or BETA.INV as it's called in Microsoft Excel) to calculate the quantile in a beta distribution corresponding to a random input value. Now I'm looking for a way to calculate this in SQL (on Netezza).

It doesn't have to be super precise, I'm more concerned about speed, as this function has to be called 2 billion times per day.

What is the use case exactly? I'm interested in estimating click-through rates, and acting upon that knowledge. In fact, I'm building a Bayesian Bandit 'machine' that optimizes click-through rates by 'exploring' the performance of different messages. See this article for more information.

So, given for example the fact that we saw 20 clicks in 2000 views, the click-through rate is 1%. Now this value is just an estimate, because we sampled only 2000 people. The 'real' CTR is somewhere around 1% but probably not exactly 1%. The probability distribution for a CTR turns out to be a beta distribution.

enter image description here

Now I want to pull a random value from this distribution (2 billion times per day). So I want most of those random values to be around the 1% in this case, but sometimes it may be lower or higher. That's exactly what the BETAINV function does. You give it a uniform random variable between 0 and 1 and it gives you the respective quantile in the beta distribution.

There's no BETAINV function in Netezza, so can I approximate this in some way (Taylor expansion?) to calculate this in a fast way?

Note: alpha and beta in my case are always integers, with alpha equal to the number of clicks and beta equal to the number of non-clicks. Alpha is always smaller than beta. Alpha can be between 1 and say 100,000. Beta can be between 1 and say 10,000,000.

Any ideas? Thanks!

TijlK
  • 73
  • 5
  • 1
    You're not talking about something called "the inverse beta distribution" (in fact the "inverse beta distribution" would be a shifted beta prime distribution). Instead you seem to be talking about the inverse of the cdf of a beta distribution, also called the quantile function of the beta distribution. [When you say "turns out to be a beta distribution", where is the derivation for this?] – Glen_b Feb 04 '15 at 13:02
  • You're correct. I changed the title accordingly. As for why click-through rate probabilities are beta distributions... the following shows a very similar reasoning: http://stats.stackexchange.com/questions/47771/what-is-the-intuition-behind-beta-distribution. I'll see if I can find an even clearer 'proof'. – TijlK Feb 04 '15 at 13:13
  • Basically this is what I'm trying to do: https://www.chrisstucchio.com/blog/2013/bayesian_bandit.html. This link clearly explains why beta distributions can be used to model CTR probabilities and explains exactly the use case I'm going for here. The author uses a Python script to get random values from the distribution. I need to do it in Netezza (SQL). – TijlK Feb 04 '15 at 13:18
  • 1
    You really should have explained in your question you were using a Bayesian approach and should probably have selected it as a tag. *That* makes it clear where the beta comes from -- at least the likelihood results in a beta. You should also discuss your prior. – Glen_b Feb 04 '15 at 13:20
  • Since the question is more about coding than statistics, shouldn`t it be moved to the sister site? – Manoel Galdino Feb 09 '15 at 02:11

0 Answers0