
I would like to estimate a probability density of a given probability/proportions variable.

Consider estimating the density of the click-through rate (CTR: number of times an image was clicked / number of times it was viewed) across web sites. For example, what is the probability that a randomly selected web site has CTR < 1%?

The data available are, for each site, the number of times it was viewed and the number of times it was clicked (or not clicked).

The problem I would like to address is the unequal number of viewings of a site, together with the probability (0-1 limit) constraint of the data.

Adjusting the kernel width (in some way) according to the number of site viewings would address the first problem, but it doesn't deal with the second: AFAIK the kernel will typically be a Gaussian, so I could end up with a pdf that extends beyond the 0-1 range.
What I was imagining was something more like each (repeated) observation [site] having its own beta distribution, determined by its number of observations and its success rate. Perhaps an alternative approach would be to use KDE on the log odds?
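To make the beta idea concrete, here is a minimal sketch in plain Python (standard library only). It assumes each site gets a Beta(prior_a + clicks, prior_b + non-clicks) density and that the overall estimate is the unweighted average of these per-site densities. The uniform prior (prior_a = prior_b = 1), the function names, and the click/view counts are all illustrative assumptions, not part of the original problem data.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, using log-gamma for numerical stability.
    Returns 0 at and outside the endpoints (a measure-zero simplification)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_mixture_pdf(x, clicks, views, prior_a=1.0, prior_b=1.0):
    """Average of the per-site Beta(prior_a + clicks, prior_b + non-clicks) densities."""
    total = 0.0
    for c, n in zip(clicks, views):
        total += beta_pdf(x, prior_a + c, prior_b + (n - c))
    return total / len(clicks)

def beta_mixture_cdf(t, clicks, views, steps=2000, **prior):
    """Approximate P(CTR < t) by midpoint-rule integration of the mixture on (0, t)."""
    h = t / steps
    return h * sum(
        beta_mixture_pdf((i + 0.5) * h, clicks, views, **prior) for i in range(steps)
    )

# Hypothetical data: clicks and views for five sites (made up for illustration).
clicks = [3, 0, 12, 1, 40]
views = [200, 50, 1500, 30, 10000]

# Estimated probability that a randomly selected site has CTR < 1%.
print(beta_mixture_cdf(0.01, clicks, views))
```

Sites with many views contribute a narrow beta component and sites with few views a wide one, so the unequal number of viewings plays the role of a per-site bandwidth, and the estimate stays inside the 0-1 range by construction. Whether an unweighted average is the right way to combine sites is a modelling choice left open here.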

Estimating probability distributions of probabilities seems a relatively standard problem. Is there a standard algorithm for it (and an implementation in R)?

(See e.g. "Estimating probability or frequency with low N?", which is perhaps related.)

I have found the following paper, which discusses this problem and reviews the methods I suggested: "Probit transformation for kernel density estimation on the unit interval".
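For reference, the basic transformation idea can be sketched as follows: map each rate to the real line with the probit function, run an ordinary Gaussian KDE there, and map back with the change-of-variables factor. This is only the naive version under stated assumptions, not necessarily the estimator the paper recommends. The smoothing (clicks + 0.5) / (views + 1) to keep rates strictly inside (0, 1), the rule-of-thumb bandwidth, the function names, and the data are all illustrative choices. It also ignores the unequal number of viewings, since every site gets the same bandwidth.

```python
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()  # standard normal: its inv_cdf is the probit function

def probit_kde(rates, bandwidth=None):
    """Build (pdf, cdf) on (0, 1) from a Gaussian KDE fitted on the probit scale.
    All rates must lie strictly between 0 and 1."""
    z = [STD_NORMAL.inv_cdf(p) for p in rates]
    if bandwidth is None:
        # Rule-of-thumb bandwidth on the transformed scale (an assumption).
        bandwidth = 1.06 * stdev(z) * len(z) ** (-0.2)
    kernels = [NormalDist(mu=zi, sigma=bandwidth) for zi in z]

    def pdf(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        zp = STD_NORMAL.inv_cdf(p)
        g = sum(k.pdf(zp) for k in kernels) / len(kernels)
        # Change of variables: f(p) = g(z) / phi(z), where z = probit(p).
        return g / STD_NORMAL.pdf(zp)

    def cdf(p):
        if p <= 0.0:
            return 0.0
        if p >= 1.0:
            return 1.0
        zp = STD_NORMAL.inv_cdf(p)
        return sum(k.cdf(zp) for k in kernels) / len(kernels)

    return pdf, cdf

# Hypothetical data: clicks and views for five sites (made up for illustration).
clicks = [3, 0, 12, 1, 40]
views = [200, 50, 1500, 30, 10000]
# Smoothed rates so that 0 clicks does not map to minus infinity.
rates = [(c + 0.5) / (n + 1.0) for c, n in zip(clicks, views)]

pdf, cdf = probit_kde(rates)
print(cdf(0.01))  # estimated P(CTR < 1%) under this sketch
```

Because the kernel smoothing happens on the unbounded scale, the back-transformed density cannot put mass outside the 0-1 range. Replacing the probit with the logit gives the KDE-on-log-odds variant, with a different change-of-variables factor.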

kjetil b halvorsen
seanv507
  • Do you have the numerator/denominator counts from which the rate was calculated? You are better off constructing a model from them! Then you could simply use a logistic regression, corrected for overdispersion. – kjetil b halvorsen Feb 25 '17 at 14:40
  • @kjetil I am not trying to predict site CTRs, I am trying to get a distribution. Think histograms! – seanv507 Feb 25 '17 at 18:46
