Say I have a distribution, either described by a probability density function that is integrable and continuous, or by a set of discrete probabilities over a finite set of symbols.
I want to efficiently generate samples from this distribution. I don't care about references to other algorithms, I want to know if this idea is essentially correct, or how to improve it.
Here is how I currently do it for the discrete case :
- form the cumulative probability function ( defined at the ith symbol as the sum of all probabilities from the first symbol to the ith symbol ).
- Then, there is an interval, [0,max(CDF)] and it is trivial to output a IID random number anywhere in that interval using a regular PRNG.
- Since every symbol is now associated with an interval, we output the symbol in whose interval the IID number landed.
It seems completely obvious to me that the output sequence will have the characteristics of the given discrete distribution, and is optimal up to the original PRNG.
I want now to extend to the continuous case (this may be the Ziggurat algorithm, and if so, boy am I happy I jumped straight to that). But can we not simply integrate the PDF to get a CDF, again form the interval [0,max(CDF)] and again shoot IID randoms at this interval, and again, and this is the tricky part for me, associate each sample symbol (of which there are infinitely many) with an interval. Then output the sample symbol in whose interval the IID number landed.
So, my question is : how can I associate the sample symbols, with intervals? Presumably by discretizing (putting into buckets) the sample symbol interval, and giving anything in a bucket the same probability. The size of buckets could be worked out so as to, over the intended output stream length, not cause any detectable deviation from the underlying continuous distribution.
Also, I am doing this in JavaScript. It's a DIY so I am not interested in trusting or using someone else's code. But nor am I interested in spending 2 days on it. I'm asking here as I am not an expert in distributions, and I want to have the ideas corrected before I try to implement this.