How do I model the probability of two empirical distributions?

Question

I have two distributions, A and B. Each contains numbers in the range 1.0 to 10.0. These distributions are NOT simple parametric functions, like the Gaussian; they are merely empirical counts.

Essentially, I want to model the probability that any given number came from A. You can imagine that this is easy using histograms: we would create bins of width 1.0, count the number of A values in each bin, Bin(A), count the number of B values in that bin, Bin(B), and then build a new histogram whose bar height for that bin is the proportion Bin(A) / (Bin(A) + Bin(B)).
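
For concreteness, the histogram version described above could be sketched roughly like this in Python. The function name `bin_proportions`, its parameters, and the sample data are all made up for illustration, and the sketch assumes every value lies within the stated range of 1.0 to 10.0.

```python
def bin_proportions(a, b, lo=1.0, hi=10.0, width=1.0):
    """Per-bin proportion of A values: Bin(A) / (Bin(A) + Bin(B))."""
    n_bins = int(round((hi - lo) / width))
    count_a = [0] * n_bins
    count_b = [0] * n_bins
    for values, counts in ((a, count_a), (b, count_b)):
        for x in values:
            # Clamp the index so that x == hi lands in the last bin.
            # Assumes lo <= x <= hi for every value.
            i = min(int((x - lo) // width), n_bins - 1)
            counts[i] += 1
    # None marks bins in which neither sample has any observations.
    return [ca / (ca + cb) if ca + cb else None
            for ca, cb in zip(count_a, count_b)]


# Made-up data, purely for illustration.
a = [1.2, 1.7, 2.5, 3.1, 3.3, 9.9]
b = [1.5, 4.4, 4.8, 9.1, 9.5, 10.0]
props = bin_proportions(a, b)
print(props)
```

Empty bins are reported as `None` rather than 0, since the proportion is undefined when neither sample has data there.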

My question is how to do this with continuous distributions instead of bins. Either Python or R works fine. I feel as though there is something critical I am missing or failing to understand about this problem: it seems very trivial to me, yet I can find little information on how to solve it in either of those languages, both of which I am fairly experienced with.

Some key `R` functionality to look up includes `ecdf` (*empirical cumulative distribution function*) and `density` (*kernel density estimator*). Concerning the histogram approach, please consult http://stats.stackexchange.com/a/51753 before you go any further. — whuber, Jan 09 '14 at 22:52
Right, I've tried both of those. I thought the solution would be to run something like "combined = ecdf(A) / (ecdf(A) + ecdf(B))", but this doesn't work. It doesn't seem like you can operate on ecdf objects in R? — jamesT, Jan 09 '14 at 22:54
If you have counts - or your numbers are otherwise necessarily integers (10, 1, 7...) - then you shouldn't add decimal places to them (10.0, 1.0, 7.0). It carries the implication that they might have values besides ".0". And if you have counts, you don't have continuous distributions (with continuous distributions any value will only occur once). You need to clarify what you're asking about, and I'd suggest you consider dropping the reference to particular languages, because your central problem isn't really language specific here. — Glen_b, Jan 09 '14 at 23:44
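
One possible continuous analogue of the Bin(A) / (Bin(A) + Bin(B)) idea, following the kernel density estimator suggestion in the comments, is to replace each bin count with the sample size times a density estimate at x. This is only a sketch under assumptions, not an answer given in this thread: it assumes the values really are continuous (see the last comment), and the helper names `kde` and `prob_a`, the Gaussian kernel, and the bandwidth of 0.5 are arbitrary choices made for illustration.

```python
import math


def kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at x
    using a Gaussian kernel with the given (hypothetical) bandwidth."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                    for s in sample)
        return total / norm

    return density


def prob_a(x, a, b, bandwidth=0.5):
    """Estimate P(A | x) as n_A * f_A(x) / (n_A * f_A(x) + n_B * f_B(x)).

    Weighting by sample size mirrors the raw counts in the histogram
    version; drop the weights if equal prior odds are wanted instead.
    """
    num = len(a) * kde(a, bandwidth)(x)
    den = num + len(b) * kde(b, bandwidth)(x)
    # Far from all data both densities can underflow to zero.
    return num / den if den > 0 else float("nan")


# Made-up data, purely for illustration.
a = [2.0, 3.0]
b = [8.0, 9.0]
for x in (2.5, 5.5, 8.5):
    print(x, prob_a(x, a, b))
```

The result depends heavily on the bandwidth, in the same way the histogram version depends on the bin width. In R, `density` would play the role of the hand-written `kde` here.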


0 Answers