I have a dataset containing a few hundred thousand observations, a small number of which contain an event of interest x. Let's say that the dataset is large enough that I have decent confidence in the overall frequency of x.

But what I'm really interested in is the frequency of x together with some other condition y, and the frequency of y in the dataset is much lower. There are too few observations of y to say with any confidence how well it correlates with x, and the observed number of x+y cases is often zero, even though the true frequency of x+y must be greater than zero.

So how can I estimate the true probability of x+y, given the overall frequency of x in the dataset and the small-ish number of instances of y that I have?

Edit: I know that x and y are not independent, but at the outset I don't know anything about the nature of the relationship between them. The entire point of the exercise is to determine whether they have a positive or negative correlation.

Sorry, I know next to nothing about statistics and I don't know what the proper terminology is to describe this situation.

JSBձոգչ
  • Does this https://stats.stackexchange.com/questions/107574/strategy-to-deal-with-rare-events-logistic-regression or this https://stats.stackexchange.com/questions/134380/how-to-tell-the-probability-of-failure-if-there-were-no-failures help? – Tim Nov 24 '17 at 11:12
  • @Tim, it helps, especially https://stats.stackexchange.com/a/134385/185985 since Bayes' Theorem is about the only part of statistics that I *am* familiar with. – JSBձոգչ Nov 24 '17 at 11:29
  • Would you say that it answers your question? If yes, we may close it as duplicate. If no, maybe you could make it more precise what does it not answer? – Tim Nov 24 '17 at 11:30
  • @Tim I'll look at it and return in a day or so if I have more questions. – JSBձոգչ Nov 24 '17 at 11:32
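
To make the Bayesian route from the linked answers concrete, here is a minimal sketch rather than a definitive method. It assumes a Beta prior on P(x | y) centred on the overall frequency of x, and it stays usable when the count of x+y is zero. The function names, the `prior_strength` parameter (how many pseudo-observations the prior is worth) and all the counts are made up for illustration; none of them come from the question.

```python
import random


def posterior_x_given_y(n_x, n_total, n_y, n_xy, prior_strength=100.0):
    """Return the base rate of x and the Beta(a, b) posterior for P(x | y).

    Assumption: the prior is Beta with mean equal to the overall rate of x
    and weight `prior_strength` pseudo-observations (a hypothetical knob).
    """
    base_rate = n_x / n_total
    # Conjugate update: add observed x+y cases to a, y-without-x cases to b.
    a = prior_strength * base_rate + n_xy
    b = prior_strength * (1.0 - base_rate) + (n_y - n_xy)
    return base_rate, a, b


def prob_above_base_rate(base_rate, a, b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate of x given y > overall rate of x)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(a, b) > base_rate for _ in range(draws))
    return hits / draws


# Invented counts: 2,000 x events in 200,000 rows; 300 rows with y; no x+y rows.
base_rate, a, b = posterior_x_given_y(n_x=2_000, n_total=200_000, n_y=300, n_xy=0)
post_mean = a / (a + b)  # non-zero estimate of P(x | y) despite zero x+y cases
p_higher = prob_above_base_rate(base_rate, a, b)

print(f"overall rate of x:        {base_rate:.4f}")
print(f"posterior mean P(x | y):  {post_mean:.4f}")
print(f"P(P(x | y) > overall):    {p_higher:.3f}")
```

A value of `p_higher` close to 0 would point towards a negative association between x and y, and a value close to 1 towards a positive one. The answer depends on `prior_strength`, so it is worth rerunning with a few different values to see how much the prior is driving the conclusion.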

0 Answers