I'm having some problems plotting my data and understanding what I can do with it.

I have a dataset that looks something like:

+-------+-------+------+
| Event | Score | Prob |
+-------+-------+------+
|     1 |     5 | 30%  |
|     2 |     2 | 90%  |
|     3 |     1 | 20%  |
|     4 |     9 | 30%  |
|   ... |       |      |
+-------+-------+------+

Each event has a probability of happening, and a score associated with it. If more than one event occurs, their scores are summed.

I would like to make a plot that shows the most 'likely' score that could be achieved, and a curve that shows the distribution of scores against total probability. I feel this should be possible as I have the probabilities for each event, however, I also feel like I am misunderstanding something.

Can anyone please advise me on what to do, or any resources I should read to understand my problem better? Thanks.

Andrew
  • Is it intentional that two events can happen at the same time? – Sebastian Oct 08 '19 at 09:31
  • yes. So, there could be a situation where 1,2 and 4 happen at the same time. – Andrew Oct 08 '19 at 09:38
  • Might find your answer here https://stats.stackexchange.com/questions/5347/how-can-i-efficiently-model-the-sum-of-bernoulli-random-variables and here https://en.wikipedia.org/wiki/Poisson_binomial_distribution. – Art Oct 08 '19 at 09:46
  • Is it intended that the events are occurring independently of each other? – Glen_b Oct 08 '19 at 11:16

1 Answer

You could work out the mean score by multiplying each score by its probability, and summing.
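As a minimal sketch of that calculation, here it is in Python using only the four example rows shown in the question (the real dataset has more rows, and the names `scores`, `probs` and `expected_total` are just illustrative). The mean of a sum is the sum of the means, so this part does not depend on whether the events are independent.

```python
# Scores and probabilities from the four example rows in the question.
scores = [5, 2, 1, 9]
probs = [0.30, 0.90, 0.20, 0.30]


def expected_total(scores, probs):
    """Mean of the summed score: sum of score * probability over all events."""
    return sum(s * p for s, p in zip(scores, probs))


mean_score = expected_total(scores, probs)
print(round(mean_score, 2))  # 6.2 for the four example rows
```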

This might not be the most likely (mode) score if the distribution is not symmetrical, or if it is multi-modal.
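If the events are independent (an assumption, not something stated in the question) and there are not too many distinct totals, the mode can also be found exactly rather than estimated. The sketch below is not part of the suggestion above: it builds the full distribution of the total by adding one event at a time, and the helper name `score_distribution` is made up for illustration.

```python
from collections import defaultdict

# Four example rows from the question; the real dataset has more.
scores = [5, 2, 1, 9]
probs = [0.30, 0.90, 0.20, 0.30]


def score_distribution(scores, probs):
    """Exact P(total = t), ASSUMING the events occur independently."""
    dist = {0: 1.0}  # before any event is considered, the total is 0
    for s, p in zip(scores, probs):
        updated = defaultdict(float)
        for total, q in dist.items():
            updated[total] += q * (1 - p)  # event does not occur
            updated[total + s] += q * p    # event occurs, add its score
        dist = dict(updated)
    return dist


dist = score_distribution(scores, probs)
mode = max(dist, key=dist.get)  # most likely total
for total in sorted(dist):
    print(total, round(dist[total], 4))
print("mode:", mode)
```

For the four example rows this gives a most likely total of 2 (probability about 0.35), compared with a mean of 6.2, which illustrates how far apart the two can be when the distribution is skewed.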

You could run a simulation to get this plot (assuming the events are independent):

  • Iterate 10000 times.
  • At each iteration, take a random number between 0 and 1 for each event. If the random number is less than or equal to Prob for that event, include the score; otherwise don't include it.
  • Store the sum of the included scores for that iteration.
  • After all the iterations are done, plot the density of the summed scores.
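The simulation steps above could be sketched in Python roughly as follows. This uses only the four example rows from the question and the standard library; the function name `simulate_totals`, the fixed seed and the printed frequency table are illustrative choices, and independence of the events is assumed throughout.

```python
import random
from collections import Counter

# Four example rows from the question; the real dataset has more.
scores = [5, 2, 1, 9]
probs = [0.30, 0.90, 0.20, 0.30]


def simulate_totals(scores, probs, n_iter=10_000, seed=0):
    """Simulate summed scores, ASSUMING the events are independent."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(n_iter):
        # An event is included when its uniform draw is <= its probability.
        total = sum(s for s, p in zip(scores, probs) if rng.random() <= p)
        totals.append(total)
    return totals


totals = simulate_totals(scores, probs)
counts = Counter(totals)
for total in sorted(counts):
    # Relative frequency approximates the probability of each total.
    print(total, counts[total] / len(totals))
```

An event with Prob of 90% is included in roughly 90% of iterations, because a uniform draw falls at or below 0.9 about 90% of the time. To get the plot itself, the `totals` list can be passed to a histogram routine such as matplotlib's `plt.hist`.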

Jonathan Moore
  • You seem to be assuming independence. Since this isn't currently stated in the question you should probably make it explicit that you're assuming it. – Glen_b Oct 08 '19 at 11:18
  • Thank you very much! You say: "If the random number is greater than Prob for that event then include the score, else don't include." What is the justification for this part? My data is distributed in such a way that there are more large-Prob events - will this end up favoring the low-Prob ones because of the ' – Andrew Oct 09 '19 at 13:31
  • Apologies should have been > rather than – Jonathan Moore Oct 09 '19 at 17:43