How do we combine probability distributions component-wise to make a more accurate probability distribution?

Question

This subject intrigues me. My application is in the field of sports prediction. In sports prediction the experts compete to be the best, or we make it look as if they compete. So we try to derive from the individual forecasts a composite forecast that is statistically more accurate than the best of the individual forecasters. To do that we of course need to know the individuals' past performances, and then we need a formula to "add" them. I frequently do this:

Q = P1^W1 * P2^W2 ...

and then normalize the Q values so that they sum to 1, and then try to work out the W1, W2, etc. values for best results. By best results I mean that the sum of Log(Q) of the winners should be maximum (equivalently, the sum of |Log(Q)| should be minimum, since Log(Q) <= 0).
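A minimal Python sketch of the pooling and scoring just described may make the procedure concrete. The function names, the made-up past events, and the coarse weight grid are all assumptions for illustration, not part of any established method; a real fit would use far more data and a proper optimizer.

```python
import math
from itertools import product

def log_pool(forecasts, weights):
    """Q_i proportional to prod_k P_k[i] ** W_k, normalized to sum to 1."""
    n = len(forecasts[0])
    raw = [math.prod(p[i] ** w for p, w in zip(forecasts, weights))
           for i in range(n)]
    total = sum(raw)
    if total == 0:
        raise ValueError("every outcome has zero pooled probability")
    return [r / total for r in raw]

def log_score(events, winners, weights):
    """Sum of log Q[winner] over past events; closer to 0 is better.
    Raises ValueError if a winner was given pooled probability 0."""
    return sum(math.log(log_pool(f, weights)[w])
               for f, w in zip(events, winners))

# Hypothetical history: each event holds one forecast per forecaster.
events = [
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.2, 0.8], [0.5, 0.5]],
    [[0.6, 0.4], [0.3, 0.7]],
]
winners = [0, 1, 1]  # index of the outcome that actually happened

# Coarse grid search over (W1, W2); an assumption, not a recommendation.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best_weights = max(product(grid, repeat=2),
                   key=lambda w: log_score(events, winners, w))
print(best_weights, log_score(events, winners, best_weights))
```

Note that in Python `0.0 ** 0` evaluates to `1.0`, so a forecaster with weight 0 is ignored entirely, even for outcomes to which they assigned probability 0.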

Now it sometimes happens that the result is W1 = 1, W2 = 0. This means the forecaster labelled "2" is useless. How can this happen? In two ways. One way is that "2" always copies "1" (since we don't know what these fellows are actually doing when they prepare their newspaper writeups!). The other way is that while "1" is doing some serious studying, "2" is content to give us random predictions, such as the license number of the first car crossing the road! But real life is somewhere in between. There is always some overlap (or "copying"), and as for the useless ones, well, we are likely to know them in advance.

Now what is the ideal case? When are the forecasters' predictions totally independent? Never really. You are likely to fancy Brazil to win the next football World Cup for more or less the same reasons as I do. But there is a thought experiment that makes them totally independent: I am a sportscaster with a known score p (0 <= p <= 1). You are another one with a known score q (0 <= q <= 1). What happens with the two of us? The following: every night when we go to sleep, a fairy comes into our dreams. She knows what is going to happen in the event we are trying to predict (because she is a fairy), but she plays the following game. In my dream she takes a scientific calculator out of her pocket and presses the Rnd button. If the number that comes out is <= p, she tells me the truth. If the number that comes out is > p, she tells a lie. But I am her spokesman (I can't do otherwise), and what she says I say to you in the morning. Then she goes on to appear in your dream and does precisely the same thing, only the random number has to be <= q now.

Now can this concept (with the fairies) be used to derive a better formula than mine (the P1^W1 * P2^W2 ...)? That's the question. I have seen some writeups but I'm not convinced.

Can you solve this example for me (with the ideal-case formula, i.e. the fairies concept)? We have a four-way event, such as a horse race. Predictor A says the probabilities are 0.6, 0.4, 0, 0. Predictor B says they are 0.3, 0.3, 0.2, 0.2. What is the ideal average?

There is a similar thread, "Combining two probability scores", but I'm not satisfied with the answers given there.

There's no "ideal average", you can check the linked thread for a review of different possibilities. — Tim, Jul 10 '19 at 20:06
I mean "ideal average" for the "ideal case" as described, while we know of course that the ideal case does not really exist. — user143678, Jul 11 '19 at 23:00
"Ideal" in what terms? What is the "ideal case"? I can't see how your question differs from the linked one, you are also asking about averaging probabilistic forecasts. — Tim, Jul 12 '19 at 04:43
Ideal is the witches. This formula is suggested: P = p1 * p2 / (p1 * p2 + (1-p1) * (1-p2)). Is it correct? Then for m events and n sources, how do I proceed? Piecewise? If this is correct, then of course one has to distort p1 and p2 to make it realistic. Thus p1 becomes p1 ^ w1, etc. If w1 = 1 the source "1" of information is good; if w1 = 0 it is useless. Do you agree? — user143678, Jul 13 '19 at 11:17
"Correct" in what sense? All the discussed formulas are correct in the sense that they enable you to get some kind of aggregate estimate. Obviously the formula you mentioned would not work for cases when there are 0s in the data, since they would zero-out everything. — Tim, Jul 13 '19 at 11:53
Where it says zero, it's obviously going to zero it again, unless the specific gravity of the source is zero. I mean use the above formula recursively for m sources and n events. In the non-ideal case (no witches), p1 and p2 are raised to the indices w1 and w2 (let's call those specific gravities). In real life there are the following extremes: a) one source is in reality a copy of the other source, hence specific gravity zero; b) one source produces random noise, again specific gravity zero. Correctness means the maximum of the Shannon I function. — user143678, Jul 14 '19 at 13:13
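For reference, the two-outcome formula quoted in the comments can be compared with the normalized product of the forecasts, which is the W1 = W2 = 1 case of the question's own formula. The sketch below assumes that the normalized product is the natural multi-outcome counterpart (it corresponds to treating the sources as independent with a uniform prior over outcomes); this is one possible reading of the "ideal case", not an established answer. The function names are hypothetical.

```python
import math

def combine_binary(p1, p2):
    """Formula from the comments, for a two-outcome event."""
    return p1 * p2 / (p1 * p2 + (1 - p1) * (1 - p2))

def product_pool(*forecasts):
    """Component-wise product of the forecasts, normalized to sum to 1.
    Fails with ZeroDivisionError if every outcome gets product 0."""
    raw = [math.prod(column) for column in zip(*forecasts)]
    total = sum(raw)
    return [r / total for r in raw]

# For two outcomes the two functions agree.
print(combine_binary(0.6, 0.7))
print(product_pool([0.6, 0.4], [0.7, 0.3])[0])

# The four-way example from the question.
a = [0.6, 0.4, 0.0, 0.0]
b = [0.3, 0.3, 0.2, 0.2]
combined = product_pool(a, b)
print(combined)  # approximately [0.6, 0.4, 0.0, 0.0]
```

Under this reading, predictor A's zeros eliminate the last two outcomes, and because B rates the first two equally, the result simply reproduces A's forecast.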
