4

I have 3 independent sources for tomorrow's weather forecast:

  1. 100% probability for snow, this source is 80% accurate
  2. 50% probability for snow, this source is 60% accurate
  3. 0% probability for snow, this source is 40% accurate

The accuracy for each source is: $$\frac{\text{Number of correct forecasts}}{\text{Total number of forecasts}}$$ What is the best estimate for the probability for snow tomorrow?

Essentially it is an extension of a previous question, but where each source has a different accuracy or reliability.

Also, the selected answer there suggested using the geometric mean, ignoring the fact that one of the probabilities is 0, which collapses the entire answer to 0. Intuitively this makes no sense: the forecast with the lowest accuracy should not supersede a more accurate forecast just because of numerical considerations.

My intuition was to solve it by weighting the probabilities with the accuracies: $$\frac{1.0 \times 0.8 + 0.5 \times 0.6 + 0 \times 0.4} {0.8 + 0.6 + 0.4}$$ but the interviewer insisted on solving it using Bayes' theorem.
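
Plugging in the numbers, this weighting would give $$\frac{1.0 \times 0.8 + 0.5 \times 0.6 + 0 \times 0.4}{0.8 + 0.6 + 0.4} = \frac{1.1}{1.8} \approx 0.61,$$ i.e. roughly a 61% chance of snow.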

In addition, if one of my sources had accuracy 100%, then it would make no sense to compute a weighted mean with the other sources.

I could weight the sources as in AdaBoost:
$$\alpha_m = \frac{1}{2}\ln\left( \frac{1 - \epsilon_m}{\epsilon_m}\right)$$ so e.g. for the source with accuracy 80%, the weight would be
$$\alpha_m = \frac{1}{2}\ln\left( \frac{0.8}{0.2}\right) = \frac{1}{2}\ln(4)$$ etc. Is this an acceptable solution for this question?
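
To make the idea concrete, here is a minimal sketch of what I mean (the hard ±1 vote and the 0.5 threshold are my own choices, purely to illustrate how the weights would be used):

```python
import math

# Forecast probabilities of snow and the corresponding source accuracies.
forecasts = [1.0, 0.5, 0.0]
accuracies = [0.8, 0.6, 0.4]

# AdaBoost-style weight for each source: alpha_m = 0.5 * ln((1 - eps_m) / eps_m),
# where eps_m = 1 - accuracy_m is the error rate.
alphas = [0.5 * math.log(acc / (1.0 - acc)) for acc in accuracies]
print(alphas)  # [0.693..., 0.203..., -0.203...]; the 40%-accurate source gets a negative weight

# One way to combine them: an AdaBoost-style vote, turning each probabilistic forecast
# into a hard +1 ("snow") / -1 ("no snow") prediction at a 0.5 threshold (0.5 itself abstains).
votes = [1 if p > 0.5 else (-1 if p < 0.5 else 0) for p in forecasts]
score = sum(a * v for a, v in zip(alphas, votes))
print("snow" if score > 0 else "no snow")  # score ≈ 0.90 > 0, so the weighted vote says "snow"
```
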
In any case I'd be very happy to see how it can be solved using Bayes.

I have seen a few other questions similar to it, but none exactly the same.

carmi
  • 51
  • 5
  • 1
    The problem is interesting, but the wording is a bit confusing in my opinion. What does it mean that source Y is $x\%$ accurate? It cannot mean that the probability of snowing $P$ is a discrete random variable, which is equal to 0 with probability 0.4, to 0.5 with probability 0.6 and to 1 with probability 0.8, because the sum of the probability masses would be > 1. So another model must be considered, but the wording makes it harder to understand which one... – DeltaIV Apr 16 '17 at 14:51
  • @DeltaIV These are independent sources, so for each one, accuracy=(number correct forecasts)/(total number of forecasts). I'm editing the question to make it clearer. – carmi Apr 16 '17 at 14:59
  • But accuracy is about classification and here we have regression, so some additional loss function must be specified, but then it's hard to call it accuracy? Or maybe we have to consider only 3 classes, that is 0%, 50% and 100% probability of snow? – Łukasz Grad Apr 16 '17 at 15:13
  • 1
    The event may be binary, but since the forecasts are *probabilistic*, typically "accuracy" would be more complicated than "# correct/# forecasts" (see [here](https://en.wikipedia.org/wiki/Forecast_skill)). Also, the suggested answer was geometric mean of probabilities or odds-ratios? (i.e. arithmetic mean of logits) – GeoMatt22 Apr 16 '17 at 15:19
  • @Grad The way I understood it during the interview was, accuracy for predicting snow. So I guess, if the source is 80% accurate, this means that in 80% of its forecasts it correctly predicted snow or no snow. – carmi Apr 16 '17 at 20:08
  • 2
    This question makes no (Bayesian) sense to me unless 'accuracy' is better defined. – Memming Apr 16 '17 at 20:19
  • carmi: if it snows tomorrow, is source 2's forecast correct or incorrect? (i.e. your accuracy definition only makes sense if the forecasters give only yes/no predictions) From the linked Tim answer (and, e.g. the Baron/Tetlock paper cited there), you could do a weighted geometric avg of the odds, with weights some function of "accuracy", and truncating e.g. [100,50,0] probabilities to [99,50,1]. But the answer is not unique. (For equal accuracies, and no 0 or 100 Pr's, this *would* produce the Bayesian answer) – GeoMatt22 Apr 16 '17 at 20:25
  • I agree 100% with @GeoMatt22: my point was precisely that your definition of accuracy as (number correct forecasts)/(total number of forecasts) is meaningless for a probabilistic forecast. When is forecast 2 (50% probability of snow) a correct forecast? When it snows? When it doesn't snow? In both cases? The question seems ambiguous and/or poorly worded, which unfortunately isn't at all uncommon when probability questions are asked during an interview for a non-academic position. – DeltaIV Apr 16 '17 at 20:57

2 Answers

3

With probabilistic forecasts, accuracy cannot simply be defined as (number of correct forecasts)/(total number of forecasts); rather, you must first define what you mean by a "correct forecast". For example, you can set a threshold: if a forecast gives P("snow") > 0.5, it predicts "snow", otherwise "no snow". With this threshold, if it snows tomorrow, source 1's forecast is correct and the other two are wrong. At this point, you can use the same procedure described in

https://stats.stackexchange.com/a/34141/58675

to compute the probability of snow tomorrow, conditional on the fact that source 1 predicts "snow" and sources 2 and 3 don't ($P(y=1|\hat{y}_1=1,\hat{y}_2=0, \hat{y}_3=0)$). However, with respect to the original question, we have already introduced an arbitrary threshold. Not only that: as you can see in the linked answer, to get $P(y=1|\hat{y}_1=1,\hat{y}_2=0, \hat{y}_3=0)$ we also need the prior probabilities $P(y=1)$ and $P(y=0)$. Assigning these quantities would be arbitrary too. Thus, I don't believe your question has a unique answer, if we have to rely on Bayes' theorem.
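
To see how arbitrary the choices are, here is a minimal sketch of that computation, assuming (my choices, not given in the question) the 0.5 threshold above and a uniform prior $P(y=1)=P(y=0)=0.5$:

```python
# Bayes-theorem combination of three independent binary classifiers, following the
# procedure in the linked answer. Assumptions (mine): thresholding at 0.5 turns the
# forecasts into hard predictions [snow, no snow, no snow], and the prior of snow is 0.5.
accuracies = [0.8, 0.6, 0.4]
predictions = [1, 0, 0]   # 1 = "snow", 0 = "no snow", after thresholding at 0.5
prior_snow = 0.5

# Likelihood of the observed predictions if it does / does not snow, assuming each
# source is independently correct with probability equal to its accuracy.
lik_snow, lik_no_snow = 1.0, 1.0
for acc, pred in zip(accuracies, predictions):
    lik_snow *= acc if pred == 1 else (1 - acc)
    lik_no_snow *= acc if pred == 0 else (1 - acc)

posterior_snow = (prior_snow * lik_snow) / (prior_snow * lik_snow + (1 - prior_snow) * lik_no_snow)
print(posterior_snow)  # 0.8 with these (arbitrary) choices; a different prior gives a different answer
```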

DeltaIV
  • 15,894
  • 4
  • 62
  • 104
  • 2
    I wish I had thought about this during the interview! This just goes to show how a deep understanding of statistics can be lacking in CS graduates, even those with a PhD... – carmi Apr 18 '17 at 07:15
1

If you were using Bayesian Model Averaging, you would average your predictions $p(y_{t+1}|m_k,y_{1:t})$ with weights that represented some posterior model probabilities $p(m_k|y_{1:t})$ like this $$ p(y_{t+1}|y_{1:t}) = \sum_k p(y_{t+1}|m_k,y_{1:t})p(m_k|y_{1:t}). $$

One way to interpret your question is to ask under what circumstances your "accuracy" can be understood as a posterior model probability. If you were updating your models' posterior probabilities at every time step using this recursive formula $$ p(m_k|y_{1:t}) = \frac{p(y_{t}|m_k)p(m_k|y_{1:t-1})}{\sum_{k'} p(y_{t}|m_{k'})p(m_{k'}|y_{1:t-1}) }, $$ then the posterior would increase after correct predictions and decrease after bad ones. If you started off with uniform priors over all of your models, if you only had two classes for the categorical observations, if you predicted $Y_t = 1$ whenever $P(Y_t =1|m_k) > .50$, and if your data were iid, then your "accuracy" could represent something close to the posterior model probabilities. It would probably get pretty close after a few steps.
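
To show the mechanics, here is a minimal sketch of this recursive update and the resulting model-averaged forecast (the past outcomes and the clipping of the 0% and 100% forecasts away from the boundary are hypothetical, purely for illustration):

```python
import numpy as np

# Per-model probability assigned to "snow" each day: model 0 always says 100%,
# model 1 always says 50%, model 2 always says 0%. Clipped away from 0 and 1 so
# that no model's likelihood ever becomes exactly zero (hypothetical choice).
p_snow = np.clip(np.array([1.0, 0.5, 0.0]), 0.01, 0.99)

# Hypothetical observed outcomes on past days (1 = it snowed).
outcomes = [1, 1, 0, 1, 1]

posterior = np.full(3, 1.0 / 3.0)   # start from a uniform prior over the models
for y in outcomes:
    likelihood = p_snow if y == 1 else 1.0 - p_snow   # p(y_t | m_k)
    posterior = likelihood * posterior                 # numerator of the recursive formula
    posterior /= posterior.sum()                       # normalize over models

# Bayesian-model-averaged probability of snow tomorrow.
bma_forecast = float(np.dot(p_snow, posterior))
print(posterior, bma_forecast)
```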

Taylor
  • 18,278
  • 2
  • 31
  • 66
  • I think your answer is wrong. The OP's accuracies can't represent anything close to posterior model probabilities (except if for "close" you mean they're both real numbers in [0,1]!). Posterior model probabilities sum to 1 (over the space of all possible models): OP's accuracies sum to 1.8(!). – DeltaIV Apr 17 '17 at 07:49
  • @DeltaIV then we can just normalize, like the OP did in his/her answer – Taylor Apr 17 '17 at 15:21