I have k models M1, M2, .... , Mk. Each of these models outputs a probability distribution over L classes C1, C2, .... , CL. I also have weights for each model w1, w2, ... , wk such that sum(w1, w2, ... , wk) = 1. In other words, the weights represent how much "confidence" I have in the output of each model. The objective is to combine the probability distributions from the k models into a single probability distribution. This can be accomplished by taking the weighted sum of the individual probability distributions, i.e. scaling each distribution by its model weight and summing:
$$ \text{Combined Probability Distribution} = \sum_{i = 1}^{k} w_i m_i $$
where $ m_i $ is the probability distribution for model $ M_i $ over classes $ C_j $.
For example, say we have three models M1, M2, and M3, with the following weights and probability distributions over four classes:
Model | Weight | Probability Distribution |
---|---|---|
M1 | 0.7 | 0.90, 0.05, 0.05, 0.00 |
M2 | 0.2 | 0.80, 0.10, 0.05, 0.05 |
M3 | 0.1 | 0.70, 0.15, 0.10, 0.05 |
This gives an overall probability distribution of 0.7*[0.90, 0.05, 0.05, 0.00] + 0.2*[0.80, 0.10, 0.05, 0.05] + 0.1*[0.70, 0.15, 0.10, 0.05] = [0.86, 0.07, 0.055, 0.015]
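The weighted combination above can be sketched in a few lines of NumPy (the weights and distributions are the example values from the table; variable names are my own):

```python
import numpy as np

# Model weights w1..w3, summing to 1
weights = np.array([0.7, 0.2, 0.1])

# One row per model: its probability distribution over the 4 classes
dists = np.array([
    [0.90, 0.05, 0.05, 0.00],  # M1
    [0.80, 0.10, 0.05, 0.05],  # M2
    [0.70, 0.15, 0.10, 0.05],  # M3
])

# Weighted sum over models: sum_i w_i * m_i
combined = weights @ dists
print(combined)  # [0.86  0.07  0.055 0.015]
```

Because the weights sum to 1 and each row sums to 1, the combined vector is itself a valid probability distribution.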
Question: What do I do when one of the models, say M1 in the example above, is missing its probability distribution for some samples? In other words, for some samples I only have probability distributions from M2 and M3, while for other samples I have all of them.
Simply combining the distributions from M2 and M3 with renormalized weights does not seem correct, since the weights for M2 and M3 are very low. One option is to assume a uniform distribution over the classes for M1 and then do the same computation, like so:
0.7*[0.25, 0.25, 0.25, 0.25] + 0.2*[0.80, 0.10, 0.05, 0.05] + 0.1*[0.70, 0.15, 0.10, 0.05] = [0.405, 0.21, 0.195, 0.19]
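Both candidate treatments of the missing model can be sketched as follows (a minimal illustration, assuming M1 is absent; the uniform-fallback option reproduces the computation above, and the renormalization option is the alternative I dismissed):

```python
import numpy as np

weights = np.array([0.7, 0.2, 0.1])  # w1 (M1, missing), w2, w3

# Distributions from the models that are present
dists = np.array([
    [0.80, 0.10, 0.05, 0.05],  # M2
    [0.70, 0.15, 0.10, 0.05],  # M3
])
n_classes = dists.shape[1]

# Option A: substitute a uniform distribution for the missing M1
uniform = np.full(n_classes, 1.0 / n_classes)
combined_uniform = weights[0] * uniform + weights[1:] @ dists
print(combined_uniform)  # [0.405 0.21  0.195 0.19 ]

# Option B: drop M1 and renormalize the remaining weights to sum to 1
w_renorm = weights[1:] / weights[1:].sum()
combined_renorm = w_renorm @ dists
print(combined_renorm)
```

Option A pulls the result toward uniform in proportion to the missing model's weight, while Option B trusts the remaining models fully; which is appropriate depends on whether M1's absence should count as genuine uncertainty.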
Is this a good way, statistically speaking? Are there better ways to accomplish this?