I agree with you that the paper's methodology is not totally clear. To understand it better, you should also check the paper they refer to, Bertsimas et al. (2003), which is more verbose about the model. It appears that what they mean by "mixture" is that they modeled the probability of observing a particular session $\boldsymbol{S}_i$ given the outcome of the session $C_i$, i.e. $P(\boldsymbol{S}_i \mid C_i)$, using separate RNNs. In such a case, the probability of observing a particular outcome $C_i$ given the session can be calculated using Bayes' theorem
$$
P(C_i = \omega \mid \boldsymbol{S}_i) = \frac{P(\boldsymbol{S}_i \mid C_i = \omega)\,P(C_i = \omega)}{\sum_{\omega' \in \Omega} P(\boldsymbol{S}_i \mid C_i = \omega')\,P(C_i = \omega')}
$$
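To make this concrete, here is a minimal sketch (in Python, not from the paper) of how per-outcome sequence models could be combined through Bayes' theorem. The `logliks` and `log_priors` containers are hypothetical placeholders for the trained class-conditional models and the outcome frequencies:

```python
import numpy as np

def posterior_over_outcomes(session, logliks, log_priors):
    """Bayes' rule: P(C_i = w | S_i) is proportional to P(S_i | C_i = w) * P(C_i = w).

    logliks    -- dict mapping outcome w to a function session -> log P(S_i | C_i = w)
                  (one trained sequence model per outcome, e.g. an RNN or n-gram model)
    log_priors -- dict mapping outcome w to log P(C_i = w), e.g. the log class
                  frequencies of the outcomes in the training data
    """
    outcomes = list(logliks)
    # unnormalized log-posterior for every outcome
    scores = np.array([logliks[w](session) + log_priors[w] for w in outcomes])
    # normalize with the log-sum-exp trick for numerical stability
    scores -= scores.max()
    probs = np.exp(scores)
    probs /= probs.sum()
    return dict(zip(outcomes, probs))
```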
It is hard to say what exactly $P(\boldsymbol{S}_i \mid C_i)$ is. They refer to Bertsimas et al. (2003), who modeled it using a Markov chain; here they use RNNs instead. Then they say:
> Taking inspiration from the Automatic Speech Recognition (ASR) community and similarities to “Language Modeling”, we adapted some of their more recent techniques to our problem. In preliminary experiments, 5-grams performed better than shorter chains, so we used them.
From this description we cannot know what they mean by "some of their more recent techniques". Maybe they used an RNN with skip-grams, i.e. given the history they predicted the next event in the sequence, but this is only a guess.
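If the n-gram reading is right, each class-conditional likelihood $P(\boldsymbol{S}_i \mid C_i = \omega)$ would be a small language model over session events, trained only on sessions with outcome $\omega$. Below is a rough sketch of that interpretation, assuming add-one smoothing; it is not the paper's actual implementation:

```python
from collections import Counter
import math

class EventNgramModel:
    """Toy class-conditional 5-gram model over session events.

    Only a sketch of the n-gram guess above, not the paper's method.
    A session is a list of event tokens; log P(S | C) is the sum of
    log P(event_t | previous n-1 events), with add-one smoothing.
    """

    def __init__(self, n=5):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def fit(self, sessions):
        for s in sessions:
            padded = ["<s>"] * (self.n - 1) + list(s) + ["</s>"]
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.ngram_counts[context + (padded[i],)] += 1
                self.context_counts[context] += 1
        return self

    def log_likelihood(self, session):
        """Return log P(S | C) for one session under this class's model."""
        padded = ["<s>"] * (self.n - 1) + list(session) + ["</s>"]
        V = len(self.vocab)
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = tuple(padded[i - self.n + 1:i])
            count = self.ngram_counts[context + (padded[i],)] + 1  # add-one smoothing
            total += math.log(count / (self.context_counts[context] + V))
        return total
```

Fitting one such model per outcome and passing their `log_likelihood` methods as the `logliks` dict in the Bayes-rule sketch above would give the posterior over outcomes.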
They call it a "mixture" in the sense of a mixture distribution (see the mixture-distribution tag), where the distribution can be thought of as a mixture of different distributions $P(\boldsymbol{S}_i \mid C_i)$ with some mixing weights $P(C_i = \omega)$. Mixtures are used in clustering, but the idea also generalizes to mixtures of regression models or mixtures of neural networks.
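On that reading, the marginal distribution of a session is literally a mixture of the class-conditional sequence models, with the outcome probabilities as mixing weights,
$$
P(\boldsymbol{S}_i) = \sum_{\omega \in \Omega} P(C_i = \omega)\, P(\boldsymbol{S}_i \mid C_i = \omega),
$$
which is exactly the denominator of the Bayes formula above.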
TL;DR: they have separate models for different outcomes $C_i$ and then weight the models using the overall frequencies of the outcomes in the data, $P(C_i = \omega)$, but from the description it is not very clear how exactly they do this, because they didn't give the details.