Given data $(y_i, X_i)$ for $i \in \{1, 2, 3, \dots, n\}$ ($n$ samples), we are interested in the relationship between $y$ and $X$. In the simplest setting, we can solve for $\beta$ in $$ y = X\beta + \epsilon. $$ Solving for $\beta$ by minimizing some loss (e.g. the $\ell_2$ loss) gives us an overall picture of $\beta$.
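To make the setup concrete, here is a minimal sketch of the homogeneous case, where $\beta$ is estimated by ordinary least squares; the dimensions, the true coefficients, and the noise scale are just illustrative choices on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least-squares estimate: beta_hat = argmin ||y - X beta||_2^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to beta_true
```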
However, more often than not, the data have some heterogeneity, and there are several ways to deal with it:
1) Mixture of experts: $$ y = \sum_{k=1}^{K} p_k X\beta_k + \epsilon $$ where $K$ is the number of experts and $p_k$ is the probability of expert $k$; the challenge is estimating the $p_k$ and $\beta_k$ (see the EM sketch after this list).
2) Mixed model: $$ y = X\beta + \epsilon_i + \epsilon $$ where $\epsilon_i$ is the residual error due to heterogeneity (i.e. the random effect), and the challenge is estimating $\epsilon_i$ (see the mixed-model sketch after this list).
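For the first option, a minimal EM sketch for a mixture of linear regressions (one reading of "mixture of experts" with constant gating weights $p_k$); the number of experts, the initialization, and the iteration count are illustrative assumptions:

```python
import numpy as np

def em_mixture_regression(X, y, K=2, n_iter=50, seed=0):
    """Fit a K-component mixture of linear regressions by EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    betas = rng.normal(size=(K, d))   # beta_k for each expert
    p = np.full(K, 1.0 / K)           # mixing probabilities p_k
    sigma2 = np.full(K, np.var(y))    # per-expert noise variance

    for _ in range(n_iter):
        # E-step: responsibility of expert k for sample i
        resid = y[:, None] - X @ betas.T                      # (n, K)
        logw = (np.log(p) - 0.5 * np.log(2 * np.pi * sigma2)
                - 0.5 * resid**2 / sigma2)
        logw -= logw.max(axis=1, keepdims=True)
        r = np.exp(logw)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: update p_k, beta_k (weighted least squares), sigma_k^2
        p = r.mean(axis=0)
        for k in range(K):
            w = r[:, k]
            Xw = X * w[:, None]
            betas[k] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            sigma2[k] = np.sum(w * (y - X @ betas[k])**2) / w.sum()
    return p, betas, sigma2
```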
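For the second option, a minimal sketch of a random-intercept mixed model fitted with statsmodels, where the heterogeneity enters as one random effect per group; the grouping variable and the simulated data are illustrative assumptions, not part of the question:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_groups, per_group = 10, 30
group = np.repeat(np.arange(n_groups), per_group)
u = rng.normal(scale=1.0, size=n_groups)          # random effects, one per group
x = rng.normal(size=n_groups * per_group)
y = 2.0 * x + u[group] + rng.normal(scale=0.5, size=n_groups * per_group)

df = pd.DataFrame({"y": y, "x": x, "group": group})
model = smf.mixedlm("y ~ x", df, groups=df["group"])  # random intercept per group
result = model.fit()
print(result.summary())
print(result.random_effects)  # predicted (BLUP) group-level random effects
```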
It seems to me that these two differ in whether you treat the heterogeneity as a fixed effect or as a random effect. I wonder if there are deeper connections beyond such hand-waving arguments.
I wonder if there is any work connecting and discussing these two methods, e.g. arguments about their advantages vs. disadvantages.
Edit: Thanks to @Machine epsilon for pointing out a related question here. However, I hope my question can prompt a deeper discussion of the relationship between the two, rather than an introduction to their differences, as in the answers to the related question. Additionally, for such a deeper discussion, understanding what a random effect is would also be relevant, in my opinion.