I am interested in statistical inference using distributed methods, by which I mean: $m$ machines, with machine $i$ receiving a subset $X_i$ of the data. We have some unknown distribution $\mu_\theta$ in a parametric family $P$, and we are trying to estimate the parameter $\theta$; the final estimate is output by one machine that aggregates the estimators it receives from the others. (So, just for now, think of distributed approaches to parametric regression.)
I have two questions:
Is there a survey paper covering different approaches to distributed statistical inference? I ask because the papers I have come across seem to come from different literatures (some in compressed sensing, some in statistics, some in information theory), and it is hard for me to work out what one approach does compared to another.
It seems that the dominant approach in these papers is to average the individual machines' estimators, rather than to construct distributed versions of the statistical distributions themselves, although I could be wrong.
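To make concrete what I mean by the averaging approach, here is a minimal sketch of one-shot averaging (sometimes called divide-and-conquer) for a toy Gaussian mean-estimation problem; the data and parameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: m machines, each holding n i.i.d. samples from N(theta, 1).
m, n, theta = 10, 1000, 2.5
local_data = [rng.normal(theta, 1.0, size=n) for _ in range(m)]

# Each machine computes its local estimator (here the MLE, the sample mean)...
local_estimates = [x.mean() for x in local_data]

# ...and a central node aggregates them with a single round of communication,
# by simple averaging.
theta_hat = float(np.mean(local_estimates))
print(theta_hat)  # close to theta = 2.5
```

In this symmetric case the average of local means equals the global sample mean, but for nonlinear estimators (e.g. logistic regression MLEs) the averaged estimator generally differs from the full-data estimator, which is where much of the analysis in these papers seems to focus.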
Are there approaches based on constructing "distributed versions" of standard statistical distributions? Does it make sense to view the problem as a mixture model, where the mixture components correspond to the different machines? Does it make sense to decompose a statistical distribution in order to model the problem?
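To make the mixture idea concrete, the kind of decomposition I am imagining is something like the following (the notation is mine, not from any paper I have seen):

```latex
p(x) \;=\; \sum_{i=1}^{m} \pi_i \, p_{\theta_i}(x),
\qquad
\pi_i \;=\; \frac{n_i}{\sum_{j=1}^{m} n_j},
```

where machine $i$ holds $n_i$ observations, $p_{\theta_i}$ is the local model fit on $X_i$, and the weights $\pi_i$ are proportional to each machine's share of the data. I am not sure whether this is a sensible formalization, or whether it reduces to something already standard.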