
I am interested in statistical inference using distributed methods, which basically means $m$ machines, each machine $j = 1, \dots, m$ receiving a subset $X_j$ of the data. We have some unknown distribution $\mu_\theta$ in a parametric family $P$, and we are trying to estimate the parameter $\theta$. The final estimate is output by one of the machines, which aggregates the estimators it receives from every machine. (So, just for now, probably distributed approaches to parametric regression.)
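To make the setup concrete, here is a minimal sketch of the one-shot "fit locally, then average" scheme for linear regression. It assumes the data are i.i.d. rows split evenly across $m$ simulated machines, and the helper names `local_ols` and `one_shot_average` are hypothetical, chosen only for illustration.

```python
import numpy as np

def local_ols(X, y):
    # Ordinary least-squares fit computed on one machine's shard only
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def one_shot_average(X, y, m):
    # Split the rows into m roughly equal shards (stand-ins for m machines)
    X_shards = np.array_split(X, m)
    y_shards = np.array_split(y, m)
    local = [local_ols(Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    # The aggregating machine simply averages the m local estimates
    return np.mean(local, axis=0)

rng = np.random.default_rng(0)
n, p, m = 10_000, 3, 10
X = rng.normal(size=(n, p))
theta = np.array([1.0, -2.0, 0.5])   # true parameter (assumed for the demo)
y = X @ theta + rng.normal(size=n)

theta_avg = one_shot_average(X, y, m)   # distributed, averaged estimate
theta_full = local_ols(X, y)            # single-machine, full-data estimate
```

Each machine communicates only a $p$-vector, which is why this scheme is attractive; how much accuracy it loses relative to `theta_full` is exactly what the comment below is about.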

I have two questions:

  1. Is there a survey paper of different approaches to distributed statistical inference? I ask because the papers I have come across seem to come from different literatures (some from compressed sensing, some from statistics, some from information theory), and it is hard for me to see what one approach does compared to another.

  2. It seems that the dominant approach in these papers is to average the individual machines' estimators rather than to come up with distributed versions of the statistical distributions, although I could be wrong.

Are there approaches based on coming up with "distributed versions" of regular statistical distributions? Does it make sense to consider the problem like a mixture model, where the mixtures are over the different machines? Does it make sense to have a decomposition of a statistical distribution to model the problem?
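One special case where the model itself decomposes across machines, rather than the estimators being averaged, is the Gaussian linear model: with independent data the log-likelihood is a sum of per-machine terms, and the sufficient statistics $X^\top X$ and $X^\top y$ are sums of per-machine pieces. The sketch below (hypothetical helper names, evenly split i.i.d. data assumed) recovers the full-data least-squares estimate exactly from one round of communication. It relies on finite-dimensional sufficient statistics, so it is an illustration of the idea, not a general answer for arbitrary parametric families.

```python
import numpy as np

def local_summary(X, y):
    # Each machine sends only p x p and p-dimensional summaries, not raw data
    return X.T @ X, X.T @ y

def combine_summaries(summaries):
    # Sum the per-machine Gram matrices and cross-products, then solve once
    XtX = sum(s[0] for s in summaries)
    Xty = sum(s[1] for s in summaries)
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(1)
n, p, m = 5_000, 4, 8
X = rng.normal(size=(n, p))
y = X @ np.arange(1.0, p + 1) + rng.normal(size=n)

summaries = [local_summary(Xs, ys)
             for Xs, ys in zip(np.array_split(X, m), np.array_split(y, m))]
theta_dist = combine_summaries(summaries)
theta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here `theta_dist` matches `theta_full` up to floating-point error, whereas the averaged estimator generally does not.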

Alexis
twnly
  • In general, even consistent estimators are biased, and estimators are in general not linear. If you *average* estimators from subsamples of size $k=n/m$, then the bias will be the bias associated with a sample size of $k$ rather than a sample size of $n = mk$. Similarly, the variance of an average of $m$ efficient estimators on samples of size $k$ will generally be larger than the variance of an efficient estimator on a sample of size $mk$ (though this will often tend to be less important at very large sample sizes). In general, averaging is a less than ideal way of combining estimators. – Glen_b Dec 25 '18 at 08:59
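A simple case illustrating the bias point (the example is chosen here for illustration, not taken from the comment): for normal data, the maximum-likelihood variance estimate on $k$ points has expectation $\frac{k-1}{k}\sigma^2$, i.e. bias $-\sigma^2/k$. Averaging $m$ such estimates leaves the bias at $-\sigma^2/k$, while the full-sample estimate has bias $-\sigma^2/n$. The simulation below assumes $\sigma^2 = 1$, $n = 1000$, $m = 100$ (so $k = 10$), giving biases of roughly $-0.1$ versus $-0.001$.

```python
import numpy as np

def compare_bias(n=1_000, m=100, sigma2=1.0, reps=2_000, seed=0):
    rng = np.random.default_rng(seed)
    k = n // m
    x = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
    # np.var with its default ddof=0 is the (biased) maximum-likelihood estimate
    full = x.var(axis=1)
    # Split each replicate into m subsamples of size k, estimate, then average
    averaged = x.reshape(reps, m, k).var(axis=2).mean(axis=1)
    # Monte Carlo estimates of the bias of each combined estimator
    return full.mean() - sigma2, averaged.mean() - sigma2

full_bias, avg_bias = compare_bias()
```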

0 Answers