Let $a_1$, $a_2$, ..., $a_m$ be samples of data, and assume the only information we have about each sample $a_i$ is its number of observations $n_i$, mean $\bar{x}_i$, standard deviation $s_i$ and median.
The task I have set myself is to recover the true values, or at least the best possible estimates, of the mean, median and standard deviation of the union of these samples $a_1 \cup a_2 \cup ... \cup a_m$, which I will call $A$.
Recover the Mean
Recovering the mean is straightforward: multiplying each sample mean by its number of observations gives back that sample's sum, so the mean of $A$ is the average of the sample means weighted by sample size.
$$\bar{x}_1 = \frac{1}{n_1} \sum\limits_{i=1}^{n_1} x_i, \quad \bar{x}_2 = \frac{1}{n_2} \sum\limits_{i=1}^{n_2} x_i, \quad ... \quad \bar{x}_m = \frac{1}{n_m} \sum\limits_{i=1}^{n_m} x_i$$
$$ \bar{x}_A = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + ... + n_m\bar{x}_m}{n_1 + n_2 + ... + n_m} $$
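As a quick check of the weighted-mean formula, here is a minimal Python sketch. The function name `pooled_mean` and the two example subsamples are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def pooled_mean(counts: Sequence[int], means: Sequence[float]) -> float:
    """Mean of the union A, computed from subsample sizes and subsample means."""
    if len(counts) != len(means):
        raise ValueError("counts and means must have the same length")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total number of observations must be positive")
    # n_i * mean_i recovers the sum of subsample i; dividing by N gives the mean of A.
    return sum(n * m for n, m in zip(counts, means)) / total


# Hypothetical subsamples: [1, 2, 3] (n=3, mean=2) and [10, 20] (n=2, mean=15).
counts = [3, 2]
means = [2.0, 15.0]
print(pooled_mean(counts, means))
```

Only the counts and means are needed here; the standard deviations and medians play no role in recovering the mean.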
Recover the Standard Deviation
This seems like it should be possible.
The standard deviation of a particular sample is defined as: $$ s_i = \sqrt{\frac{\sum\limits_{k=1}^{n_i} (x_k - \bar{x}_i)^2}{n_i-1}}$$
It seems to me, we could do the following to attempt to recover the standard deviation of $A$. Essentially we could assume a symmetric deviation about the subsample mean for each data point, half below, half above, and calculate the new whole sample standard deviation using the difference between the whole sample mean $\bar{x}_A$ and each subsample mean $\bar{x}_i$.
For a particular sample, say $a_i$, let us assume one-half of the data points are below the sample mean, and one-half are above the sample mean.
Because we can recover the mean of $A$ from the data, we can use it to calculate the difference between the mean of $A$ and each subsample mean. This difference can then be used to attempt a recovery of the standard deviation of $A$.
Let $d_i = \bar{x}_i - \bar{x}_A$ be the difference between the mean of a particular subsample and the overall sample mean $\bar{x}_A$, and let $s_i$ be the subsample standard deviation. Then
$$ s_A = \sqrt{\frac{\sum\limits_{i=1}^{m} \left[ \frac{1}{2}n_i(d_i + s_i)^2 + \frac{1}{2}n_i(d_i - s_i)^2 \right]}{n_1 + n_2 + ... + n_m - 1}} $$
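To see how this proposal behaves, the sketch below compares it with the standard within-group plus between-group sum-of-squares decomposition, $\sum_i \left[(n_i - 1)s_i^2 + n_i d_i^2\right]$, which is not the method proposed above but an identity that reproduces the standard deviation of $A$ exactly from the same summary statistics. Note that each pair of terms in the proposal expands to $n_i(d_i^2 + s_i^2)$, so the only difference is $n_i$ versus $n_i - 1$ multiplying $s_i^2$. The function names and the example data are hypothetical.

```python
import math
import statistics
from typing import Sequence


def grand_mean(counts: Sequence[int], means: Sequence[float]) -> float:
    """Size-weighted mean of the subsample means."""
    return sum(n * m for n, m in zip(counts, means)) / sum(counts)


def pooled_sd_heuristic(counts, means, sds) -> float:
    """The symmetric half-above, half-below proposal from the question."""
    xbar = grand_mean(counts, means)
    ss = sum(
        0.5 * n * ((m - xbar) + s) ** 2 + 0.5 * n * ((m - xbar) - s) ** 2
        for n, m, s in zip(counts, means, sds)
    )
    return math.sqrt(ss / (sum(counts) - 1))


def pooled_sd_exact(counts, means, sds) -> float:
    """Within-group plus between-group sums of squares (assumes n-1 sample SDs)."""
    xbar = grand_mean(counts, means)
    ss = sum(
        (n - 1) * s ** 2 + n * (m - xbar) ** 2
        for n, m, s in zip(counts, means, sds)
    )
    return math.sqrt(ss / (sum(counts) - 1))


# Hypothetical raw data, used only to check the summaries against a direct calculation.
groups = [[1.0, 2.0, 3.0], [10.0, 20.0]]
counts = [len(g) for g in groups]
means = [statistics.mean(g) for g in groups]
sds = [statistics.stdev(g) for g in groups]  # sample SD with n-1 denominator
direct = statistics.stdev([x for g in groups for x in g])

print("direct   :", direct)
print("exact    :", pooled_sd_exact(counts, means, sds))
print("heuristic:", pooled_sd_heuristic(counts, means, sds))
```

On this example the decomposition matches the direct calculation, while the symmetric proposal comes out somewhat larger, because it counts $n_i s_i^2$ where the within-group sum of squares is $(n_i - 1)s_i^2$. The gap shrinks as the subsamples get larger.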
Recover the Median
I see no straightforward way for this to be accurate. We do have an idea of the dispersion and of the difference between the mean and the median for each sample, so I have glimmers of possibilities, but I have not thought about it deeply and cannot see an obvious path.
My Question for Cross Validated
Can anyone comment on these strategies, offer their expertise, or point me to some resources?