recompute mean and standard deviation adding observed data to my sample

Question

I need help clarifying a doubt. I have two samples:

sample $A$ with mean $\mu_A$ and standard deviation $\sigma_A$

sample $B$ with mean $\mu_B$ and standard deviation $\sigma_B$

I know the observed values from sample $B$ but I don't know anything about sample $A$. Is there a way to find the mean and standard deviation of the combined sample $A \cup B$?

We can assume that the two samples are i.i.d. within and between them, so I think I can compute:

$\mu_{A \cup B} = \mu_A + \mu_B$

$\sigma_{A \cup B} = \sqrt{\sigma_A + \sigma_B}$

Am I right? Thank you!

Both formulas are incorrect. Try them out on a simple case, say where both samples have just two values each. Regardless, how could you possibly apply them if you "don't know anything" about sample $A$? — whuber, Aug 20 '18 at 13:27
If you don't know anything about A, you haven't really sampled it, have you? — Nuclear Hoagie, Aug 20 '18 at 15:06

Yoda · Accepted Answer · 2018-08-21T12:47:10.790

Indeed, both formulas are incorrect.

The mean depends on the sample sizes of $A$ and $B$. Let $Z$ be the sample that combines $A$ and $B$; then

$\mu_Z=\frac{n_A\times\mu_A+n_B\times\mu_B}{n_A+n_B}$

where $n_A$ and $n_B$ are the sample sizes of $A$ and $B$ respectively.
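As a quick check with made-up numbers (not taken from the question), take $A=\{1,3\}$ and $B=\{2,4,9\}$, so that $n_A=2$, $\mu_A=2$, $n_B=3$ and $\mu_B=5$. Then

$\mu_Z=\frac{2\times 2+3\times 5}{2+3}=\frac{19}{5}=3.8$

which matches the direct average $(1+3+2+4+9)/5=3.8$, whereas the sum $\mu_A+\mu_B=7$ does not.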

The variance is harder.

If you use the biased estimate of the variance: $\sigma^2=(n^{-1}\sum_{i=1}^nx_i^2)-\mu^2$, then you can write:

$\sigma^2_Z=\frac{n_A\times(\sigma^2_A+\mu_A^2)+n_B\times(\sigma^2_B+\mu_B^2)}{n_A+n_B}-(\frac{n_A\times\mu_A+n_B\times\mu_B}{n_A+n_B})^2$

If you use the unbiased estimate of the variance: $\sigma^2=((n-1)^{-1}\sum_{i=1}^nx_i^2)-\frac{n}{n-1}\mu^2$, then you can write:

$\sigma^2_Z=\frac{(n_A-1)\times(\sigma^2_A+\frac{n_A}{n_A-1}\mu_A^2)+(n_B-1)\times(\sigma^2_B+\frac{n_B}{n_B-1}\mu_B^2)}{n_A+n_B-1}-\frac{n_A+n_B}{n_A+n_B-1}(\frac{n_A\times\mu_A+n_B\times\mu_B}{n_A+n_B})^2$
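The two variance formulas can be checked numerically. Below is a minimal Python sketch, assuming the summaries of both samples ($n$, mean, variance) are actually available; the function name `combine_summaries` and the `unbiased` flag are hypothetical names chosen for illustration, and the data are made up.

```python
import math
import statistics


def combine_summaries(n_a, mean_a, var_a, n_b, mean_b, var_b, unbiased=False):
    """Return (mean, variance) of the combined sample Z from per-sample summaries.

    Hypothetical helper for illustration. var_a and var_b must follow the same
    convention as `unbiased`: divisor n if False, divisor n - 1 if True.
    """
    n = n_a + n_b
    mean_z = (n_a * mean_a + n_b * mean_b) / n
    if unbiased:
        # (n - 1) * s^2 + n * mean^2 recovers each sample's sum of squares.
        sum_sq = ((n_a - 1) * var_a + n_a * mean_a**2
                  + (n_b - 1) * var_b + n_b * mean_b**2)
        var_z = sum_sq / (n - 1) - n / (n - 1) * mean_z**2
    else:
        # n * (sigma^2 + mean^2) recovers each sample's sum of squares.
        sum_sq = n_a * (var_a + mean_a**2) + n_b * (var_b + mean_b**2)
        var_z = sum_sq / n - mean_z**2
    return mean_z, var_z


# Made-up data, used only to compare against a direct computation.
a = [1.0, 3.0]
b = [2.0, 4.0, 9.0]

mean_z, var_z = combine_summaries(
    len(a), statistics.mean(a), statistics.pvariance(a),
    len(b), statistics.mean(b), statistics.pvariance(b),
)
print(mean_z, var_z, statistics.pvariance(a + b))

mean_z, s2_z = combine_summaries(
    len(a), statistics.mean(a), statistics.variance(a),
    len(b), statistics.mean(b), statistics.variance(b),
    unbiased=True,
)
print(mean_z, s2_z, statistics.variance(a + b))
```

The standard deviation of $Z$ is `math.sqrt` of the returned variance. Because these formulas subtract two quantities of similar size, they can lose precision when the mean is large relative to the spread.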

edited Aug 21 '18 at 12:47

answered Aug 20 '18 at 12:58

Yoda


-1 Both formulas are incorrect. – whuber Aug 20 '18 at 13:25

The sample mean is the average and not the sum of the two sample means. The results for population parameters refer to the sum of two random variables, $X$ and $Y$. Then $Z=X+Y$ has $E(Z)=E(X)+E(Y)$ and, assuming $X$ and $Y$ are independent, $Var(Z)=Var(X)+Var(Y)$ because $Cov(X,Y)=0$. So the standard deviation of $Z$ is the square root of the sum of the two variances and not the square root of the sum of the two standard deviations. – Michael R. Chernick Aug 20 '18 at 13:34

I edited the post. I clearly misread the question and made a big mistake, sorry about that. – Yoda Aug 20 '18 at 15:14
