
I need your help to clarify a doubt. I have two samples:

sample $A$ with mean $\mu_A$ and standard deviation $\sigma_A$

sample $B$ with mean $\mu_B$ and standard deviation $\sigma_B$

I know the observed values from sample $B$, but I know nothing about sample $A$. Is there a way to find the mean and standard deviation of the combined sample $A \cup B$?

We can assume that the observations are i.i.d. both within and between the two samples, so I think I can compute:

$\mu_{A \cup B} = \mu_A + \mu_B$

$\sigma_{A \cup B} = \sqrt{\sigma_A + \sigma_B}$

Am I right? Thank you!

dsaxton
Bmb58
  • Both formulas are incorrect. Try them out on a simple case, say where both samples have just two values each. Regardless, how could you possibly apply them if you "don't know anything" about sample $A$? – whuber Aug 20 '18 at 13:27
  • If you don't know anything about A, you haven't really sampled it, have you? – Nuclear Hoagie Aug 20 '18 at 15:06

1 Answer


Indeed, both formulas are incorrect.

For the mean, the result depends on the sample sizes of A and B. Let Z be the sample that combines A and B; then

$\mu_Z=\frac{n_A\times\mu_A+n_B\times\mu_B}{n_A+n_B}$

where $n_A$ and $n_B$ are the sample sizes of A and B, respectively.
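As a quick sanity check, here is a minimal Python sketch of the weighted-mean formula. The data values and the helper name `combined_mean` are hypothetical, chosen only for illustration; they do not come from the question.

```python
from statistics import fmean


def combined_mean(n_a, mean_a, n_b, mean_b):
    """Mean of the pooled sample, from each sample's size and mean."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


# Hypothetical data (an assumption, not from the question).
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0]

pooled = combined_mean(len(a), fmean(a), len(b), fmean(b))
direct = fmean(a + b)  # mean computed directly on the pooled values
print(pooled, direct)
```

Both values should agree. Note that the formula needs $n_A$ and $\mu_A$, so it cannot be applied if nothing at all is known about sample A.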

It becomes harder with the variance.

If you use the biased estimate of the variance: $\sigma^2=(n^{-1}\sum_{i=1}^nx_i^2)-\mu^2$, then you can write:

$\sigma^2_Z=\frac{n_A\times(\sigma^2_A+\mu_A^2)+n_B\times(\sigma^2_B+\mu_B^2)}{n_A+n_B}-(\frac{n_A\times\mu_A+n_B\times\mu_B}{n_A+n_B})^2$
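The biased-variance formula can be checked numerically with a small Python sketch. The helper name `combined_pvariance` and the data are hypothetical; `statistics.pvariance` is the standard library's divide-by-$n$ variance.

```python
from statistics import fmean, pvariance


def combined_pvariance(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Biased (divide-by-n) variance of the pooled sample."""
    n = n_a + n_b
    mean_z = (n_a * mean_a + n_b * mean_b) / n
    # n * (var + mean^2) recovers each sample's sum of squares.
    mean_sq = (n_a * (var_a + mean_a**2) + n_b * (var_b + mean_b**2)) / n
    return mean_sq - mean_z**2


# Hypothetical data (an assumption, not from the question).
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0]

from_summaries = combined_pvariance(
    len(a), fmean(a), pvariance(a), len(b), fmean(b), pvariance(b)
)
direct = pvariance(a + b)  # variance computed directly on the pooled values
print(from_summaries, direct)
```

The two printed values should match up to floating-point rounding.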

If you use the unbiased estimate of the variance: $\sigma^2=((n-1)^{-1}\sum_{i=1}^nx_i^2)-\frac{n}{n-1}\mu^2$, then you can write:

$\sigma^2_Z=\frac{(n_A-1)\times(\sigma^2_A+\frac{n_A}{n_A-1}\mu_A^2)+(n_B-1)\times(\sigma^2_B+\frac{n_B}{n_B-1}\mu_B^2)}{n_A+n_B-1}-\frac{n_A+n_B}{n_A+n_B-1}(\frac{n_A\times\mu_A+n_B\times\mu_B}{n_A+n_B})^2$
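The unbiased version can be checked the same way. This is only a sketch: the helper name `combined_variance` and the data are hypothetical, and it assumes each sample has at least two observations so that the divide-by-$(n-1)$ variance is defined.

```python
from statistics import fmean, variance


def combined_variance(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Unbiased (divide-by-(n-1)) variance of the pooled sample."""
    n = n_a + n_b
    mean_z = (n_a * mean_a + n_b * mean_b) / n
    # (n - 1) * var + n * mean^2 recovers each sample's sum of squares.
    sum_sq = ((n_a - 1) * var_a + n_a * mean_a**2
              + (n_b - 1) * var_b + n_b * mean_b**2)
    return sum_sq / (n - 1) - n / (n - 1) * mean_z**2


# Hypothetical data (an assumption, not from the question).
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0]

from_summaries = combined_variance(
    len(a), fmean(a), variance(a), len(b), fmean(b), variance(b)
)
direct = variance(a + b)  # variance computed directly on the pooled values
print(from_summaries, direct)
```

The two printed values should match up to floating-point rounding.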

Yoda
  • 1
    -1 Both formulas are incorrect. – whuber Aug 20 '18 at 13:25
  • 1
    The sample mean is the average, not the sum, of the two sample means. The results for population parameters refer to the sum of two random variables, X and Y. Then Z=X+Y has E(Z)=E(X)+E(Y) and, assuming X and Y are independent, Var(Z)=Var(X)+Var(Y) because Cov(X,Y)=0. So the standard deviation of Z is the square root of the sum of the two variances, not the square root of the sum of the two standard deviations. – Michael R. Chernick Aug 20 '18 at 13:34
  • 1
    I edited the post. I clearly misread the question and made a big mistake; sorry about that. – Yoda Aug 20 '18 at 15:14