We have two groups: N1 = 10 and N2 = 100

Their means on some measurement are: Mean1 = 4 and Mean2 = 5

Their variances are Var1 = 3 and Var2 = 2.5.

Let's further assume we have no access to the individual level data.

Some guy wants to combine the two groups' means into one. However, instead of taking a weighted mean of the two groups, as anyone reasonable would have done, he calculates a simple unweighted mean: UnweightedMean = (4 + 5)/2 = 4.5
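For reference, here is a minimal sketch of the two ways of combining the means, using only the summary numbers above. The variable names are my own, chosen for illustration.

```python
# Group sizes and group means from the summary statistics above.
n1, n2 = 10, 100
mean1, mean2 = 4.0, 5.0

# Simple unweighted mean: each group counts equally, whatever its size.
unweighted_mean = (mean1 + mean2) / 2

# Size-weighted mean: what pooling all 110 observations would give.
weighted_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

print(unweighted_mean, weighted_mean)
```

The weighted mean sits much closer to 5 because the second group supplies 100 of the 110 observations.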

(Please don't tell me it's very wrong. I know it is.) So, if we stick to this incorrect method, what would you say is the sample size for that total mean of 4.5? Is it 20? And what is the variance of that mean?

Thank you for your thoughts!

dl7631

1 Answer

There is no such thing as a "sample size" for a statistic such as, in your case, the unweighted mean. Rather, you can discuss its standard error. If the distribution in the two populations is normal, then this mean, standardized by its estimated standard error, would approximately follow a t-distribution, but calculating the effective degrees of freedom (or an approximation thereof) analytically is probably too challenging to get a sound answer. (Even for the Behrens-Fisher problem, the Satterthwaite degrees of freedom for the correctly calculated "pooled" (null) mean are just an approximation.) A simple variance rule for independent samples could lead us to:

$$\bar{X}_{pooled} = \frac{1}{2}(\bar{X}_1 + \bar{X}_2)$$ has

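$$\text{var}(\bar{X}_{pooled}) \approx \frac{1}{4}\left(\frac{\hat{\sigma}^2_1}{n_1} + \frac{\hat{\sigma}^2_2}{n_2}\right)$$

where $\hat{\sigma}^2_i$ is the sample variance in group $i$ and $n_i$ its size, so that $\hat{\sigma}^2_i / n_i$ is the estimated variance of the group mean $\bar{X}_i$.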
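As a rough sketch, the rule can be applied to the numbers in the question. This assumes that Var1 and Var2 are sample variances of the individual measurements (so the variance of each group mean is the variance divided by the group size) and that the two groups are independent. The variable names are illustrative only.

```python
import math

# Summary statistics from the question (no raw data needed).
n1, n2 = 10, 100
mean1, mean2 = 4.0, 5.0
var1, var2 = 3.0, 2.5  # assumed: sample variances of the measurements

# Estimated variance (squared standard error) of each group mean.
var_mean1 = var1 / n1
var_mean2 = var2 / n2

# Unweighted mean of the two group means.
unweighted_mean = 0.5 * (mean1 + mean2)

# Variance of 0.5 * Xbar1 + 0.5 * Xbar2 for independent samples:
# each coefficient is squared, giving the factor 1/4.
var_unweighted = 0.25 * (var_mean1 + var_mean2)
se_unweighted = math.sqrt(var_unweighted)

print(f"mean = {unweighted_mean}, variance = {var_unweighted:.5f}, SE = {se_unweighted:.3f}")
```

Under these assumptions the variance comes out near 0.081 (standard error about 0.285). Almost all of it comes from the small group, which is what you would expect when 10 observations get the same weight as 100.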

AdamO
  • Thank you! I realize the bootstrap would probably be the way to go, but I was interested in people's opinions about what to do when there is no access to the raw data. – dl7631 Feb 10 '21 at 16:19