Can data samples have constant means even if they are from different distributions?
Short answer: yes - the sample means can be the same. – Glen_b Mar 04 '14 at 14:52
2 Answers
Sampling usually implies a random element, so you are not guaranteed equal sample means (I assume this is what you mean by "constant"). But randomness also means that some samples drawn from different distributions will have equal means.
To be more precise:
- Continuous vs. discrete: Samples from continuous distributions have probability zero of having exactly equal means, while the chances are better for discrete distributions.
- Probability of outcome: For distributions with discrete outcomes, a very high probability of one of the outcomes increases the chance that the samples will be identical and hence have equal means.
- Sample size: Larger sample sizes decrease the probability of EQUAL means, both for identical discrete distributions and for any pair of different distributions. They increase the probability of getting SIMILAR means (i.e. not exactly equal but close) when the distributions are identical (or, more generally, share the same population mean).
E.g. in R:
# These two samples come from different (discrete) Bernoulli distributions but are very likely to have equal means, i.e. 1
mean(rbinom(n=5, size=1, prob=0.999)) # 5 draws, each with a 99.9% chance of being 1
mean(rbinom(n=5, size=1, prob=0.998)) # 5 draws, each with a 99.8% chance of being 1
# These two are very likely to have different means (big sample size) even though they're from the same distribution
mean(rbinom(n=1000, size=1, prob=0.5)) # 1000 draws, each with a 50% chance of being 1
mean(rbinom(n=1000, size=1, prob=0.5)) # 1000 draws, each with a 50% chance of being 1
# These two are almost guaranteed to have different means because the distribution is continuous, even though the sample size is small
mean(rnorm(2, mean=0, sd=1)) # two draws from the standard normal distribution
mean(rnorm(2, mean=0, sd=1)) # two draws from the standard normal distribution
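
For the Bernoulli case, the chance of exactly equal means can also be computed rather than simulated: two samples of the same size have equal means exactly when they contain the same number of ones. A minimal sketch, assuming two independent samples of equal size n; `p_equal_means` is a hypothetical helper name, not a built-in function:

```r
# Exact probability that two independent Bernoulli samples of size n
# have the same sample mean, i.e. the same number of ones.
# p_equal_means is a hypothetical helper, not part of base R.
p_equal_means <- function(n, p1, p2) {
  k <- 0:n  # possible counts of ones in each sample
  sum(dbinom(k, size = n, prob = p1) * dbinom(k, size = n, prob = p2))
}

p_equal_means(5, 0.999, 0.998)   # different distributions, tiny samples: roughly 0.985
p_equal_means(1000, 0.5, 0.5)    # same distribution, large samples: roughly 0.018
```

This matches the first two simulated cases: equal means are likely for the small, lopsided samples and unlikely for the large, balanced ones.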

Jonas Lindeløv
Larger sample sizes will decrease the probability of similar sample means if the population means aren't the same. – Glen_b Mar 04 '14 at 14:54
Two samples from continuous distributions will (with probability 1) not have exactly the same mean, even if they are from the same distribution. But, if you mean approximately the same, certainly, why not?
E.g., in R:
set.seed(123) # Set the seed for reproducibility
xnorm <- rnorm(100, 0, 1) # Normally distributed, population mean 0
xunif <- runif(100, -1, 1) # Uniformly distributed on (-1, 1), population mean 0
mean(xnorm)
mean(xunif)
These aren't exactly the same, because of sampling error. Since both populations have mean 0, increasing the sample size will tend to bring the two sample means closer together.
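
To see the effect of sample size, here is a rough simulation sketch under the same setup (standard normal versus Uniform(-1, 1)). The helper name `mean_gap`, the seed, and the number of repetitions are arbitrary choices for illustration:

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Average absolute gap between the mean of a N(0, 1) sample and the mean
# of a Uniform(-1, 1) sample, both of size n, over `reps` repetitions.
# mean_gap is a hypothetical helper name.
mean_gap <- function(n, reps = 200) {
  gaps <- replicate(reps, abs(mean(rnorm(n, 0, 1)) - mean(runif(n, -1, 1))))
  mean(gaps)
}

gap_small <- mean_gap(100)
gap_large <- mean_gap(10000)
c(n_100 = gap_small, n_10000 = gap_large)  # the second gap should be much smaller
```

The gap shrinks roughly in proportion to 1/sqrt(n), so a 100-fold increase in sample size should cut the typical gap by about a factor of 10.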

Peter Flom
Do you know of any tests for equality of means that are robust against non-normality and different distributions? – user40124 Mar 04 '14 at 19:03
user40124 - there was a good answer to exactly that question given here on CV just in the last day, which said just about everything I'd have said. I'll see if I can find it... [yep, here it is](http://stats.stackexchange.com/a/88647/805). There are other answers with similar information to be found. ...(ctd) – Glen_b Mar 04 '14 at 21:41
(ctd)... I'd add a couple of things to that: (i) where the data take small integer values, the impact of ties can be substantial; any rank based tests need to account for the non-continuous distributions (one can still compute the permutation distribution); they also don't tend to be so useful for testing location differences if the shapes and spreads couldn't be rendered similar by monotonic transformation; similar remarks apply to permutation tests; (ii) one approach to the bootstrap with different distributions (if your samples are large enough to support it!) is to resample within groups. – Glen_b Mar 04 '14 at 21:41
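
The "resample within groups" idea in (ii) could look something like the sketch below. This is only one possible reading of it: the percentile interval, the helper name `boot_diff_means`, and the defaults `n_boot` and `conf` are assumptions made for illustration, not something specified in the comment.

```r
# Percentile bootstrap interval for mean(x) - mean(y), resampling each
# group separately so that each keeps its own distribution.
# boot_diff_means is a hypothetical helper; n_boot and conf are illustrative.
boot_diff_means <- function(x, y, n_boot = 2000, conf = 0.95) {
  diffs <- replicate(n_boot, {
    xb <- x[sample.int(length(x), replace = TRUE)]  # resample within group x
    yb <- y[sample.int(length(y), replace = TRUE)]  # resample within group y
    mean(xb) - mean(yb)
  })
  alpha <- 1 - conf
  quantile(diffs, probs = c(alpha / 2, 1 - alpha / 2))
}

set.seed(123)
x <- rnorm(100, 0, 1)    # normal, population mean 0
y <- runif(100, -1, 1)   # uniform, population mean 0
boot_diff_means(x, y)    # interval for the difference in means
```

If the interval excludes 0, that suggests the population means differ. As the comment warns, this needs samples large enough to support the bootstrap.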