why z score so big for central limit theorem

Question

I have a strange distribution in my population. I created this distribution for the purpose of my question, but let's pretend we don't know much about it.

Anyway, the variable takes 6 possible values, from 0 to 5. The value 5 has a frequency of 25%.

But now let's get to the problem.

I wanted to calculate the z score for X>4.

What I did:

I took a sample of size 200 from the population. I calculated the sample mean, which was 2.61, and the sample standard deviation, which was 1.79.

Then I calculated the z score using the central limit theorem formula:

zscore = (4-2.61)/(1.79/square root of 200)
zscore = 10.92

I am surprised by such a big z score. How can I interpret this? As far as I understand it, it tells me the value 5 is 10.92 standard deviations away, so according to the central limit theorem it has practically no probability of happening, but if we look at the original population it happens approximately 25% of the time.

The distribution of the mean is much more concentrated than the one of a single observation. — Michael M, Sep 29 '19 at 16:57
Use your intuition: just what fraction of all samples of size 200 from this population will have means $5$ or larger? (It's actually easy to compute an exact answer if you like.) — whuber, Sep 29 '19 at 16:57
@whuber Does the z score from the central limit theorem refer to the mean value? — Stenga, Sep 29 '19 at 17:04
I think your statement: `I wanted to calculate the z score for X>4.` may be the core of the problem. Your Z-score would use the CLT to approximate $P(\bar X_{200} > 4),$ which is very small indeed (about $4.6 \times 10^{-28}$). By contrast, for any one observation $X_i,$ one has $P(X_i > 4) = 0.25.$ — BruceET, Sep 29 '19 at 20:39
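
The figures in the comments above can be checked with a short R sketch. It uses only the numbers quoted in the question (mean 2.61, SD 1.79, n = 200, z = 10.92); the variable names are illustrative. With the rounded inputs shown, the z score for the mean comes out near 11 rather than exactly 10.92, presumably because the question used unrounded values. The last line assumes, as the question states, that 5 is the largest possible value and occurs with probability 0.25.

```r
# Values taken from the question: sample mean, sample SD, sample size.
xbar <- 2.61
s    <- 1.79
n    <- 200

# z score of the value 4 relative to ONE observation (spread = s).
z_single <- (4 - xbar) / s

# z score of the value 4 relative to the MEAN of n observations
# (spread = s / sqrt(n), the standard error).
z_mean <- (4 - xbar) / (s / sqrt(n))

# Upper-tail normal probability for the z score reported in the question.
p_tail <- pnorm(10.92, lower.tail = FALSE)

# Exact probability that a sample mean is 5 or larger:
# every one of the 200 draws must equal 5.
p_all_five <- 0.25^200

print(c(z_single = z_single, z_mean = z_mean))
print(c(p_tail = p_tail, p_all_five = p_all_five))
```

The single-observation z score is below 1, which is consistent with the value 5 being common in the population; only the z score for the mean of 200 observations is large.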

score 2 · Answer 1 · answered Sep 29 '19 at 20:54

Comment continued: Here is a simulation of sample means $A = \bar X_{200}$ from 100,000 samples of size 200 from your population.

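set.seed(929)
# 100,000 sample means, each from a sample of size 200 drawn with replacement
a = replicate( 10^5,
  mean(sample(c(1,2,3,5), 200, replace=TRUE, prob=c(.2,.05,.5,.25))) )
mean(a > 4)   # proportion of simulated sample means exceeding 4
[1] 0
hist(a, prob=TRUE, breaks=20, xlim=c(0,5), col="skyblue2")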

There was not even one instance, among the 100,000 samples, of a sample mean exceeding $4$. Here is a histogram of the simulated sample means.
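
As a rough cross-check of the simulation, the same comparison can be sketched analytically. The values and probabilities below are copied from the simulation code; the variable names (`vals`, `probs`, `se`, and so on) are illustrative, and the normal tail is only the CLT approximation, not an exact probability.

```r
# Population used in the simulation (values and probabilities from that code).
vals  <- c(1, 2, 3, 5)
probs <- c(0.2, 0.05, 0.5, 0.25)
n     <- 200

mu    <- sum(vals * probs)                 # population mean
sigma <- sqrt(sum(vals^2 * probs) - mu^2)  # population SD
se    <- sigma / sqrt(n)                   # SD of the mean of n observations

z      <- (4 - mu) / se                    # how many standard errors 4 lies above mu
p_mean <- pnorm(z, lower.tail = FALSE)     # CLT approximation to P(sample mean > 4)
p_one  <- sum(probs[vals > 4])             # exact P(single observation > 4)

print(c(mu = mu, sigma = sigma, se = se, z = z))
print(c(p_mean = p_mean, p_one = p_one))
```

The single-observation probability is 0.25, while the approximate probability for the mean of 200 observations is far too small to show up in 100,000 simulated samples.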
