
I generated, in R, one hundred thousand random samples of ten values from the normal distribution with mean zero and unit standard deviation, and recorded each sample's mean and standard deviation, hoping to better understand their distributions.

moy <- c()
std <- c()
N <- 100000
for(i in 1:N){
    print((i/N))
    sam <- rnorm(10)
    moy <- c(moy,mean(sam))
    std <- c(std,sd(moy))
}
hist(std, n=10000, xlim=c(0.312,0.319))

What I wasn't expecting is shown in this histogram of the samples' standard deviations, where the SD estimates cluster around certain values more than expected:

[Figure: histogram of the samples' SD]

My question, then: is there any logical cause for such a strange distribution of the samples' SDs?

Actually, I was expecting some kind of normal (or very close to normal) distribution. I don't see any reason for this strange distribution apart from, maybe, R's random number generator not producing truly random numbers. But maybe there is some mathematical cause for what is observed here?

Thanks in advance.

Rodolphe
  • What you found is in fact the standard error of the sample mean; that's why you see clusters around $1/\sqrt{10}\approx0.316$. – Francis Mar 19 '16 at 00:30
  • Related is this answer describing the sampling distribution of the sample variance: [Why is the sampling distribution of variance a chi-squared distribution?](http://stats.stackexchange.com/a/121676/3601) – Aaron left Stack Overflow Mar 19 '16 at 13:18
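
The value $1/\sqrt{10}$ mentioned in the first comment can be checked with a short sketch. It assumes the same setup as the question (samples of size 10 from a standard normal); the names `means`, `n` and `N` and the seed are illustrative choices, not part of the original code.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
n <- 10              # size of each sample, as in the question
N <- 100000          # number of samples, as in the question

# One sample per column; colMeans() returns the N sample means.
means <- colMeans(matrix(rnorm(N * n), nrow = n))

# The SD of the sample means estimates the standard error sigma / sqrt(n).
se_estimate <- sd(means)
se_theory   <- 1 / sqrt(n)
print(c(estimate = se_estimate, theory = se_theory))
```

In the question's loop, `sd(moy)` is recomputed on the growing vector of means, so `std` holds a running estimate of this same quantity rather than one SD per sample.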

1 Answer


You've got a bug: you're taking the `sd` of `moy` rather than of `sam`. I bet your code is also pretty slow; a more R-like method would be as follows.

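N <- 100000                          # number of samples
n <- 10                              # size of each sample
d <- matrix(rnorm(N*n), nrow=n)      # one sample per column
m <- colMeans(d)                     # mean of each sample
s <- apply(d, 2, sd)                 # SD of each sample

hist(s, 10000)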
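
For comparison, here is a sketch of a loop that stays closer to the question's code, with the bug fixed and the result vectors preallocated. Growing a vector with `c()` on every iteration copies it each time, and `sd(moy)` was recomputed over an ever longer vector, which is a likely reason the original slowed down as it ran. The seed and the number of histogram breaks are arbitrary choices made for this illustration.

```r
set.seed(1)              # arbitrary seed, only for reproducibility
N <- 100000              # number of samples
n <- 10                  # size of each sample

moy <- numeric(N)        # preallocate instead of growing with c()
std <- numeric(N)

for (i in seq_len(N)) {
    sam    <- rnorm(n)
    moy[i] <- mean(sam)
    std[i] <- sd(sam)    # SD of the current sample, not of moy
}

hist(std, breaks = 100)  # breaks = 100 is an arbitrary choice
```

With this version `std` holds one SD per sample, so the histogram shows the sampling distribution of the sample SD rather than a running estimate of the standard error.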
  • Holy #!$ so that was it! Thank you very much for pointing out that mistake so fast. And yeah, I noticed it slowed down as the run went on. Didn't know why... That reminds me not to try things too late at night. And thanks for the code. – Rodolphe Mar 19 '16 at 00:25
  • Maybe this strange distribution reflects the standard deviation of the mean bouncing back and forth as the number of samples increases... – Rodolphe Mar 19 '16 at 00:32
  • What was the 'm' variable calculated for? There is no further use of 'm'. You wanted the difference between d and m. – Maximilian Mar 24 '16 at 19:38
  • I think you're right. I calculated `m` to parallel the OP code. – Aaron left Stack Overflow Mar 24 '16 at 20:00