As an exercise, I decided to check whether the unbiased estimator of the sample standard deviation (dividing by n - 1) gives better results than the biased estimator (dividing by n).
So far it looks like it does so in only approx. 55% of cases. Am I doing something wrong, or is this normal?
My methodology:
- generate a set of 100 numbers from the range 1 to 1000 (the population)
- 10,000 times, choose 10 numbers from that set
- for each set of 10 numbers, calculate the biased and unbiased estimators of the standard deviation
- count how many times the unbiased estimator was closer than the biased estimator to the population standard deviation.
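The two estimators in the third step differ only in the divisor: n for the biased one and n - 1 for the other. A minimal sketch of both, using the ddof argument of np.std; the data values and the helper name both_sds are made up for illustration and are not part of my experiment:

```python
import numpy as np

def both_sds(sample):
    """Return (sd dividing by n, sd dividing by n - 1) for a 1-D sample.

    both_sds is a hypothetical helper name used only for this illustration.
    """
    x = np.asarray(sample, dtype=float)
    sd_n = np.std(x, ddof=0)   # divisor n (the "biased" estimator)
    sd_n1 = np.std(x, ddof=1)  # divisor n - 1 (the "unbiased" estimator)
    return sd_n, sd_n1

# Made-up example values, not taken from the experiment
sd_n, sd_n1 = both_sds([2, 4, 4, 4, 5, 5, 7, 9])
print(sd_n, sd_n1)
```

Because n - 1 < n, the second value is never smaller than the first for the same sample.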
My code:
import numpy as np
import random
from math import sqrt

# Generate a set of 100 random integers from 1 to 1000
# (the upper bound of np.random.randint is exclusive)
myarray = np.random.randint(1, 1001, 100)
# np.std divides by n by default, which is what we want for the whole population
myarray_sd = np.std(myarray)
right = 0
wrong = 0
for k in range(10000):
    # choose a sample of 10 numbers without replacement
    # (random.sample needs a sequence; a set is not accepted in Python 3.11+)
    sample = np.array(random.sample(list(myarray), 10))
    # calculate the sample mean
    sample_mean = np.mean(sample)
    # calculate sd dividing by n and by n - 1
    sample_sd_n = sqrt(sum((sample - sample_mean)**2) / len(sample))
    sample_sd_n1 = sqrt(sum((sample - sample_mean)**2) / (len(sample) - 1))
    # calculate the differences between each sd and the population sd
    res_n = abs(sample_sd_n - myarray_sd)
    res_n1 = abs(sample_sd_n1 - myarray_sd)
    # check whether the sd calculated using n - 1 is more accurate
    if res_n1 < res_n:
        right += 1
    else:
        wrong += 1
print('The theory is correct in: %.2f of cases' % (right / (right + wrong)))