Do we calculate the standard deviation of a population the same way no matter how small the population size gets (say, less than 30)? Does the distribution type play any role in this? Thank you in advance.
-
If you read any of our higher-voted threads on standard deviation, such as http://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia, it should quickly become clear that the definition and calculation of the SD have nothing to do with any assumptions about the population. – whuber Feb 26 '15 at 17:20
-
Thank you whuber for the link. I did research my question before posting it but I couldn't find a specific answer to it. Thanks again. – user37301 Feb 26 '15 at 19:36
-
If you could not find a specific answer, then you must be meaning to ask something different than what the question actually states: there is one, and only one, definition of a standard deviation for a population, period. What, then, do you mean by "population" and "standard deviation"? – whuber Feb 26 '15 at 19:51
-
Are you talking about sampling without replacement from a small population? It makes no difference when talking about the standard deviation of the distribution, but it may matter if you're talking about the standard deviation for the sample. – Glen_b Feb 26 '15 at 20:45
-
@whuber: You are correct, I could be using the wrong terminology here. By "population" I mean all the members of the group of interest. For example, a class of 25 students has a "population size" of 25 (N = 25). That is what I understand population to mean in statistics. So if I want to calculate the standard deviation of all 25 students' scores on an exam, I use sigma = sqrt( sum (X - mean)^2 / N ). Please let me know if this is not correct. Thank you! – user37301 Feb 26 '15 at 23:12
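The population formula described in this comment (divide the sum of squared deviations by N, not N - 1) can be sketched in Python; the scores below are made-up illustrative data, not from the thread:

```python
import math

# Hypothetical exam scores for a class of N = 25 students
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 77,
          84, 69, 92, 73, 80, 87, 75, 68, 91, 79,
          83, 74, 86, 71, 89]

N = len(scores)
mean = sum(scores) / N

# Population SD: sigma = sqrt( sum (X - mean)^2 / N )
# Note the denominator is N, since all 25 members of the
# population are observed -- no sampling correction is needed.
sigma = math.sqrt(sum((x - mean) ** 2 for x in scores) / N)
```

This matches Python's built-in `statistics.pstdev`, which implements the same N-denominator population formula.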
1 Answer
While it's true that the standard deviation of a sample is defined the same way regardless of the distribution type, the convergence of the sample s.d. to the true process s.d. can be slower or faster depending on the distribution. In most real-world settings you would also be concerned with robustness, i.e. preventing a single corrupted point, or a few of them (e.g. a probe malfunction), from skewing the s.d. estimate. If you are expecting a certain distribution type, you can try to fit the empirical CDF to that distribution's CDF, and the Kolmogorov-Smirnov test may be used to hypothesis-test a specific distribution.
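A minimal sketch of the two ideas in this answer, assuming NumPy and SciPy are available; the simulated measurement data and its parameters are illustrative choices, not part of the thread:

```python
import numpy as np
from scipy import stats

# Simulated measurements from a process with known true s.d. = 2.0
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=200)

# Sample s.d. with the n-1 denominator (ddof=1) vs. the n denominator (ddof=0);
# the n-1 version is always slightly larger for the same data
s_sample = np.std(data, ddof=1)
s_pop = np.std(data, ddof=0)

# Kolmogorov-Smirnov test against a normal distribution fitted to the data
# (caveat: estimating the parameters from the same data makes the
# resulting p-value only approximate)
ks_stat, p_value = stats.kstest(data, 'norm', args=(data.mean(), s_sample))
```

A small KS statistic and a p-value well above the chosen significance level mean the test fails to reject the hypothesized distribution; it cannot prove the data are normal.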

KishKash
-
Could you make an explicit connection between these ideas and the actual question, which is about computing the SD of a *population*? – whuber Feb 26 '15 at 19:40
-
@KishKash: Thank you for your explanation. Logically, the weight of an outlier is greater in a small population, just as it would be in a small sample. At the same time, intuitively I feel that n-1 is not the correct answer either. I also know that I don't have a normal distribution. Now I am stuck. – user37301 Feb 26 '15 at 19:48
-
See for example [this](http://stats.stackexchange.com/questions/11707/why-is-sample-standard-deviation-a-biased-estimator-of-sigma/27984#27984) question. – KishKash Feb 27 '15 at 20:23
-
The point for this question is that if you think the distribution is not Gaussian, then diving into the literature on the specific distribution you suspect may yield a better approximation formula than the well-known s.d. estimator with n-1 in the denominator. – KishKash Feb 27 '15 at 20:31