http://www.stt.msu.edu/users/makagon/2.5-7.pdf

For any data set, the proportion (or percentage) of values that fall within $k$ standard deviations of the mean (that is, in the interval $(\bar{x} - ks, \bar{x} + ks)$) is at least $1-\frac{1}{k^2}$, where $k > 1$.

In particular, at least $1 - \frac{1}{3^2} = \frac{8}{9} \approx 89\%$ of measurements fall within $3$ standard deviations of the mean, regardless of the distribution.

Suppose I collect 100 samples from some unknown distribution. I do not know the population mean or population standard deviation.

Does this mean I can calculate the sample standard deviation $s$ and the sample mean, and conclude that at least 89% of my data points are within $3s$ of the sample mean?

user4205580
  • @JohnK What law tells me that by taking more samples, the sample standard deviation will converge to the population s.d.? Is it because the difference between the true s.d. $\sigma$ and the sample s.d. is approximately equal to $\frac{\sigma}{n}$, as explained [here](http://stats.stackexchange.com/questions/11707/why-is-sample-standard-deviation-a-biased-estimator-of-sigma)? And does that mean taking more samples = increasing $n$ = closer to the true s.d.? – user4205580 Jan 25 '16 at 14:40
  • @JohnK You might have been hasty: the question concerns only the sample, not the population, so your comment does not even seem to apply. If by "sample standard deviation" we understand the SD of the dataset (and not some estimator thereof), then Chebyshev's inequality applies because the sample SD and sample mean are the SD and mean of the empirical distribution. – whuber Jan 25 '16 at 15:58
  • @whuber In the question, however, a distinction is made between sample and population values and the OP admits that he does not know the population values. I read it as whether it is ok to plug in estimates in lieu of the unknown population quantities. I think some clarification is needed. – JohnK Jan 25 '16 at 16:10
  • @JohnK Although it's fine to ask what is intended, the question *as written* is perfectly clear: it asks as explicitly as possible about the *data points*, the *sample* SD, and the *sample* mean. Within this setting your first comment appears both misleading and wrong because it implicitly answers a question that was not asked. – whuber Jan 25 '16 at 16:16
  • @whuber I must have misread it then, I apologise for any confusion. – JohnK Jan 25 '16 at 16:19
  • @whuber If I constructed an empirical distribution from my samples, I'd have to assume that all outcomes are equally likely? This is because the formula for standard deviation is $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2}$ - it's valid on the assumption that all outcomes are equiprobable. – user4205580 Jan 25 '16 at 16:50
  • The empirical distribution of the sample describes a situation in which all the sample values are written on balls, those balls are thrown into an urn, and one is drawn out at random. *Of course* each ball is equally likely. The point is that Chebyshev's Inequality applies to this urn and its conclusion is precisely what you were asking about. – whuber Jan 25 '16 at 17:37
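
The urn argument in the comments can be checked numerically. The following is a minimal sketch, not a proof: it assumes the SD is computed with the $\frac{1}{N}$ denominator (the SD of the empirical distribution) and that the SD is nonzero. The helper name `fraction_within`, the Pareto sample, and the seed are illustrative choices, not part of the original question.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of points strictly within k SDs of the mean.

    Uses the 1/N ("population") SD, i.e. the SD of the empirical
    distribution of the data set itself.
    """
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # divides by N, not N - 1
    return sum(abs(x - mean) < k * sd for x in data) / len(data)


if __name__ == "__main__":
    random.seed(0)  # arbitrary seed, only for reproducibility
    # A deliberately heavy-tailed sample of 100 points (illustrative choice).
    sample = [random.paretovariate(1.5) for _ in range(100)]
    for k in (1.5, 2, 3):
        frac = fraction_within(sample, k)
        bound = 1 - 1 / k**2
        print(f"k={k}: fraction within = {frac:.3f}, Chebyshev bound = {bound:.3f}")
```

If the SD is instead computed with the $\frac{1}{N-1}$ denominator, it is slightly larger, so the interval is wider and the same lower bound still holds for the data points.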
