Learning Statistics

Question

I am new to statistics and I am trying to get my head around 'why' some of these are calculated the way they are and what they mean, or were they just made a preference that stuck? - the statistics books I'm using don't seem to frame things and provide context:

Variance: why square the values

Standard Deviation: why 68%

Geometric mean: why is it useful

I'm voting to close this question as off-topic because 2 of 3 questions are answered elsewhere. Re: geometric mean, see http://stats.stackexchange.com/questions/23117/which-mean-to-use-and-when Re: variance, see http://stats.stackexchange.com/questions/118/why-square-the-difference-instead-of-taking-the-absolute-value-in-standard-devia and the remaining one is unclear. `Standard Deviation: why 68%` Could you clarify what this means? 68% of what? How does 68% relate to standard deviation? — Sycorax, Apr 20 '16 at 21:04
It's pretty obvious that 68% refers to the often stated and used fact that the probability of being within (+ or -) 1 standard deviation of the mean of a Normal random variable is ~68%. But yes, not the clearest wording in the question. — Mark L. Stone, Apr 20 '16 at 21:10
@MarkL.Stone I was hoping that by writing out an explanation of the question, the asker would improve their understanding of the topic and perhaps answer their own question. — Sycorax, Apr 20 '16 at 21:14
@C11H17N2O2SNa, yes, I thought that might be the case, but in any event, I essentially answered that item, which was the only one not addressed by the links you provided. — Mark L. Stone, Apr 20 '16 at 21:17

P A N · Answer 1 · 2016-04-20T21:49:03.213

I am also new to statistics, but I think I can answer your questions in a basic way.

Variance

Suppose you have these two datasets: $\{10, 20, 30, 40\}$ and $\{24, 25, 26\}$. Both have the arithmetic mean $ \mu = 25 $ but their variance is very different.

You want to know the average deviation around the mean. A first attempt is to sum the deviations from the mean and divide by the number of items in the data set:

$$ \frac{\sum(x_{i}-\bar{x})}{N} $$

So for the latter example, it would be $ \frac{(24-25)+(25-25)+(26-25)}{3} $. However, the answer here would be $ 0 $, because the positive and the negative deviations neutralize each other. This happens for every data set, since deviations from the mean always sum to zero.

So that's why we introduce variance ($ \sigma^{2} $):

$$ \sigma^{2} = \frac{\sum(x_{i}-\bar{x})^{2}}{N} $$

By squaring the deviations, we make every term non-negative, which removes the neutralizing effect of averaging the raw deviations.

The standard deviation is simply the square root of the variance: $ \sigma = \sqrt{\sigma^{2}} $

Another way of doing this would be to use the Mean Absolute Deviation (MAD), which converts the deviations to absolute values instead of squaring them:

$$ MAD = \frac{\sum \lvert x_{i}-\bar{x} \rvert}{N} $$
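To see these quantities side by side, here is a minimal Python sketch applied to the two example data sets above. The helper name `summarize` is made up for illustration; `pvariance` and `pstdev` from the standard library divide by $N$, which matches the formulas in this answer.

```python
from statistics import mean, pvariance, pstdev

def summarize(data):
    """Return the spread measures discussed above for one data set."""
    m = mean(data)
    deviations = [x - m for x in data]
    return {
        "mean": m,
        # Raw deviations cancel out, so this is 0 for these data sets.
        "sum_of_deviations": sum(deviations),
        # Population variance: mean of squared deviations (divides by N).
        "variance": pvariance(data),
        # Standard deviation: square root of the variance.
        "std_dev": pstdev(data),
        # Absolute-value alternative to squaring.
        "mean_abs_dev": sum(abs(d) for d in deviations) / len(data),
    }

wide = summarize([10, 20, 30, 40])
narrow = summarize([24, 25, 26])

print(wide)    # variance 125, mean absolute deviation 10
print(narrow)  # variance about 0.667
```

Both sets share the mean 25, yet the variance of the first is far larger, which is exactly the difference the plain average of deviations fails to show.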

Empirical rule

The 68 % you are referring to in regard to the standard deviation is part of what's known as the Empirical rule (also called the 68-95-99.7 rule). Despite the name, it follows mathematically from the shape of the normal distribution (bell-shaped curve). When a variable is normally distributed:

About $ 68 \%$ of all values in the distribution are found within $ \pm1 \sigma $ (1 standard deviation) of the mean $ \mu $.
About $ 95 \%$ of all values in the distribution are found within $ \pm2 \sigma $ (2 standard deviations) of the mean $ \mu $.
About $ 99.7 \%$ of all values in the distribution are found within $ \pm3 \sigma $ (3 standard deviations) of the mean $ \mu $.

So if you know the mean and the standard deviation (the square root of the variance) of a normally distributed data set, you know that about 95 % of all values lie within two standard deviations of the mean.

If the mean is $\mu = 200$ and the standard deviation is $ \sigma = 40 $, then about 95 % of the values can be found in the range 120 – 280:

$ \mu - 2\sigma < x < \mu + 2\sigma \;\Rightarrow\; 200 - 80 < x < 200 + 80 \;\Rightarrow\; 120 < x < 280 $
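The percentages in the rule can be checked numerically. The sketch below assumes an exactly normal variable and uses the standard identity $P(|X-\mu| < k\sigma) = \operatorname{erf}(k/\sqrt{2})$; the function names `prob_within` and `interval` are made up for illustration.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def interval(mu, sigma, k):
    """Range covering k standard deviations on either side of the mean."""
    return (mu - k * sigma, mu + k * sigma)

# Coverage for 1, 2 and 3 standard deviations.
coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")

# The example from the answer: mean 200, standard deviation 40.
low, high = interval(200, 40, 2)
print(low, high)  # 120 280
```

The computed values are roughly 0.6827, 0.9545 and 0.9973, which is where the rounded figures 68 %, 95 % and 99.7 % come from. For data that are not normally distributed, these percentages need not hold.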

Geometric mean

This has been answered elsewhere, as per the comment above, but one reason (not specific to statistics) is that the geometric mean suits values that compound, such as growth rates, where the arithmetic mean would give the wrong impression. See this link relating to finance for instance.
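As a small illustration of the compounding point, consider a hypothetical investment of 100 that gains 50 % in one year and loses 50 % the next. The numbers are invented for the example; `geometric_mean` is available in Python's `statistics` module from version 3.8.

```python
from statistics import geometric_mean, mean

# Hypothetical yearly growth factors: +50 % then -50 %.
growth_factors = [1.5, 0.5]
start = 100

# What actually happens when the factors compound.
final_value = start
for g in growth_factors:
    final_value *= g  # 100 -> 150 -> 75

arith = mean(growth_factors)           # 1.0, suggests no change on average
geo = geometric_mean(growth_factors)   # about 0.866, an average loss per year

# Applying each "average" factor for the same number of years:
print(start * arith ** len(growth_factors))  # 100.0, which overstates the result
print(start * geo ** len(growth_factors))    # about 75, matching final_value
print(final_value)                           # 75.0
```

The arithmetic mean of the factors suggests the investment broke even, while the geometric mean reproduces the actual end value of 75.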

Thank you. It helps me both with the specifics and in understanding better how to participate in this community. Fair criticism; Cross Validated is a very well-focused community. I appreciate the time and effort in being able to infer what I was actually asking. — S. Mike Bruce KM4JLX, Apr 21 '16 at 11:39
