In a normal distribution, the 68-95-99.7 rule gives the standard deviation a lot of meaning, but what does the standard deviation mean in a non-normal distribution (multimodal or skewed)? Would all data values still fall within 3 standard deviations? Are there rules like the 68-95-99.7 one for non-normal distributions?
-
Have a look at [Chebyshev's inequality](http://en.wikipedia.org/wiki/Chebyshev's_inequality). – COOLSerdash Jul 20 '14 at 08:08
-
@COOLSerdash great. This perfectly answers my question. – Zuhaib Ali Jul 20 '14 at 08:38
-
@COOLSerdash's point is on-target here, but be aware that the standard statement of Chebyshev's inequality pertains to the true SD known a priori, not an SD estimated from your sample. It may help to read this excellent CV thread: [Does a sample version of the one-sided Chebeshev inequality exist?](http://stats.stackexchange.com/q/82419/) – gung - Reinstate Monica Jul 20 '14 at 16:15
-
Also, you should probably not settle for Chebyshev right away--you can probably do a lot better, skewed or not. – Steve S Jul 20 '14 at 16:34
-
I'm not interested in specifics... just need to understand what purpose the SD concept universally serves regardless of the type of distribution... and with Chebyshev's inequality I can grasp much of the concept, I think. – Zuhaib Ali Jul 20 '14 at 16:38
-
@gung so does the 68-95-99.7 rule! – Glen_b Jul 20 '14 at 19:52
-
A version of Chebyshev applies to all centered $L^p$ norms, not just the SD, and thereby does not distinguish the SD. The 68-95 part of the 68-95-99.7 rule applies with good accuracy to a surprising range of non-Normal (even skewed) distributions. – whuber Jan 14 '20 at 18:40
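A quick simulation makes the comments above concrete. The sketch below (the exponential distribution and sample size are illustrative choices, not from the thread) checks how much probability mass falls within k standard deviations of the mean for a skewed distribution, and compares it with Chebyshev's lower bound 1 − 1/k².

```python
# Sketch: empirical coverage within k SDs for a skewed (exponential)
# distribution, versus Chebyshev's distribution-free lower bound 1 - 1/k^2.
# The exponential distribution and sample size are illustrative assumptions.
import math
import random

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(100_000)]

mu = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

for k in (1, 2, 3):
    within = sum(abs(x - mu) <= k * sd for x in xs) / len(xs)
    bound = 1 - 1 / k**2
    print(f"k={k}: empirical coverage {within:.3f}, Chebyshev bound {bound:.3f}")
```

For this skewed distribution the empirical coverage (roughly 0.86, 0.95, 0.98) sits well above the Chebyshev bound and, as whuber notes, the 95 part of the empirical rule is still surprisingly close.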
3 Answers
It's the square root of the second central moment, the variance. The moments are related to the characteristic function (CF), which is called characteristic because it defines the probability distribution. So, if you know all the moments (and the moment sequence determines the distribution, which holds under mild conditions), you know the CF, and hence the entire probability distribution.
The normal distribution's characteristic function is defined by just two moments: the mean and the variance (or standard deviation). Therefore, for the normal distribution the standard deviation is especially important: in a sense, it's 50% of the definition.
For other distributions the standard deviation is in some ways less important, because their shape also depends on higher moments. However, for many distributions used in practice the first few moments carry most of the information, so they are the most important ones to know.
Intuitively, the mean tells you where the center of your distribution is, while the standard deviation tells you how closely your data cluster around that center.
Since the standard deviation is in the units of the variable, it's also used to scale other moments to obtain measures such as kurtosis. Kurtosis is a dimensionless metric which tells you how heavy the tails of your distribution are compared to the normal.
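The scaling role described above can be sketched in a few lines. Here kurtosis is computed as the standardized fourth central moment (so a normal distribution gives about 3); the particular samples, including the scale-mixture used to produce heavier tails, are illustrative assumptions.

```python
# Sketch: the SD as the scale used to standardize higher moments.
# Kurtosis = standardized fourth central moment; for a normal it is ~3.
# The two samples below are illustrative, not from the original answer.
import math
import random

random.seed(1)
normal = [random.gauss(0, 2) for _ in range(200_000)]
# Scale mixture of normals: same symmetric center, heavier tails.
heavy = [random.gauss(0, 1) * random.choice((0.5, 2.0)) for _ in range(200_000)]

def moments(xs):
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    kurt = sum(((x - mu) / sd) ** 4 for x in xs) / n  # dimensionless
    return mu, sd, kurt

for name, xs in (("normal", normal), ("scale mixture", heavy)):
    mu, sd, kurt = moments(xs)
    print(f"{name}: mean={mu:.2f} sd={sd:.2f} kurtosis={kurt:.2f}")
```

Dividing by the SD makes kurtosis unit-free: the normal sample lands near 3 while the heavier-tailed mixture lands well above it, regardless of the raw measurement scale.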

-
1"Now, intuitively, the mean tell you where the center of your distribution is, while the standard deviation tell you how close to this center your data is." - Wouldn't this only apply if the distribution is Normal? Otherwise, more often than not, the median is a better measure of central tendency. – Dan Temkin Feb 06 '18 at 21:16
-
@DanTemkin When using the median, the standard deviation loses its value to a degree, since it's calculated from the mean. With the median it makes sense to talk about quantiles instead, which could be a way to go with skewed distributions. The OP didn't focus on skewed distributions, though. For any symmetrical distribution you have mean=median; it doesn't have to be normal. Thus it makes sense to talk about the mean when the standard deviation is discussed. – Aksakal Feb 06 '18 at 21:19
-
@DanTemkin [No, it is not](https://stats.stackexchange.com/q/96371/44269). What you just said effectively implies "Only use the median, unless the mean equals the median." The median and mean are simply different ways of defining *central tendency*, each with their own value. – Alexis Jan 14 '20 at 18:02
The standard deviation is one particular measure of variation. There are several others; the mean absolute deviation is fairly popular. The standard deviation is by no means special. What makes it appear special is that the Gaussian distribution is special.
As pointed out in the comments, Chebyshev's inequality is useful for getting a feel for the spread. However, there are sharper bounds available if you know more about the distribution.
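To compare the two spread measures this answer mentions, here is a small sketch (the normal sample is an illustrative assumption). For normally distributed data the mean absolute deviation about the mean is a fixed fraction of the SD, roughly √(2/π) ≈ 0.798.

```python
# Sketch: standard deviation vs. mean absolute deviation (about the mean)
# on the same sample. For normal data their ratio is ~sqrt(2/pi) ~= 0.798.
# The sample below is an illustrative assumption.
import math
import random

random.seed(2)
xs = [random.gauss(10, 3) for _ in range(100_000)]

mu = sum(xs) / len(xs)
sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
mad = sum(abs(x - mu) for x in xs) / len(xs)

print(f"sd={sd:.3f} mad={mad:.3f} mad/sd={mad / sd:.3f}")
```

Both numbers measure spread in the data's own units; they simply weight large deviations differently (the SD, being quadratic, is more sensitive to outliers).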

-
2Re "by no means:" arguably, what makes the SD so special is its close relation with the variance which is (asymptotically, of course) the *unique* measure of spread that makes the Central Limit Theorem work. – whuber Jan 14 '20 at 18:37
-
But only because the Central Limit Theorem leads to a Gaussian. But sure, maybe my language was stronger than it needed to be. – Keith Jan 14 '20 at 19:14
-
I think that confuses cause and effect, Keith. At https://stats.stackexchange.com/a/3904/919 I have attempted to explain the primacy of the SD without invoking the conclusion of the theorem. The point is that the mean and SD play a role in achieving a sequence of random variables that converges in distribution, regardless of what that limit might be: its Normality is a separate issue. – whuber Jan 14 '20 at 19:17
-
Sure. E T Jaynes covers why the normal function comes up in this specific case in "Probability Theory: The Logic of Science" – Keith Jan 14 '20 at 23:52
The sample standard deviation is a measure of the deviation of the observed values from the mean, in the same units used to measure the data. Normal distribution, or not.
Specifically, it is the square root of the mean squared deviation from the mean (the common sample version divides by n−1 rather than n).
So the standard deviation tells you how spread out the data are from the mean, regardless of distribution.
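That definition translates directly into code. A minimal sketch, using a small made-up data set:

```python
# Sketch: the standard deviation straight from its definition, the square
# root of the average squared deviation from the mean. The data are made up.
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)  # 5.0
sq_dev = [(x - mean) ** 2 for x in data]

sd_pop = math.sqrt(sum(sq_dev) / len(data))        # divide by n   -> 2.0
sd_samp = math.sqrt(sum(sq_dev) / (len(data) - 1))  # divide by n-1 (sample version)

print(mean, sd_pop, sd_samp)
```

Note the result is in the same units as the data: if the values are in metres, so is the standard deviation, whatever the underlying distribution.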
