It has to do with the moments of a distribution. Specifically, variance is defined as the second moment about the mean (the second central moment), and the third central moment leads to skewness. All of these things are called shape parameters, because they describe the shape of the distribution. The 4th central moment leads to kurtosis, which (loosely) describes how peaked or heavy-tailed a distribution is, but that's not exactly what you're doing.
Update - Thank you @amoeba for pointing out that my moment formulas were wrong; they should be expected values, not sums.
$E[X]$ - Mean
$E[(X-\mu)^2]$ - Variance
$E[(X-\mu)^3]$ - Third central moment, leads to Skewness
$E[(X-\mu)^4]$ - Fourth central moment, leads to Kurtosis
and so on...
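If it helps to see these side by side, here is a minimal sketch (plain NumPy; the sample and variable names are just mine for illustration) that estimates each of these quantities from data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=2, size=100_000)  # any sample will do

mu = x.mean()                      # E[X]        - mean
m2 = np.mean((x - mu) ** 2)        # E[(X-mu)^2] - variance
m3 = np.mean((x - mu) ** 3)        # E[(X-mu)^3] - third central moment
m4 = np.mean((x - mu) ** 4)        # E[(X-mu)^4] - fourth central moment

# Skewness and kurtosis standardize the 3rd and 4th central moments by
# powers of the standard deviation (the "additional calculations" in the
# update below):
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2            # ~3 for a Normal distribution

print(mu, m2, skewness, kurtosis)
```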
Update - Also, to @amoeba's point, skewness and kurtosis require additional calculations beyond these moments. However, the 3rd and 4th central moments are correctly listed (now). Henry's answer is much more concise and may provide better insight.
So you can do what you propose, but you'll need to make up another name for it, because standard deviation has already been defined.
To be clear, people started calling the second moment 'variance' and the name stuck. Then someone else took the square root of that, and started calling it standard deviation, and the name stuck. Other people said, "that's a good measure for what I'm trying to use", so they wrote articles/theses/etc. with regards to standard deviation.
To your point, there are other methods for describing the 'spread' of a distribution. Standard deviation has a straightforward interpretation with properties that many are familiar with, especially when dealing with the Normal distribution. To say one measure is summarily better than all others in all cases is, in my opinion, inappropriate.
Like everything else in the world, the right tool to use depends on the job you're trying to do, or in this case the right measure to use depends on what question you're trying to answer.
For example, in my line of work people tend to use MAPE, which doesn't describe the distribution at all and has a number of issues of its own that make it a poor fit for what they're trying to do. But everyone has been doing it for a while, so that's likely what will continue happening for the foreseeable future. That has more to do with human nature than statistics, but it is also somewhat applicable to your question (and maybe the best answer).
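For context, MAPE is just the average of the absolute percentage errors between forecasts and actuals. A rough sketch (the numbers are made up purely for illustration):

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error: mean of |actual - forecast| / |actual|, as a %.
    Undefined whenever an actual value is 0 -- one of its well-known issues."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

# It summarizes forecast error relative to the actuals; it says nothing
# about the spread of the underlying distribution.
print(mape([100, 200, 300], [110, 190, 330]))  # ~8.3
```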
One final point: if you're going to do a sum, you need to multiply each $x$ by the probability of $x$:
$$ E[(X-\mu)^4] = \sum_{x\in D} (x-\mu)^4 \, p(x)$$
Your $1/N$ is only valid if each value of $x$ is equally likely (i.e., the distribution is uniform).
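To make that concrete, here is a small sketch (the fair/loaded die is just my own example) of the general weighted sum versus the $1/N$ shortcut:

```python
import numpy as np

# A fair six-sided die: every outcome equally likely, so p(x) = 1/N.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mu = np.sum(values * probs)

# General form: weight each (x - mu)^4 by p(x).
m4_weighted = np.sum((values - mu) ** 4 * probs)

# 1/N shortcut: only equivalent here because all probabilities are equal.
m4_uniform = np.mean((values - mu) ** 4)

print(m4_weighted, m4_uniform)  # identical for the fair die

# With a loaded die, the 1/N version is no longer the same quantity.
loaded = np.array([0.3, 0.2, 0.2, 0.1, 0.1, 0.1])
mu_l = np.sum(values * loaded)
print(np.sum((values - mu_l) ** 4 * loaded), np.mean((values - mu_l) ** 4))
```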