
Suppose I have the following hypothetical data:

  • One thousand times value 15 (i.e., 15 occurs 1000 times)
  • a single outlier value of 115 (i.e., 115 occurs just once)

Thus the mean is: $$\frac{(15 \times 1000) + 115}{1001} = \frac{15115}{1001} \approx 15.1$$

The (population) standard deviation is approximately $3.16$.

Whereas using the mean absolute deviation, $\sum|x - \bar{x}| / N$ (i.e., the absolute deviation instead of squaring), I get approximately $0.20$.
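These figures can be checked with a short script (a minimal sketch; the variable names are mine, not from any statistics library):

```python
# Hypothetical data: 1000 copies of 15 plus a single outlier of 115.
data = [15] * 1000 + [115]
n = len(data)

mean = sum(data) / n                                  # arithmetic mean
sd = (sum((x - mean) ** 2 for x in data) / n) ** 0.5  # population standard deviation
mad = sum(abs(x - mean) for x in data) / n            # mean absolute deviation

print(round(mean, 1), round(sd, 2), round(mad, 2))    # 15.1 3.16 0.2
```

The single outlier contributes $(115 - 15.1)^2 \approx 9980$ to the sum of squares, about as much as all 1000 other points combined, which is why the SD is roughly 16 times the mean absolute deviation here.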

I think the latter value is more appropriate, but I have read in many places that one of the reasons we square the deviations when calculating the SD is that we want to give more weight to outliers. But I have always thought that outliers are supposed to be ignored!

What is the benefit of using the standard deviation formula (with squared deviations) which amplifies the effect of outliers (as compared to using absolute deviations)?

gung - Reinstate Monica
VISHAL DAGA
  • 1. I did not downvote your question; however, in its current state it's not surprising to me that it would receive downvotes. 2. Since you've accused me of acting improperly, I will leave the remainder of any dealing with this question to others. 3. If you have any substantive complaint to make about our policies or my actions, the proper avenue is in our *meta* rather than comments. 4. I encourage you to avoid making your comments personal; you risk contravening (network-wide) policies. – Glen_b Feb 27 '19 at 22:13
  • 1. The above post was asking the question before I added the last paragraph. This has been bugging me for a long time; I searched all over, but what I found contradicted the logic that outliers should be removed from data before analysis. Still you put it on hold, and you haven't removed the hold even after I edited my question and added the last paragraph, which just presents the hidden question in my initial post more clearly. I am new to this area; I have been struggling to understand the reason behind this behavior, and I know this website is meant to resolve such doubts. – VISHAL DAGA Feb 27 '19 at 22:30
  • Although I can see that your question has something to do with standard deviation and identifying outliers, that's as far as I can get. Are you trying to ask what are some good (or at least standard) ways to detect outliers? About the sensitivity of standard deviation calculations to outliers? Something else? Whatever your aim is, please clarify it by *editing your post.* No amount of complaining about moderation is going to make it any clearer! – whuber Feb 28 '19 at 00:02
  • Based on my best guess about what you are asking, the answer could be learned from this thread: [Why square the difference instead of taking the absolute value in standard deviation?](https://stats.stackexchange.com/q/118/), possibly in conjunction with [Rigorous definition of an outlier?](https://stats.stackexchange.com/q/7155/7290) – gung - Reinstate Monica Feb 28 '19 at 03:18
  • *"we take square...to give more weightage to outliers"* This is not correct. The use of the square in calculating variance is almost never because of giving more weight to outliers (it might, however, be an intuitive way to think of it). The type of cost function, squaring, absolute difference or something else, should depend on the type of distribution that one assumes/expects/uses for the error distribution. Due to how errors are created (sum of little incremental errors), the errors often follow a normal distribution or something resembling. **That** is the reason for using the square. – Sextus Empiricus Feb 28 '19 at 13:33
  • So in your hypothetical case, the use of the square might very well be inappropriate. But it depends a lot on what the data represents and what kind of statistical model would be appropriate to model it. The question is not clear about that. – Sextus Empiricus Feb 28 '19 at 13:34
  • @gung I think the question is perfectly clear (probably after it was edited by the OP). +1. I edited it a little bit for formatting and English and voted to reopen. – amoeba Mar 01 '19 at 14:04
  • I agree that the question is clear enough now, and I support its being reopened on those grounds. However, I now also think that it is pretty much an exact duplicate of the question linked to by gung. I feel a little sorry to say it because the conflict this question inspired would have had a nicer resolution if this wasn't the case. But should the question be closed again (now as a duplicate), then I hope the OP will be happy to see that the older question has a very high number of upvotes (and useful answers), so obviously the question itself was a good one. – Ruben van Bergen Mar 01 '19 at 14:46

1 Answer


I have read in many places that one of the reasons we take the square for calculating the SD is because we want to give more weight to outliers.

This is not correct. The use of the square in calculating the variance is almost never about giving more weight to outliers.

It might, however, be an intuitive way to think of it. People more often say that squaring gives more weight to values further away from the mean; but values far from the mean are not necessarily outliers.


The type of cost function applied to the deviations (squaring, absolute difference, or something else) should depend on the type of distribution that one assumes/expects/uses for the error distribution.

Because of how errors often arise (as a sum of many small incremental errors), they frequently follow a normal distribution, or something resembling one. That is the more common reason for using the square.

So in your hypothetical case, with the thousand values of 15 and one of 115, the use of the square might very well be inappropriate. But it depends a lot on what the data represent and what kind of statistical model would be appropriate for them. The question is not clear about that.
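One way to see the link between the cost function and the assumed error distribution: the center that minimizes the sum of squared deviations is the mean (the maximum-likelihood location under normal errors), while the center that minimizes the sum of absolute deviations is the median (maximum likelihood under Laplace errors). A quick numerical check on the hypothetical data (a sketch of my own, using a simple grid search rather than any optimizer):

```python
# Same hypothetical data as in the question.
data = [15] * 1000 + [115]

def sse(c):
    # Sum of squared errors around candidate center c.
    return sum((x - c) ** 2 for x in data)

def sae(c):
    # Sum of absolute errors around candidate center c.
    return sum(abs(x - c) for x in data)

# Search candidate centers on a fine grid from 14.00 to 16.99.
grid = [i / 100 for i in range(1400, 1700)]
best_sq = min(grid, key=sse)   # near 15.1, the mean: pulled toward the outlier
best_abs = min(grid, key=sae)  # 15.0, the median: ignores the outlier
print(best_sq, best_abs)
```

Under squared loss the lone 115 shifts the optimal center toward itself; under absolute loss it does not, which matches the intuition that the median (and the mean absolute deviation) is more robust to outliers.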

Sextus Empiricus