
Does anyone know the name of this formula?

$$M_i = \displaystyle\frac{0.6745(x_i - \hat{x})}{\mathrm{MAD}}$$

where $\textrm{MAD}$ is the median absolute deviation and $\hat{x}$ is the median of $x$.

Does it appear in some scientific publication? I also wonder where the constant comes from (0.6745 is roughly 29/43). I am using it for outlier detection.

synonym
  • How are you using this for outlier detection? Presumably, you are comparing $M$ to some threshold--and it would be unreasonable to suppose it is equal to $1$. What will matter is the ratio between that threshold and $0.6745$; the actual value of $0.6745$ by itself is not terribly meaningful for this purpose. – whuber Nov 13 '14 at 18:24
  • If the absolute value of $M_i$ is larger than three I flag the observation as an outlier. – synonym Nov 13 '14 at 18:29
  • It would be practically the same thing just to compute $M_i^\prime = (x_i-\hat x)/\text{MAD}$ and compare it to $2 \approx 3\times 0.6745$. That might be a little bit simpler to explain and interpret, too. – whuber Nov 13 '14 at 18:36
  • B. D. Ripley refers to MAD/0.6745 as the "MAD estimator". See pages 2 and 3 of http://www.stats.ox.ac.uk/pub/StatMeth/Robust.pdf. For the normal distribution, MAD is approximately equal to 0.6745 times the standard deviation. – Tony Ladson Nov 17 '14 at 03:34
  • Dead link above, now at: http://web.archive.org/web/20120410072907/http://www.stats.ox.ac.uk/pub/StatMeth/Robust.pdf – Matt Wenham Nov 04 '20 at 21:14

2 Answers


Suppose $x$ follows a standard normal distribution.

The $\mathrm{MAD}$ will converge to the median of the half-normal distribution, which is the 75th percentile of the standard normal distribution: $\Phi^{-1}(0.75) \approx 0.6745$.

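Since your formula multiplies $(x_i-\hat{x})$ by $0.6745/\mathrm{MAD}$, and $\mathrm{MAD}/0.6745$ converges to the standard deviation $\sigma$ for any normal distribution, $M_i$ converges to the ordinary z-score $(x_i-\mu)/\sigma$ for a large enough sample size.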
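A quick numerical check of the constant, as a minimal sketch using only the Python standard library. The helper name `mad`, the seed, the sample size and the choice of $\mu = 10$, $\sigma = 2$ are all arbitrary assumptions made for illustration.

```python
import random
from statistics import NormalDist, median


def mad(values):
    """Unscaled median absolute deviation about the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


# 0.75 quantile of the standard normal distribution.
constant = NormalDist().inv_cdf(0.75)
print(round(constant, 4))  # 0.6745

# Simulate a normal sample with an arbitrary mean and standard deviation
# (illustrative values only) and compare MAD / sigma with the constant.
rng = random.Random(0)
sigma = 2.0
sample = [rng.gauss(10.0, sigma) for _ in range(50_000)]
ratio = mad(sample) / sigma
print(ratio)  # should be close to the constant for a large sample
```

The ratio `mad(sample) / sigma` should settle near 0.6745 as the sample grows, whatever mean and standard deviation are chosen.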

Arthur B.

The formula was given by Iglewicz and Hoaglin$^1$ (reference below).

Let the MAD of a vector $x$ of $n$ observations be defined as $m(x) = \text{median}(|x- \text{median}(x)|)$. If $x$ is normally distributed with standard deviation $\sigma$, it can be shown that $$ \lim_{n\rightarrow \infty}E(m(x)) = \sigma\Phi^{-1}(0.75) $$ where $\Phi^{-1}(0.75) \approx 0.6745$ is the $0.75$ quantile of the standard normal distribution. The constant is included for consistency: it makes $m(x)/0.6745$ a consistent estimator of the standard deviation $\sigma$.
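A short sketch of where the $0.75$ quantile comes from, at the population level. The symbols $X$, $\mu$ and $m$ are introduced here for illustration: $X \sim N(\mu, \sigma^2)$, and $m$ is the population MAD, i.e. the value with $P(|X-\mu| \le m) = \tfrac{1}{2}$.

$$
\tfrac{1}{2} = P(|X-\mu| \le m) = \Phi\!\left(\tfrac{m}{\sigma}\right) - \Phi\!\left(-\tfrac{m}{\sigma}\right) = 2\,\Phi\!\left(\tfrac{m}{\sigma}\right) - 1
\;\Longrightarrow\; \Phi\!\left(\tfrac{m}{\sigma}\right) = 0.75
\;\Longrightarrow\; m = \sigma\,\Phi^{-1}(0.75).
$$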

If you can't assume normality, you can use the $0.75$ quantile of any other distribution that is symmetric about some value (not necessarily the mean), standardised to have mean 0 and standard deviation 1. Typically a t-distribution is used if fat tails are assumed.
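As an illustration of the t-distribution case (a sketch, with $c_\nu$ and $t_\nu^{-1}$ as notation assumed here): a Student t variable with $\nu > 2$ degrees of freedom has variance $\nu/(\nu-2)$, so the constant for the version standardised to unit standard deviation would be

$$
c_\nu = t_\nu^{-1}(0.75)\,\sqrt{\frac{\nu-2}{\nu}}, \qquad \nu > 2,
$$

where $t_\nu^{-1}$ is the quantile function of the t-distribution. For example, with $\nu = 5$, $c_5 \approx 0.727 \times 0.775 \approx 0.563$, and $c_\nu$ approaches $0.6745$ as $\nu \to \infty$.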

Iglewicz and Hoaglin suggest using $\pm3.5$ as the cut-off value, but this is a matter of choice ($\pm3$ is also often used).
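The rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `modified_z_scores` and `flag_outliers`, the example data, and the decision to raise an error when the MAD is zero are all assumptions made here.

```python
from statistics import median

# Approximate 0.75 quantile of the standard normal distribution.
NORMAL_CONSISTENCY = 0.6745


def modified_z_scores(values):
    """Return M_i = 0.6745 * (x_i - median) / MAD for each observation."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # More than half the observations equal the median, so the
        # score is undefined; how to handle this is a design choice.
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [NORMAL_CONSISTENCY * (v - center) / mad for v in values]


def flag_outliers(values, cutoff=3.5):
    """Flag observations whose |M_i| exceeds the chosen cut-off."""
    return [abs(m) > cutoff for m in modified_z_scores(values)]


# Made-up data: six similar readings and one suspicious value.
data = [2.1, 2.3, 2.2, 2.4, 2.0, 2.2, 9.5]
flags = flag_outliers(data)
print(flags)
```

Passing `cutoff=3` gives the stricter rule mentioned in the question's comments.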

$^1$ Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and Handle Outliers", The ASQC Basic References in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.

Nick Cox
Marco Stamazza