
I have a 1-D random variable which is extremely skewed. In order to normalize this distribution, I want to use the median rather than the mean. My question is this: can I calculate the variance of the distribution using the median in the formula instead of the mean?

i.e. can I replace

$$ \mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{n} \bigl(X_i - \mathrm{mean}(X)\bigr)^2 $$

with

$$ \mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{n} \bigl(X_i - \mathrm{median}(X)\bigr)^2 $$

My reasoning is that since the variance is a measure of spread w.r.t. the central tendency of a distribution, this shouldn't be a problem, but I'm looking to validate this logic.
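For concreteness, here is a minimal R sketch of the two quantities above (illustrative function names only, using the population 1/n form):

mean_var   <- function(x) sum((x - mean(x))^2)   / length(x)   # usual definition, centered at the mean
median_var <- function(x) sum((x - median(x))^2) / length(x)   # proposed version, centered at the median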

Rahul Singh
  • See https://en.wikipedia.org/wiki/Median_absolute_deviation – Tim Oct 15 '15 at 12:43
  • By median centering your variables and then dividing that by the MAD (median absolute deviation), you can create a median standardized distribution. – Mike Hunter Oct 15 '15 at 12:47
  • You can do this! But I think it's fair to call it highly non-standard and to suggest that you need theory and/or simulations to back it up and not just your intuition. I suspect that it will be **less resistant** than the standard estimator. For example, in a common right-skewed case, the median will be less than the mean, so the largest squared deviations (from the median) will therefore be even larger! The major point is that if the variance is very untrustworthy, you may need to think about measuring spread quite differently, rather than different versions of the variance. – Nick Cox Oct 15 '15 at 12:53
  • Orthogonal point: Does "normalise" mean scale in some way, e.g. (value $-$ location) / scale, or does it mean make closer to normal (Gaussian)? – Nick Cox Oct 15 '15 at 12:56
  • This approach is inherently inconsistent, because the problems that are addressed by replacing the mean by the median are magnified by using the variance instead of a robust estimator of the spread. – whuber Oct 15 '15 at 14:21
  • You might be interested in [L-moments](https://www.wikiwand.com/en/L-moment) – Dan Oct 15 '15 at 15:24
  • @NickCox Hampel's article in Tukey's mid-70s book on Exploratory Data Analysis may have initiated these approaches. I think you'll agree that there are lots of ways to rescale skewed, heavy-tailed information, including the range, IQR, ipsative scaling, and so on, many of them simple heuristics for which theory provides little guidance. Isn't it true that there are lots of rules of thumb and widely agreed-upon conventions in statistical analysis that are lacking in theoretical motivation? In other words, and unless the OP is doing a dissertation, I question the need for theory. – Mike Hunter Oct 16 '15 at 12:35
  • I don't recognise your reference. Perhaps you mean the 1972 book on robustness by Andrews, Bickel, Hampel, Huber, Rogers and Tukey which was a massive theory and simulation-based study. More crucially, I doubt that Hampel would suggest anything precisely like this. I still think it's fair to suggest that you need more than intuition to take this measure seriously; the analysis need not be elaborate. The arguments for using IQR rather than SD for example are of the same kind. – Nick Cox Oct 16 '15 at 12:45
  • @Nick Cox: You're right. I was using normalize in the scaling context and not the 'make closer to Gaussian' context. (apologies for the mix up in terminologies). With regards to what you said, could you elaborate a bit on how exactly you would go about developing this theory/what exactly would you simulate? Reason I ask is that I've never really seen a 'theory' for why variance is defined the way it is. I just assumed it intuitively seemed the most logical measure to calculate. – Rahul Singh Oct 16 '15 at 13:15
  • You mainly need systematic exploration of how your measure would behave in realistic situations, e.g. by simulating from distributions similar to those you expect. – Nick Cox Oct 16 '15 at 14:39

1 Answer


The mean minimizes the squared error (the L2 norm, see here or here), so the natural way to measure spread around the mean is the squared distance from it (see here on why we square it). The median, on the other hand, minimizes the absolute error (the L1 norm), i.e. it is the value in the "middle" of your data, so the absolute distance from the median (the so-called median absolute deviation, or MAD) seems to be a better measure of the degree of variability around the median. You can read more about these relations in this thread.
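As a quick, purely illustrative check of these two minimization properties, you can minimize both loss functions numerically (with base R's optimize) and compare the minimizers to the mean and the median:

set.seed(1)
x <- rexp(1000)                          # a right-skewed sample

sq_loss  <- function(m) sum((x - m)^2)   # minimized at the mean
abs_loss <- function(m) sum(abs(x - m))  # minimized at the median

optimize(sq_loss,  range(x))$minimum     # approximately mean(x)
optimize(abs_loss, range(x))$minimum     # approximately median(x)
c(mean(x), median(x))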

In short, the variance and the MAD differ in how they define the central point of your data, and this influences how we measure the variation of data points around it. Squaring the deviations gives outliers greater influence on the central point (the mean), while in the case of the median all points have the same impact on it, so the absolute distance seems more appropriate.
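The median/MAD standardization mentioned in the comments follows the same logic; a minimal sketch (illustrative function name; note that R's mad() applies a consistency constant by default, so constant = 1 gives the raw median absolute deviation):

mad_standardize <- function(x) (x - median(x)) / mad(x, constant = 1)  # median-centered, MAD-scaled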

The distinction can also be shown by a simple simulation. If you compare squared distances from the mean and from the median, the total squared distance is almost always smaller from the mean than from the median. On the other hand, the total absolute distance is smaller from the median than from the mean. The R code for the simulation is posted below.

sqtest  <- function(x) sum((x - mean(x))^2)  < sum((x - median(x))^2)   # TRUE if squared deviations are smaller around the mean
abstest <- function(x) sum(abs(x - mean(x))) > sum(abs(x - median(x)))  # TRUE if absolute deviations are smaller around the median

# proportion of simulated samples for which each statement holds
mean(replicate(1000, sqtest(rnorm(1000))))   # symmetric: normal
mean(replicate(1000, abstest(rnorm(1000))))

mean(replicate(1000, sqtest(rexp(1000))))    # right-skewed: exponential
mean(replicate(1000, abstest(rexp(1000))))

mean(replicate(1000, sqtest(runif(1000))))   # symmetric: uniform
mean(replicate(1000, abstest(runif(1000))))

So using the median instead of the mean when estimating such a "variance" would lead to higher estimates than using the mean, as is done traditionally.
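A minimal illustration of that point, using the same population-style formula as in the question (since the mean minimizes the sum of squared deviations, the median-centered version can never be smaller):

set.seed(42)
x <- rexp(1000)                        # right-skewed sample
sum((x - mean(x))^2)   / length(x)     # usual variance around the mean
sum((x - median(x))^2) / length(x)     # median-centered version: at least as large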

By the way, the relation between the L1 and L2 norms can also be considered in a Bayesian context, as in this thread.

Tim