Normalization is not at all straightforward, as this question indicates. Consider a small number of large outliers. Even though they don't contribute to the MAD, their normalized values under (value $-$ median)/MAD will be very large in absolute value, probably larger than they would be under (value $-$ mean)/SD normalization. If you are trying to get all your features on a common scale for, say, fair relative penalization in ridge regression, LASSO, or penalized maximum likelihood, even that choice of normalization will affect the results.
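Here is a minimal sketch of that effect, assuming synthetic data (the particular values and outlier magnitudes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# 95 well-behaved values plus 5 large outliers
x = np.concatenate([rng.normal(0, 1, 95), [50, 60, 70, 80, 90]])

# Classical scaling: (x - mean) / SD; the outliers inflate the SD itself
z_classical = (x - x.mean()) / x.std(ddof=1)

# Robust scaling: (x - median) / MAD; the outliers barely affect the MAD
mad = np.median(np.abs(x - np.median(x)))
z_robust = (x - np.median(x)) / (1.4826 * mad)  # 1.4826 makes MAD consistent with SD under normality

print("max |z|, classical:", np.abs(z_classical).max())  # modest: the inflated SD shrinks the outliers
print("max |z|, robust:   ", np.abs(z_robust).max())     # far larger, as described above
```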
In your case, with more than 50% identical values, none of the usual candidates for robust measures of scale will work: they all break down at that point. Like the MAD, the $S_n$ and $Q_n$ measures developed in the paper you cite break down at 50% identical values. I suppose you could try to use order statistics other than the median in some way, but then you move back toward a measure of scale dominated by outliers.
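The breakdown is easy to demonstrate. Below is a sketch using a naive $O(n^2)$ computation of $Q_n$ for clarity (not the efficient algorithm from the paper), on made-up data in which 60% of the values are identical:

```python
import numpy as np

x = np.array([3.0] * 60 + list(np.random.default_rng(1).normal(0, 1, 40)))

# MAD: with >50% identical values the median of absolute deviations is zero
mad = np.median(np.abs(x - np.median(x)))
print("MAD:", mad)  # 0.0

# Naive Qn: roughly the first quartile of the pairwise absolute differences
n = len(x)
pair_diffs = np.abs(np.subtract.outer(x, x)[np.triu_indices(n, k=1)])
h = n // 2 + 1
k = h * (h - 1) // 2  # order statistic used by Qn
qn = 2.2219 * np.sort(pair_diffs)[k - 1]  # 2.2219 is the usual consistency constant
print("Qn: ", qn)  # also 0.0: more than 25% of the pairwise differences vanish
```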
One thing that came to mind (against usual advice) is binning such features to treat them as ordinal variables. Binning might not be so bad here if the main interest is whether a feature value differs from the single highly prevalent value and, if so, in which direction; a sketch of that idea follows. That trades this problem for another difficult one, however: how best to normalize an ordinal variable. This page, this page, and this page provide entries into the discussion.
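A hedged sketch of the binning idea: collapse the feature to whether each value is below, equal to, or above the highly prevalent value, yielding a three-level ordinal variable. The $\{-1, 0, +1\}$ coding and the use of the mode as the cutpoint are illustrative assumptions, not a prescribed method:

```python
import numpy as np

def to_ordinal(x):
    """Map a feature to {-1, 0, +1}: below / equal to / above its most frequent value."""
    values, counts = np.unique(x, return_counts=True)
    prevalent = values[np.argmax(counts)]  # the dominant repeated value
    return np.sign(x - prevalent).astype(int)

# Made-up feature: 60% of values equal 3.0, the rest scattered around it
x = np.array([3.0] * 60 + [1.2, 2.5, 3.8, 5.0, 2.9, 4.4] * 5 + [0.5] * 10)
print(np.unique(to_ordinal(x), return_counts=True))
```

Note that this only restates the difficulty: the resulting ordinal codes still need some normalization before entering a penalized model.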
It seems that knowledge of the underlying subject matter and what you are trying to accomplish with normalization, rather than a simple algorithm, might provide the best answer to your question.