Intuitive explanation of Minimum Covariance Determinant (MCD)

Question

I am an undergrad working on Anomaly Detection on an 8 dimensional dataset, with PYOD, which relies on the MCD in the sklearn's MinCovDet. I tried reading Minimum Covariance Determinant and Extensions, but am not grasping it.

What is the Minimum Covariance Determinant?

score 9 · Accepted Answer · answered Jul 05 '20 at 23:55

One way to detect anomalies is to assume that regular (non-anomalous) data are generated by a particular probability distribution, and to declare points with low probability density as anomalies. For ellipitically distributed (e.g. Gaussian) data, this can be done by computing the Mahalanobis distance from each point to the mean, and defining anomalies as points with with distance above some threshold. The Mahalanobis distance requires the parameters of the distribution (mean and covariance matrix). Since these are unknown, they must be estimated from the data.

A problem arises here because anomalies in the data can distort the parameter estimates, with the effect of making these points appear less anomalous than they really are. For example, distant outliers will pull the ordinary sample mean toward themselves, and artificially inflate the ordinary sample covariance matrix. If we knew a priori which points were anomalous, we could simply exclude them when estimating the parameters. But, this information is often unavailable.

MCD is a method for estimating the mean and covariance matrix in a way that tries to minimize the influence of anomalies. The idea is to estimate these parameters from a subset of the data that has been chosen to (hopefully) not contain anomalies.

More specifically, imagine taking all possible subsets of the data, of a specified size. Estimate the mean and covariance matrix for each subset. Then, keep the estimates for the subset whose covariance matrix has the smallest determinant. The chosen covariance matrix is finally multiplied by a 'consistency factor'.

The idea behind minimizing the determinant is that the determinant of a covariance matrix measures how broad the distribution is. MCD therefore selects the subset of the data that is most tightly distributed. This is to exclude anomalies, which are likely to lie further away from the rest of the data (e.g. see figure 1 in the paper).

In practice, one can't actually perform a brute force search over all possible subsets of the data, because there are too many. So, practical MCD algorithms are concerned with how to perform this procedure in a computationally efficient way.

Intuitive explanation of Minimum Covariance Determinant (MCD)

1 Answers1

Linked