
I need to fit a generalized Gaussian distribution to a 7-dimensional cloud of points that contains a significant number of high-leverage outliers. Do you know a good R package for this job?

kjetil b halvorsen
    You will find links to at least four R packages for identifying multivariate outliers in the replies to a similar question at http://stats.stackexchange.com/questions/213/what-is-the-best-way-to-identify-outliers-in-multivariate-data. That might be a good start. – whuber Jul 07 '11 at 13:51
  • Maybe the question is eluding me, but as far as fitting a multivariate Gaussian distribution, why not just use the empirical mean and SD as the MLE? You can then focus on diagnostic statistics if there are high influence/leverage points. – AdamO Feb 09 '18 at 15:22
  • I think the question is about using something like a Huberized loss function to estimate the parameters. I'm not an expert, but perhaps using Huber loss to fit the mean would be a start. – Tom Dietterich Apr 20 '20 at 22:00
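To make Tom Dietterich's comment concrete, here is a minimal base-R sketch of a Huberized mean: residuals inside a band get full weight, residuals outside it are down-weighted, and the estimate is iterated to convergence. The function name and tuning constant `k = 1.345` (the usual choice for high efficiency at the Gaussian) are illustrative; for the full 7-dimensional problem a proper robust location/scatter estimator such as `MASS::cov.rob()` would be the real tool.

```r
# Coordinate-wise Huberized mean via iteratively reweighted least squares.
# Illustrative sketch only -- not a multivariate robust estimator.
huber_mean <- function(x, k = 1.345, tol = 1e-8, maxit = 100) {
  mu <- median(x)
  s  <- mad(x)                    # robust scale estimate
  if (s == 0) return(mu)
  for (i in seq_len(maxit)) {
    r <- (x - mu) / s             # standardized residuals
    w <- pmin(1, k / abs(r))      # Huber weights: 1 inside band, k/|r| outside
    mu_new <- sum(w * x) / sum(w)
    if (abs(mu_new - mu) < tol) break
    mu <- mu_new
  }
  mu
}

set.seed(1)
x <- c(rnorm(100), 50, 60, 70)    # Gaussian data plus gross outliers
c(mean = mean(x), huber = huber_mean(x))
```

The ordinary mean is dragged toward the outliers, while the Huberized estimate stays near zero; the same weighting idea extends to multivariate scatter via Mahalanobis distances.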

2 Answers


This sounds like a classic multivariate Gaussian mixture model problem. I think the bayesm package might work.

Here are some multivariate Gaussian mixture packages:

  • bayesm: cran.r-project.org/web/packages/bayesm/index.html
  • mixtools: www.jstatsoft.org/v32/i06/paper
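As a rough sketch of how this could look with mixtools: fit a 2-component mixture and let one component absorb the outliers, leaving the other as the "clean" Gaussian. The choice of `k = 2` and the simulated 7-dimensional data are my assumptions for illustration, not from the question.

```r
library(mixtools)

set.seed(42)
d <- 7
clean    <- matrix(rnorm(500 * d), ncol = d)           # bulk of the cloud
outliers <- matrix(rnorm(25 * d, mean = 8), ncol = d)  # high-leverage points
X <- rbind(clean, outliers)

# EM fit of a 2-component multivariate normal mixture
fit <- mvnormalmixEM(X, k = 2)
fit$lambda     # mixing proportions
fit$mu[[1]]    # component means (a list of length k)
```

The component with the larger mixing proportion should recover the mean and covariance of the clean bulk.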
EngrStudent

There's also mclust: http://www.stat.washington.edu/research/reports/2012/tr597.pdf http://cran.r-project.org/web/packages/mclust/index.html
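A minimal mclust sketch (the simulated data and component range are my assumptions): `Mclust()` chooses the number of mixture components by BIC, and for data with gross outliers it also supports adding a uniform "noise" component via its `initialization` argument.

```r
library(mclust)

set.seed(7)
X <- rbind(matrix(rnorm(300 * 7), ncol = 7),           # clean 7-dim cloud
           matrix(rnorm(15 * 7, mean = 6), ncol = 7))  # outlying points

# Model-based clustering; BIC picks G from the candidate range
fit <- Mclust(X, G = 1:3)
fit$G                      # number of components selected
head(fit$classification)   # component assignment per point
```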

One caution, though: mixture modelling in high-dimensional space can get quite CPU- and memory-intensive if your cloud of points is large. About four years ago I was running a batch of 11-dimensional, 50-200K-point data sets, and each case tended to need 4-11 GB of RAM and up to a week of compute (and I had 400 of them). It's certainly feasible, but it can be a headache if you're on a shared compute cluster or have limited resources.