I need to fit a generalized Gaussian distribution to a 7-dim cloud of points containing quite a significant number of outliers with high leverage. Do you know any good R package for this job?
You will find links to at least four R packages for identifying multivariate outliers in the replies to a similar question at http://stats.stackexchange.com/questions/213/what-is-the-best-way-to-identify-outliers-in-multivariate-data. That might be a good start. – whuber Jul 07 '11 at 13:51
Maybe the question is eluding me, but as far as fitting a multivariate Gaussian distribution goes, why not just use the empirical mean and covariance matrix as the MLE? You can then focus on diagnostic statistics if there are high-influence or high-leverage points. – AdamO Feb 09 '18 at 15:22
I think the question is about using something like a Huberized loss function to estimate the parameters. I'm not an expert, but perhaps using Huber loss to fit the mean would be a start. – Tom Dietterich Apr 20 '20 at 22:00
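To make the two comments above concrete, here is a minimal sketch that compares the plain empirical mean and covariance with a robust alternative on contaminated data. The simulated data, the contamination level, and the choice of the minimum covariance determinant (MCD) estimator via `MASS::cov.rob` are all assumptions for illustration; MCD is swapped in here as one common robust estimator and is not the Huber loss mentioned above.

```r
library(MASS)  # ships with R; provides cov.rob()

set.seed(1)
d <- 7  # dimension of the point cloud, as in the question

# Hypothetical data: 300 "clean" standard-normal points plus 60 outliers
# shifted far away from the bulk in every coordinate.
clean    <- matrix(rnorm(300 * d), ncol = d)
outliers <- matrix(rnorm(60 * d, mean = 8, sd = 2), ncol = d)
X <- rbind(clean, outliers)

# Classical fit: empirical mean and sample covariance.
# (cov() divides by n - 1, so it is the MLE only up to the factor (n - 1) / n.)
classical_center <- colMeans(X)
classical_cov    <- cov(X)

# Robust fit: minimum covariance determinant estimate of location and scatter.
robust <- cov.rob(X, method = "mcd")

# The classical center is pulled toward the outliers; the robust one is not.
print(round(rbind(classical = classical_center, robust = robust$center), 2))
```

On data like this the classical center is dragged toward the contaminating group, while the MCD center stays near the bulk of the points. Diagnostics computed from the classical fit can therefore mask the very outliers one is trying to find.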
2 Answers
This sounds like a classic use case for a multivariate Gaussian mixture model. I think that the bayesm package might work.
Here are some multivariate Gaussian mixture packages:
- bayesm: cran.r-project.org/web/packages/bayesm/index.html
- mixtools: www.jstatsoft.org/v32/i06/paper

There's also mclust: http://www.stat.washington.edu/research/reports/2012/tr597.pdf http://cran.r-project.org/web/packages/mclust/index.html
One caution, though: mixture modelling in high-dimensional space can get pretty CPU- and memory-intensive if your cloud of points is large. About four years ago I was working on a batch of 11-dimensional data sets of 50-200K points each; each case tended to use 4-11GB of RAM and take up to a week to compute (and I had 400 of them). This is certainly possible, but it can be a headache if you're using a shared compute cluster or have limited resources available.
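For reference, a minimal sketch of the mclust route on a small simulated data set. The data, the choice of two components, and the unconstrained "VVV" covariance model are assumptions for illustration, not a recommendation for the real data. The idea is to let one component absorb the outliers and read the Gaussian parameters off the dominant component.

```r
library(mclust)  # install.packages("mclust") if it is not available

set.seed(1)
d <- 7

# Hypothetical data: a 7-dimensional Gaussian bulk plus a displaced outlier group.
clean    <- matrix(rnorm(300 * d), ncol = d)
outliers <- matrix(rnorm(60 * d, mean = 8, sd = 2), ncol = d)
X <- rbind(clean, outliers)

# Two-component mixture with unconstrained (full) covariance matrices.
fit <- Mclust(X, G = 2, modelNames = "VVV")

# Treat the component with the largest mixing proportion as the bulk.
main      <- which.max(fit$parameters$pro)
mu_hat    <- fit$parameters$mean[, main]               # estimated mean vector
Sigma_hat <- fit$parameters$variance$sigma[, , main]   # estimated covariance

print(round(fit$parameters$pro, 3))
print(round(mu_hat, 2))
print(table(fit$classification))
```

This runs quickly at a few hundred points. Since cost grows with the number of points, dimensions, and candidate models, it may be worth timing a run on a subsample before committing to the full data set.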
