How to detect outliers?

Question

I have a matrix where the rows are the data points (samples) and the columns are the features (predictors). Let's say I have 1000 data points and 20 features, i.e. the matrix is of size 1000 x 20.

Now I want to detect and possibly remove outliers. I have read a good introduction: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm

One possibility, for example, is to use the modified Z-score and remove everything with an absolute value above 3.5.
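For reference, the NIST page linked above defines the modified Z-score as M_i = 0.6745 (x_i - median) / MAD, where MAD is the median absolute deviation, and flags |M_i| > 3.5 as a potential outlier. The following is a minimal sketch for a single variable. The function name `modified_z_scores` and the sample values are made up for illustration. The choice to raise an error when MAD is zero is also an assumption, since the score is undefined in that case.

```python
import numpy as np

def modified_z_scores(x):
    """Hypothetical helper: modified Z-score M_i = 0.6745 * (x_i - median(x)) / MAD."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))  # median absolute deviation
    if mad == 0:
        # At least half of the values coincide; the score would divide by zero.
        raise ValueError("MAD is zero; modified Z-score is undefined")
    return 0.6745 * (x - med) / mad

values = np.array([2.1, 2.4, 2.2, 2.3, 2.5, 9.8])  # made-up data with one extreme value
scores = modified_z_scores(values)
flagged = np.abs(scores) > 3.5  # cut-off suggested on the NIST page
print(flagged)
```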

First, how should I apply this? Should I just calculate the modified Z-score for each row (data point) of the matrix and remove the rows that are flagged as outliers, or should I calculate it for each column (feature)? I have the same question about making plots (e.g. histograms).
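The modified Z-score is a univariate statistic, so one common way to apply it to a samples x features matrix is column by column: score each feature against its own median and MAD, then flag a row if any of its features is extreme. The sketch below assumes that "any feature" rule. The function name `flag_rows_by_column` and the random test matrix are hypothetical. This approach only catches points that are extreme in at least one individual feature.

```python
import numpy as np

def flag_rows_by_column(X, threshold=3.5):
    """Hypothetical helper: True for rows where any feature's |modified Z| exceeds threshold."""
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)                # per-feature median, shape (n_features,)
    mad = np.median(np.abs(X - med), axis=0)  # per-feature MAD, shape (n_features,)
    if np.any(mad == 0):
        raise ValueError("at least one feature has zero MAD")
    M = 0.6745 * (X - med) / mad              # broadcasts to (n_samples, n_features)
    return np.any(np.abs(M) > threshold, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))  # 1000 samples x 20 features, as in the question
X[10, 3] = 25.0                  # plant one extreme value in a single feature
mask = flag_rows_by_column(X)
X_clean = X[~mask]               # drop every flagged row
```

Even on purely Gaussian data, a handful of rows will typically be flagged by chance, because 20,000 individual values are being tested. Histograms follow the same per-column logic: one histogram per feature.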

Second, which outlier detection method is best (possibly also for data that are not normally distributed)? There are so many. Simple methods such as the modified Z-score or just looking at the standard deviation seem to be used often.
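One known weakness of the plain "more than 3 standard deviations" rule is that the outlier inflates the mean and the standard deviation it is measured against. With the sample standard deviation, no single point in a sample of size n can have |z| larger than (n - 1)/sqrt(n), which is about 2.85 for n = 10. The rule therefore cannot fire at all in a sample of ten. The made-up data below contrast this with the median/MAD-based modified Z-score.

```python
import numpy as np

x = np.arange(1.0, 11.0)
x[-1] = 1000.0  # made-up sample: 1, 2, ..., 9, 1000

# Classical z-score: the extreme value pulls both the mean and the SD.
z = (x - x.mean()) / x.std(ddof=1)  # ddof=1: sample standard deviation
# Modified z-score: median and MAD are barely affected by one extreme value.
med = np.median(x)
mad = np.median(np.abs(x - med))
m = 0.6745 * (x - med) / mad

print(np.abs(z).max())  # stays below 3, since (n - 1)/sqrt(n) ~ 2.85 for n = 10
print(m[-1])            # far above 3.5
```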

You are confusing methods for univariate outlier detection with methods for multivariate outlier detection. An observation can be a multivariate outlier without being outlying in any of the variables taken individually. — user603, Jan 12 '16 at 18:17
@user603 OK, which methods for univariate and multivariate outlier detection would you recommend? I think the modified Z-score is a univariate outlier detection method. Could you briefly explain how I should apply it? — machinery, Jan 12 '16 at 20:47
Have you checked the top answer to [this](http://stats.stackexchange.com/questions/213/what-is-the-best-way-to-identify-outliers-in-multivariate-data) question? — user603, Jan 12 '16 at 23:27
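To illustrate user603's point that a row can be a multivariate outlier without being extreme in any single feature, the sketch below uses the squared Mahalanobis distance on two strongly correlated, made-up features. The added point (2, -2) is only about 2 standard deviations out in each feature, but it breaks the correlation pattern. The cut-off assumes approximately Gaussian data, so that the squared distance is roughly chi-square with 2 degrees of freedom. The closed form -2 ln(1 - p) for its quantile holds only for 2 degrees of freedom. The classical mean and covariance used here are themselves pulled by outliers, so robust estimates are usually preferred in practice. They are used here only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])                # two strongly correlated features
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)
X = np.vstack([X, [2.0, -2.0]])             # unusual combination, ordinary marginals

# Univariate view: per-feature z-scores
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Multivariate view: squared Mahalanobis distance of every row
center = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - center
d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)

# 0.999 quantile of chi-square(2): -2 * ln(0.001), about 13.8
cutoff = -2.0 * np.log(1.0 - 0.999)
print(np.abs(z[-1]), d2[-1] > cutoff)
```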


0 Answers