I am looking to clean up a data set (~300 observations) with a large number of features (~140), and I would like to explore outliers in the data. My first thought was to use PCA to reduce these features to a few components that explain most of the variance, and then exclude observations that are outliers on these n new components.
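A minimal sketch of this first idea, assuming NumPy only and simulated data in place of the real data set. The function name `pca_score_distances`, the 80% variance threshold, and the median + 3 MAD cutoff are illustrative assumptions, not something fixed by the question.

```python
import numpy as np

def pca_score_distances(X, var_explained=0.8):
    """Ordinary (non-robust) PCA via SVD on standardized features.

    Returns the component scores, each row's distance in the space of the
    retained components (scores scaled by component standard deviation),
    and the number of retained components k.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    # Smallest k whose cumulative variance ratio reaches the threshold.
    k = int(np.searchsorted(np.cumsum(var_ratio), var_explained) + 1)
    scores = U[:, :k] * s[:k]
    comp_sd = s[:k] / np.sqrt(X.shape[0] - 1)
    dist = np.sqrt(np.sum((scores / comp_sd) ** 2, axis=1))
    return scores, dist, k

# Simulated stand-in for the real data: 300 rows, 140 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 140))
X[0] += 8.0  # plant one obvious outlier in row 0

scores, dist, k = pca_score_distances(X)

# Assumed cutoff: median + 3 * MAD (scaled to be consistent with a normal SD).
med = np.median(dist)
mad = 1.4826 * np.median(np.abs(dist - med))
flagged = np.flatnonzero(dist > med + 3 * mad)
print("components kept:", k, "flagged rows:", flagged)
```

Note that in this toy example the planted outlier largely determines the first component by itself, which is exactly the sensitivity of ordinary PCA to outliers raised next.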
However, there are a few issues with this approach. The first is that PCA is itself affected by outliers. Would robust PCA be a good alternative here?
The second issue is that there are some variables, such as age and sex, that I would like to partial out without removing any data on the basis of them. I would like to control for them first, and then apply some outlier detection to the result. Does it make sense to partial these out first, and then apply a robust PCA outlier analysis to the residuals?
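The partialling-out step could be sketched as follows, assuming ordinary least squares on each feature and simulated `age` and `sex` columns. The helper `partial_out` is a hypothetical name, and the robust PCA step itself is deliberately left out: the residuals `R` would simply be passed to whichever robust method is chosen.

```python
import numpy as np

def partial_out(X, covariates):
    """Regress every column of X on the covariates (plus an intercept)
    by ordinary least squares and return the residuals."""
    Z = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return X - Z @ beta

# Simulated stand-in data: features that depend partly on age and sex.
rng = np.random.default_rng(1)
n = 300
age = rng.uniform(20, 80, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
covs = np.column_stack([age, sex])
X = rng.normal(size=(n, 140)) + 0.05 * age[:, None] + 0.5 * sex[:, None]

R = partial_out(X, covs)

# The residuals are orthogonal to the covariates (up to rounding error),
# so any outlier analysis on R is no longer driven by age or sex.
print(np.max(np.abs(covs.T @ R)))
```

One caveat with this sketch: least squares is not robust either, so extreme observations can pull the fitted covariate effects, and a robust regression may be worth considering for this step as well.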