
I am having some issues with data reduction, and one expert advised me to remove the outliers first and then move on to factor analysis.

I want to remove the outliers jointly, across all items: I have 61 items, and box plots are not helpful because they flag outliers one item at a time. How can I detect the outliers all at once?

Glen_b
user22442
  • What is your sample size? What type of data are the items (e.g. ordinal, continuous, nominal)? Have you searched the site for this sort of question? Finding outliers in 61-dimensional space will run into the curse of dimensionality in a big way. – Peter Flom Mar 24 '13 at 12:04
  • @PeterFlom: yes. But with high-dimensional data, most of the variance is often concentrated in a few components (in the PCA sense). Outlier detection algorithms have been developed that explicitly aim to take advantage of this. Their numerical complexity is polynomial in $k$, the number of components, instead of in $p$, the dimension of the original space. Of course, all this hinges on the user being able to set an upper bound on $k$. – user603 Mar 24 '13 at 12:39
  • However, what counts as an outlier is tightly bound to your application, so you should formulate criteria for including or excluding data points. A good, general, automagic outlier detection cannot exist: your outliers may be the precious rare events I study... – cbeleites unhappy with SX Mar 24 '13 at 17:05
  • @cbeleites: why is it so hard to accept this? Even if you want to study the outliers, you still need to identify them reliably, and for that you need a principled [outlier identification method](http://stats.stackexchange.com/a/50780/603). – user603 Apr 17 '13 at 21:13
  • @user603: what do you mean by accept? I agree that you want to know reliably what groups you have in your data. All I'm saying is that the decision whether or not to remove certain groups of data should be discussed with respect to the application (as should the choice of model). – cbeleites unhappy with SX Apr 18 '13 at 07:59
  • @cbeleites: then I misunderstood your comment. – user603 Apr 18 '13 at 08:00
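user603's comment relies on the data being approximately low-rank (most of the variance in a few components) and on the user supplying an upper bound on $k$. As a rough illustration of one common heuristic for picking such a bound, and not part of any specific outlier algorithm, the numpy sketch below standardizes the items and returns the smallest $k$ whose principal components explain at least a chosen share of the total variance. The helper name `choose_k` and the default 80% threshold are hypothetical choices. Because it uses classical (non-robust) estimates, a few gross outliers can distort the result, so treat it as a rough guide only.

```python
import numpy as np


def choose_k(X, threshold=0.8):
    """Hypothetical helper: smallest number of principal components whose
    cumulative share of variance reaches `threshold`.

    X is an (n_samples, p_items) array; items are standardized first, so this
    is PCA on the correlation matrix. Assumes no item is constant. Uses
    classical (non-robust) estimates, so gross outliers can distort it.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data are proportional to
    # the variances of the principal components.
    var = np.linalg.svd(Z, compute_uv=False) ** 2
    cum_share = np.cumsum(var) / var.sum()
    # First component at which the cumulative share reaches the threshold,
    # converted to a count and clamped to the number of components.
    k = int(np.searchsorted(cum_share, threshold) + 1)
    return min(k, len(var))
```

For example, `choose_k(X, threshold=0.95)` on an n × 61 item matrix returns how many components are needed to reach 95% of the standardized variance.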

1 Answer


To make a long story short, you should use a tool such as robust PCA. I may come back to this with a more substantive post, but the short version is explained in this post.
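To make the robust-PCA suggestion a little more concrete, here is a minimal numpy sketch of one simple robust PCA variant, spherical PCA, used to compute the two distances that PCA-based outlier screens typically report: the score distance (how far a row lies within the k-dimensional component space) and the orthogonal distance (how far it lies from that space). This is an illustrative sketch under stated assumptions, not ROBPCA and not necessarily the method in the linked post: the function name `spherical_pca_outliers`, the coordinatewise-median centre, the 97.5% cutoffs and the Wilson-Hilferty approximation are all choices made for illustration, and k must be supplied by the user. If the items are on different scales, standardize them (robustly) first. For real analyses, an established implementation such as `PcaHubert` in the R package rrcov is a safer choice.

```python
import numpy as np

# 97.5% quantile of the standard normal distribution.
Z975 = 1.959963984540054


def _mad(a, axis=0):
    """Median absolute deviation, scaled to estimate the SD under normality."""
    med = np.median(a, axis=axis, keepdims=True)
    return 1.4826 * np.median(np.abs(a - med), axis=axis)


def spherical_pca_outliers(X, k):
    """Hypothetical helper: flag multivariate outliers with spherical PCA.

    X is an (n_samples, p_items) array and k < p is the number of components.
    Returns (score_dist, orth_dist, flagged). Simplified sketch: the centre is
    the coordinatewise median and the cutoffs are approximations.
    """
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)
    Xc = X - center

    # Spherical PCA: project each centred row onto the unit sphere so that
    # no single observation can dominate the estimated directions.
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    _, _, Vt = np.linalg.svd(Xc / norms, full_matrices=False)
    P = Vt[:k].T        # p x k loadings
    T = Xc @ P          # n x k scores of the (unnormalized) centred data

    # Score distance: robustly scaled distance within the component space.
    score_dist = np.sqrt(np.sum((T / _mad(T, axis=0)) ** 2, axis=1))
    # Orthogonal distance: distance from each row to the component space.
    orth_dist = np.linalg.norm(Xc - T @ P.T, axis=1)

    # Score-distance cutoff: sqrt of the chi-square(k) 97.5% quantile,
    # using the Wilson-Hilferty approximation.
    chi2_q = k * (1 - 2 / (9 * k) + Z975 * np.sqrt(2 / (9 * k))) ** 3
    sd_cut = np.sqrt(chi2_q)
    # Orthogonal-distance cutoff: robust z-score on OD^(2/3), mapped back.
    od23 = orth_dist ** (2 / 3)
    od_cut = (np.median(od23) + Z975 * _mad(od23)) ** 1.5

    flagged = (score_dist > sd_cut) | (orth_dist > od_cut)
    return score_dist, orth_dist, flagged
```

For example, `sd, od, flagged = spherical_pca_outliers(X, k=3)` on an n × 61 item matrix gives one boolean flag per row. In line with the comments above, flagged rows are candidates to inspect against the study's inclusion criteria, not rows to delete automatically.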

user603