First of all, I would like to note that I have read similar topics on Cross Validated, but I am not fully satisfied.

I have a dataset which consists of an $N\times M$ binary matrix. 1 means that an action is performed and 0 that it is not.

I apply PCA to the dataset and, surprisingly, get very good results, especially when I reduce it to only two dimensions. I am looking for the intuition behind performing PCA on such a dataset (i.e. one where each attribute contains categorical data; feel free to use whatever example you think is most understandable) and whether a more appropriate technique can be applied. I am working with MATLAB, and I need the data in a clustering-friendly form.

Nick Cox
JustCurious
  • Strictly speaking, one cannot apply PCA to categorical data until they have been *numerically coded.* You will get different results if you encode your binary values differently. But once the values are encoded, you have a numeric dataset. Period. There is *nothing* in any of the accounts of PCA (that I have ever read) that asserts those numeric data must be of a certain type or have certain distributions. PCA is a pithy *summary* of the locations of those numbers when they are considered to be points in a Euclidean space. From this point of view, there's nothing different about your data. – whuber Jul 19 '13 at 17:56
  • @JustCurious, I personally find no sin in doing PCA on binary data, if you wish to know my [opinion](http://stats.stackexchange.com/a/16335/3277). What do you mean by saying `I need the data in a clustering friendly form`, and how is this connected with PCA? – ttnphns Jul 19 '13 at 18:31
  • With binary data (1=present, 0=absent) one might want to set the origin of the components at the "all variables = absent" point. This will be PCA of the matrix of cosines, rather than of the matrix of correlations. – ttnphns Jul 19 '13 at 18:43
  • @ttnphns PCA reduces the dimensions and changes the values into something that e.g. Euclidean distance can be applied to. So, clustering techniques can be applied once the data are transformed using PCA. – JustCurious Jul 20 '13 at 12:58
  • @ttnphns I have read your response to the other question, but I am still not convinced, or a bit confused if you like. Are there any other techniques, or variants of PCA adjusted for binary data? – JustCurious Jul 20 '13 at 13:01
  • What bothers you, and in what way do you expect PCA to adjust for binary data? Please explain it in your question. – ttnphns Jul 20 '13 at 13:41
  • I am trying to understand why it works first, in an intuitive way. For example, the matrix [1 0 0; 0 0 1] gives a covariance matrix of [0.5000 0 -0.5000; 0 0 0; -0.5000 0 0.5000]. In terms of actions executed this makes no sense to me, so I am wondering whether there is a better representation that achieves the same result as PCA but also has an intuitive meaning. I hope that is clear. To clarify, if you can explain PCA in any context using categorical data as an example, I might be able to understand it better. – JustCurious Jul 20 '13 at 14:50
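
For readers following the last comment, here is a minimal sketch of where those covariance numbers come from, written in plain Python rather than MATLAB so it runs without any toolbox. The helper names `covariance` and `cooccurrence` are hypothetical, introduced only for illustration. The first divides by n - 1, which is what MATLAB's `cov` does by default. The second is the uncentered cross-product X'X, shown as one possible reading of the "origin at all variables = absent" idea raised in the comments; it is not the normalized cosine matrix itself, and it is not claimed to be anyone's recommended method.

```python
def covariance(X):
    """Sample covariance of the columns of X, dividing by n - 1."""
    n, m = len(X), len(X[0])
    # Column means: for 0/1 data this is the fraction of rows where the action occurred.
    mu = [sum(row[j] for row in X) / n for j in range(m)]
    return [
        [sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in X) / (n - 1)
         for j in range(m)]
        for i in range(m)
    ]


def cooccurrence(X):
    """Uncentered cross-product X'X.

    For 0/1 data, entry (i, j) counts the rows in which actions i and j
    were both performed.
    """
    m = len(X[0])
    return [
        [sum(row[i] * row[j] for row in X) for j in range(m)]
        for i in range(m)
    ]


# The 2 x 3 example from the comment: row 1 performs action 1, row 2 performs action 3.
X = [[1, 0, 0],
     [0, 0, 1]]

C = covariance(X)
K = cooccurrence(X)

for row in C:
    print(row)
for row in K:
    print(row)
```

Read this way, the covariance matrix does say something about actions: the 0.5 entries on the diagonal are the spread of actions 1 and 3, the zero row and column belong to action 2, which never varies, and the -0.5 off-diagonal entry records that in this sample whenever action 1 was performed action 3 was not. The cross-product matrix is easier to interpret directly, since its entries are plain co-occurrence counts.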
