First of all, I would like to note that I have read similar topics in CrossValidated but I am not fully satisfied.
I have a dataset which consists of an $N\times M$ binary matrix. 1 means that an action is performed and 0 that it is not.
I apply PCA to the dataset and surprisingly get very good results, especially when I reduce it to only two dimensions. I am looking for the intuition behind performing PCA on such a dataset (i.e. where each attribute contains categorical data; you can give whatever example you think is more understandable) and whether a more appropriate technique can be applied. I am working with MATLAB and I need the data in a clustering friendly form.