Cluster analysis of binary data

Question

I have 2000 questionnaires from respondents, each asking 33 different questions about which issues are present in their lives, e.g. alcohol abuse, domestic violence, mental health, child abuse, learning difficulties.

Each question can only be answered yes/no (which I've re-coded as 1/0).

I'd like to use this dataset to start creating n profiles of respondents based on which variables naturally cluster together, e.g. (alcohol abuse and domestic violence), (mental health, child abuse, domestic violence), (alcohol abuse, learning difficulties), across some or all of the 33 different variables.

Is two-step clustering in SPSS the most suitable method of analysing these data to identify, say, 5 key clusters?

Thanks! :)

You can find posts about `two-step cluster analysis` in SPSS on this site, so please search. This method is not quite suitable for binary data: it accepts quantitative and nominal variables, and binary variables are, by their nature, in between. With the two-step procedure, you have to decide whether to treat the binary data as quantitative or as nominal. Hierarchical cluster analysis is much more apt for binary data. — ttnphns, Sep 26 '14 at 09:08
Hi ttnphns, Thanks for your reply. I'm new to cluster analysis and have come across conflicting responses (on this site and the web at large) for whether hierarchical cluster analysis is suitable for large binary datasets - particularly due to the order in which data is loaded into the dataset. Would binary 1/0 data be better analysed in its original format (y/n) for cluster analysis? Thanks so much. — user27768, Sep 26 '14 at 09:27
I couldn't get your last sentence. I would recommend reading a bit more on cluster analysis. And if you already have specific questions about hierarchical clustering, go ahead and ask. — ttnphns, Sep 26 '14 at 09:34
Thanks again - apologies for not being clearer. Yes, plenty more reading to do! I found this [link](http://www-01.ibm.com/support/docview.wss?uid=swg21476716) to be particularly specific about why to use two-step cluster analysis for binary data, so I'd be really interested to understand why hierarchical cluster analysis is more appropriate. Thanks so much! :) — user27768, Sep 26 '14 at 09:40
The case you link to may be worth considering here. I recommend that you either post a separate question about it or edit your current question. You may ask why they recommend this (avoid hierarchical and use two-step), while many people tend to think that hierarchical clustering is fine with binary data. — ttnphns, Sep 26 '14 at 09:51
Thanks - [re-asked:](http://stats.stackexchange.com/questions/116856/hierarchical-or-two-step-cluster-analysis-for-binary-data) — user27768, Sep 26 '14 at 10:10
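
For readers who want to see what the hierarchical route suggested in the comments might look like outside SPSS, here is a minimal sketch in Python with SciPy. Everything in it is an assumption for illustration, not something stated in the thread: the data are synthetic stand-ins for the 2000 x 33 yes/no matrix, and Jaccard distance, average linkage and the cut at 5 clusters are just one plausible combination for presence/absence data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Synthetic placeholder for the real survey: 2000 respondents x 33 yes/no items.
# (Assumption: each issue is present with probability 0.3, independently.)
rng = np.random.default_rng(0)
X = rng.random((2000, 33)) < 0.3  # boolean matrix, True = "yes"

# Pairwise Jaccard distances between respondents. Jaccard ignores joint
# absences (0/0), which is one common choice for presence/absence data.
d = pdist(X, metric="jaccard")

# Treat any undefined (NaN) distance, which can arise for two all-zero rows,
# as distance 0.
d = np.nan_to_num(d, nan=0.0)

# Agglomerative clustering with average linkage on the condensed distances.
Z = linkage(d, method="average")

# Cut the tree so that at most 5 clusters are formed.
labels = fcluster(Z, t=5, criterion="maxclust")

# Profile each cluster by the share of "yes" answers per question.
cluster_ids = np.unique(labels)
profiles = np.vstack([X[labels == k].mean(axis=0) for k in cluster_ids])

print("cluster sizes:", np.bincount(labels)[1:])
print(profiles.round(2))
```

Each row of `profiles` gives, for one cluster, the proportion of respondents reporting each of the 33 issues, which is one way to read off the "profiles" asked about in the question. With real data, the choice of distance measure and linkage method deserves more thought than this sketch gives it.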
