I have to analyse data from a marketing study. I will use SPSS. The questionnaire will look like this:

Q: Imagine Situation X. Select 1-3 Criteria from the list that best describe your feeling.

  • C1
  • C2
  • C3
  • C4
  • C5
  • C6
  • C7
  • ...

I want to perform cluster analysis to find out which respondents have similar feelings and which feelings are selected most often in each cluster. The data are essentially a set of binary variables (selected vs. not selected), one per criterion. The categories are asymmetric: a 0-0 match (neither respondent selected a criterion) should not necessarily count as similarity.
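To make the asymmetry concrete, here is a minimal sketch in Python (not SPSS syntax). The three respondent vectors and the choice of 10 criteria are made up for illustration. It compares simple matching, which counts 0-0 matches, with Jaccard and Dice, which ignore them.

```python
def match_counts(x, y):
    """Return (a, b, c, d): counts of 1-1, 1-0, 0-1 and 0-0 pairs."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def simple_matching(x, y):
    # Symmetric measure: joint absences (d) count as agreement.
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    # Asymmetric measure: joint absences are ignored.
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def dice(x, y):
    # Like Jaccard, but joint presences are weighted double.
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c) if (2 * a + b + c) else 0.0


# Hypothetical respondents answering a 10-criterion question.
r1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
r2 = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]  # shares no selection with r1
r3 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # shares two selections with r1

print(simple_matching(r1, r2), jaccard(r1, r2), dice(r1, r2))
# 0.5 0.0 0.0
print(simple_matching(r1, r3), round(jaccard(r1, r3), 3), dice(r1, r3))
# 0.9 0.667 0.8
```

Respondents r1 and r2 have nothing in common, yet simple matching scores them at 0.5 purely because of the criteria neither of them picked. Jaccard and Dice both give 0.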

From this incredibly helpful post, I understood that hierarchical cluster analysis with the Dice measure should work in my case. However, I also understood that it is not well suited to large samples because of performance issues. (My sample size is 1500.)

Questions:

  1. Is my way of thinking correct?
  2. Would you still recommend using hierarchical clustering?
  3. If not, what would you recommend instead?
  4. If yes, how bad will the performance be / how long will it take SPSS to run this? (I don't have the dataset yet, so I can't try it out.)
Lakai
  • You seem to have partly misunderstood the links. The Dice measure suggests itself when you have nominal data, such as a single-choice question (so that the binary variables derived from it are dummies). In your case, I see a multiple-choice question, which leaves us with a set of binary variables. These are binary in the "ordinal" sense (selected vs. not selected). Measures such as Jaccard or Ochiai (cosine) will do. – ttnphns Feb 03 '16 at 20:35
  • You can do hierarchical clustering (for example, complete or average linkage): 1500 objects is not too many (SPSS will process it in a few seconds). But given that some greedy algorithms [may become riskily suboptimal](http://stats.stackexchange.com/a/63549/3277) with thousands of objects, you might consider doing the analysis [on halves of the sample](http://stats.stackexchange.com/q/189012/3277) and comparing the results. – ttnphns Feb 03 '16 at 20:38
  • Thanks a lot for your explanations and remarks! Reading the replies to the other questions again, I see that I probably missed some points. Anyway, I am glad that the analysis should work. – Lakai Feb 04 '16 at 07:49
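
The approach suggested in the comments (Jaccard dissimilarity, average linkage, repeated on two random halves of the sample) can be sketched outside SPSS as follows. This is Python with NumPy and SciPy, not SPSS syntax. The answers are randomly generated, and the number of criteria (10) and of clusters (4) are arbitrary assumptions, so the resulting clusters carry no real meaning; the sketch only shows the workflow and that 1500 respondents is a manageable size.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_respondents, n_criteria, n_clusters = 1500, 10, 4  # assumed values

# Fake answers: each respondent selects 1 to 3 criteria at random.
X = np.zeros((n_respondents, n_criteria), dtype=bool)
for row in X:  # each row is a view, so assignments modify X
    k = rng.integers(1, 4)  # 1, 2 or 3 selections
    row[rng.choice(n_criteria, size=k, replace=False)] = True


def cluster_profiles(data, k):
    """Average-linkage clustering on Jaccard distances, cut into at most k clusters."""
    distances = pdist(data, metric="jaccard")  # ignores 0-0 matches
    tree = linkage(distances, method="average")
    labels = fcluster(tree, t=k, criterion="maxclust")
    # Share of respondents in each cluster who selected each criterion.
    profiles = {int(c): data[labels == c].mean(axis=0) for c in np.unique(labels)}
    return labels, profiles


# Split the sample into two random halves and cluster each separately.
order = rng.permutation(n_respondents)
half_a = X[order[: n_respondents // 2]]
half_b = X[order[n_respondents // 2 :]]
labels_a, profiles_a = cluster_profiles(half_a, n_clusters)
labels_b, profiles_b = cluster_profiles(half_b, n_clusters)

# List the three most selected criteria per cluster in each half.
for name, profiles in (("half A", profiles_a), ("half B", profiles_b)):
    for c, share in profiles.items():
        top = np.argsort(share)[::-1][:3]
        print(name, "cluster", c, "top criteria:", [f"C{i + 1}" for i in top])
```

With real data, similar per-cluster profiles in both halves would suggest that the cluster solution is reasonably stable.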
