I have results from a survey:

ID    Ques    Answer
1      a        yes
1      b        yes
1      c        no
2      a        no
2      b        yes
2      c        no
3      a        no
3      b        no
3      c        no

I would like to see whether there are any relationships/clusters among the questions answered "yes". I have data for around 2000 participants, with around 30 questions each. I'm wondering whether I should convert the yes/no answers to 0/1 and construct a distance matrix, following this previous question. Any idea how I could reproduce similar methods in SAS or Python?

EDIT: I think I'm trying to cluster questions, as opposed to people. I would like to see whether a group of questions is often answered "yes" together, or the opposite: whether answering "yes" to a is predictive of answering "no" to b.
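
For reference, the 0/1 distance-matrix idea applied to questions rather than people could look roughly like this in Python. This is only a sketch: the Jaccard distance is one assumed choice among several for binary data, the helper names (`yes_sets`, `jaccard_distance`, `question_distances`) are made up for illustration, and the rows are just the sample table above.

```python
from itertools import combinations

# Sample rows copied from the table above: (ID, Ques, Answer).
rows = [
    (1, "a", "yes"), (1, "b", "yes"), (1, "c", "no"),
    (2, "a", "no"), (2, "b", "yes"), (2, "c", "no"),
    (3, "a", "no"), (3, "b", "no"), (3, "c", "no"),
]


def yes_sets(rows):
    """Map each question to the set of participant IDs who answered yes."""
    out = {}
    for pid, ques, answer in rows:
        out.setdefault(ques, set())
        if answer == "yes":
            out[ques].add(pid)
    return out


def jaccard_distance(s, t):
    """1 - |s & t| / |s | t|; two empty sets get distance 0 (assumed convention)."""
    union = s | t
    if not union:
        return 0.0
    return 1.0 - len(s & t) / len(union)


def question_distances(rows):
    """Pairwise Jaccard distance between questions, keyed by (question, question)."""
    sets = yes_sets(rows)
    return {
        (p, q): jaccard_distance(sets[p], sets[q])
        for p, q in combinations(sorted(sets), 2)
    }


distances = question_distances(rows)
for pair, dist in sorted(distances.items()):
    print(pair, dist)
```

A small distance means two questions tend to be answered "yes" by the same people. The resulting pairwise distances could then be passed to a hierarchical clustering routine.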

Yolo_chicken

1 Answer

What you are describing sounds more like frequent itemset mining to me, because not all questions will cluster.

Questions are your items, and each questionnaire is a transaction. You'll need fairly high support thresholds, since every user answers every question.
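
A minimal sketch of that setup in plain Python, using a small level-wise (Apriori-style) search. Some assumptions on my side: each item is encoded as a (question, answer) pair so that "yes to a goes with no to b" patterns can show up, the function names `transactions_from_rows` and `frequent_itemsets` are hypothetical, and the rows are only the sample from the question. For real data a dedicated library would likely be faster.

```python
from itertools import combinations

# Sample rows from the question: (ID, Ques, Answer).
rows = [
    (1, "a", "yes"), (1, "b", "yes"), (1, "c", "no"),
    (2, "a", "no"), (2, "b", "yes"), (2, "c", "no"),
    (3, "a", "no"), (3, "b", "no"), (3, "c", "no"),
]


def transactions_from_rows(rows):
    """One transaction per participant; items are (question, answer) pairs."""
    tx = {}
    for pid, ques, answer in rows:
        tx.setdefault(pid, set()).add((ques, answer))
    return list(tx.values())


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    result = {}
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: k / n for c, k in counts.items() if k / n >= min_support}
        result.update(frequent)
        # Join frequent itemsets of size k into candidates of size k + 1.
        keys = list(frequent)
        size = len(keys[0]) + 1 if keys else 0
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune candidates that have an infrequent subset of size k.
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]
    return result


itemsets = frequent_itemsets(transactions_from_rows(rows), min_support=0.6)
for itemset, support in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 3))
```

Because every transaction contains one item per question, low thresholds produce a very large number of itemsets, which is why the threshold has to be set fairly high.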

Has QUIT--Anony-Mousse