
Suppose I have a customer satisfaction survey with around 20 questions, asking customers' opinions on Company X.

For each question, the customer rates the company on a 1-5 scale, so my data looks something like:

               Q1   Q2    Q3    ...   Q20
Customer 1      4    4     3    ...     2
Customer 2      4    4     3    ...     2
...                 
Customer 10000  2    1     3    ...     5

I would then like to cluster customers based on their answers (the patterns of their answers).

What type of clustering would be appropriate in this case? I don't know the number of clusters I am looking for.

What if each customer rated more than one company (that is, completed more than one survey)?

Grint
  • Do you consider the responses on the 1-5 scale to be numeric or categorical? In other words, is a 4 "closer" to a 5 than to a 1, or are they all just equally different categories? – Chris Umphlett Apr 18 '19 at 15:06
  • @ChrisUmphlett they are considered to be numeric (1 < 2 < 3 < 4 < 5) – Grint Apr 18 '19 at 15:21
  • I shouldn't have made the choice binary... I'd say that makes them ordinal then, not necessarily numeric (not continuous). If someone says 2, does that make them twice as satisfied as someone who says 1? Probably not. – Chris Umphlett Apr 18 '19 at 15:23
  • @ChrisUmphlett I guess you're right... (I found this https://inseaddataanalytics.github.io/INSEADAnalytics/CourseSessions/Sessions45/ClusterAnalysisReading.html#step_3:_select_segmentation_variables, not sure if it would be appropriate?) – Grint Apr 18 '19 at 15:27
  • Here are some similar questions on CV: https://stats.stackexchange.com/questions/71490/cluster-analysis-on-ordinal-data, https://stats.stackexchange.com/questions/56479/cluster-analysis-on-ordinal-data-likert-scale. Looks like there are many options; this will give you some better phrases to search for and some links to get started with. – Chris Umphlett Apr 18 '19 at 15:27
  • @ChrisUmphlett thanks! Is it advised to reduce the dimensionality of the data before clustering? – Grint Apr 18 '19 at 16:18
  • It's always a good thing to consider for any modeling. I'm not an expert on clustering. If you use a cluster algorithm like k-means, I don't think you would remove dimensions in your case because that would mean removing a question. Some of the other approaches mentioned in those links may be worth considering, especially if they make use of other information. – Chris Umphlett Apr 18 '19 at 16:26
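
To make the discussion above a bit more concrete, here is a rough sketch of one possible approach, not a recommendation: k-medoids with Manhattan distance on the 1-5 codes, trying several cluster counts and comparing them with the mean silhouette score (a common heuristic when the number of clusters is unknown). Everything here is an assumption for illustration: the data is made up (two synthetic groups of 30 customers), the function names are hypothetical, and Manhattan distance on the raw codes still treats the gaps between adjacent levels as equal, which may or may not be acceptable for ordinal answers.

```python
import random

def manhattan(a, b):
    # Sum of absolute differences between two answer vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

def farthest_first(rows, k):
    # Deterministic start: row 0, then repeatedly the row farthest from all chosen medoids.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(rows)),
            key=lambda i: min(manhattan(rows[i], rows[m]) for m in medoids),
        ))
    return medoids

def assign(rows, medoids):
    # Label each row with the index of its nearest medoid.
    return [
        min(range(len(medoids)), key=lambda j: manhattan(r, rows[medoids[j]]))
        for r in rows
    ]

def k_medoids(rows, k, max_iter=50):
    medoids = farthest_first(rows, k)
    for _ in range(max_iter):
        labels = assign(rows, medoids)
        updated = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if a cluster ends up empty
                updated.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance to the others.
            updated.append(min(
                members,
                key=lambda i: sum(manhattan(rows[i], rows[m]) for m in members),
            ))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(rows, medoids)

def mean_silhouette(rows, labels):
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            continue  # singletons (or a single cluster) contribute 0
        a = sum(manhattan(rows[i], rows[j]) for j in own) / len(own)
        b = min(
            sum(manhattan(rows[i], rows[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(rows)

# Made-up data: 30 mostly satisfied and 30 mostly dissatisfied customers, 20 questions each.
rng = random.Random(0)
rows = [[rng.choice([4, 5]) for _ in range(20)] for _ in range(30)]
rows += [[rng.choice([1, 2]) for _ in range(20)] for _ in range(30)]

# Compare a few candidate cluster counts by mean silhouette (higher is better).
for k in range(2, 5):
    _, labels = k_medoids(rows, k)
    print(k, round(mean_silhouette(rows, labels), 3))
```

This pure-Python version is quadratic in the number of customers, so for 10,000 real respondents a library implementation would be more practical. If each customer rated several companies, one option (again an assumption, not something settled in this thread) is to treat each customer-company survey as its own row before clustering.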

0 Answers