
I'd like to perform a cluster analysis on ordinal (Likert-scale) data using SPSS. I have around 140 observations and 20 variables scored from 1 to 5 (1: strongly agree, 3: neutral, 5: strongly disagree). As a result, I want to assign each person to one cluster, e.g. person 1 belongs to the group of technology-enthusiastic users...

Since I'm new to this area, I'd like to ask for help with the procedure. I have already read about cluster analysis on the Internet, but opinions differ on how to treat ordinal data.

Is it common to convert the ordinal data to interval data first and then perform the cluster analysis?

ttnphns
cathy
  • It is very common to treat a Likert-type rating scale as interval. (Suffice it to say, if you sum your items into a total score or compute the mean of an item, you have already decided that the data are interval.) With interval data, many kinds of cluster analysis are at your disposal. If you insist the data are ordinal, fine: use hierarchical clustering based on Gower similarity. You can find an SPSS macro for Gower similarity on my web page. – ttnphns Sep 30 '13 at 17:18
  • Indeed, treating such Likert scales as metric is called making [the assumption of equal intervals](http://www.pythonforspss.org/assumption-of-equal-intervals/). But with 20 separate items I'd first run a **PCA** to decrease the number of items. In my experience (market research), between 4 and 6 variables works best in cluster analysis. – RubenGeert Sep 30 '13 at 17:45
  • Thanks a lot for your answer! I have another question: how do I convert ordinal data to interval data in SPSS? Can I use "optimal scaling", or is there a better solution? – cathy Sep 30 '13 at 17:48
  • Thank you, RubenGeert. By "treating such Likert scales as metric", do you mean I can set the level of measurement for my variables to metric in SPSS? Which kind of cluster analysis (hierarchical, two-step) is best after having run a PCA? – cathy Sep 30 '13 at 18:06
  • You can use twostep cluster. Try it both with the scales as categorical and as continuous to see how much difference it makes. They will, of course, use different distance measures. – JKP Sep 30 '13 at 20:32
  • Thanks, all, for your answers. However, I'm still not quite sure what you mean by "treat Likert-type rating scale as interval". Should I convert the variables, e.g. by using optimal scaling, or set their level of measurement to metric (instead of ordinal) in SPSS? Thanks! – cathy Oct 01 '13 at 04:07
  • To "treat Likert-type rating scale as interval" simply means to regard it as interval and use it as interval. Yes, it implies that you set the measurement level to Scale. – ttnphns Oct 01 '13 at 07:54
  • [Changing measurement levels](http://www.pythonforspss.org/managing-variable-properties-6-missing-values-and-more/#level) does **not** change anything about the nature of your variables. It's mostly just a description that you may or may not add to your data. Making [the assumption of equal intervals](http://www.pythonforspss.org/assumption-of-equal-intervals/) is basically just pretending that variables are something that they really aren't for practical purposes. – RubenGeert Oct 01 '13 at 16:41
  • Why don’t you use a finite mixture model for ordinal data? https://upcommons.upc.edu/bitstream/handle/2117/329799/mixture-based%20clustering.pdf?sequence=5 – Daniel Fernandez-Martinez Jul 27 '21 at 21:53

1 Answer


If you can define a reasonable similarity measure on the values, you can use any distance-based algorithm, such as:

  • Hierarchical clustering
  • DBSCAN
  • OPTICS
  • K-Medoids (k-means for arbitrary distances)

Given that you only have 5 values, you could just manually define a similarity matrix for these 5 values; then decide on a combination rule to merge multiple attributes, e.g. mean.
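As a rough illustration of that idea, here is a minimal Python sketch (not SPSS syntax). Everything in it is an assumption made for the example: the numbers in `VALUE_DISSIM`, the toy respondents in `rows`, the choice of the mean as the combination rule, and the simple alternating k-medoids loop with hand-picked starting medoids. Treat it as a starting point rather than a recommended implementation.

```python
from itertools import combinations

# Hypothetical dissimilarities between Likert values 1..5 (an assumption):
# neighbouring categories are close, opposite ends are far apart.
VALUE_DISSIM = {
    (1, 2): 0.2, (1, 3): 0.6, (1, 4): 0.9, (1, 5): 1.0,
    (2, 3): 0.4, (2, 4): 0.7, (2, 5): 0.9,
    (3, 4): 0.4, (3, 5): 0.6,
    (4, 5): 0.2,
}

def value_dissim(a, b):
    """Look up the hand-defined dissimilarity between two Likert values."""
    if a == b:
        return 0.0
    return VALUE_DISSIM[(min(a, b), max(a, b))]

def respondent_dissim(x, y):
    """Combination rule: mean of the per-item dissimilarities."""
    return sum(value_dissim(a, b) for a, b in zip(x, y)) / len(x)

def distance_matrix(rows):
    """Symmetric matrix of pairwise respondent dissimilarities."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = respondent_dissim(rows[i], rows[j])
    return d

def assign(d, medoids):
    """Assign every point to the cluster of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]

def k_medoids(d, medoids, max_iter=100):
    """Simple alternating k-medoids on a precomputed dissimilarity matrix."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total dissimilarity.
            new_medoids.append(
                min(members, key=lambda m: sum(d[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(d, medoids), medoids

# Toy data: 6 respondents, 4 items (made up for illustration).
rows = [
    [1, 2, 1, 2],
    [2, 1, 1, 1],
    [1, 1, 2, 2],
    [5, 4, 5, 4],
    [4, 5, 5, 5],
    [4, 4, 5, 4],
]
d = distance_matrix(rows)
labels, medoids = k_medoids(d, medoids=[0, 3])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The same precomputed matrix `d` could instead be handed to any other distance-based method from the list above, such as hierarchical clustering.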

Has QUIT--Anony-Mousse
  • Hi, could you suggest a reasonable similarity measure? I've seen many people suggest Gower, but I wonder if it's a good choice. Suppose we have one feature and three rows with values 1, 4, 5. Gower will consider all the rows equally dissimilar, but I think the rows with 4 and 5 should be less dissimilar than 1 and 4 or 1 and 5. Am I right? Can you comment on this? Thanks a lot – crash May 23 '19 at 14:51
  • You seem to confuse Gower with Hamming. Don't rely on existing distances. Define your own, one that works for your data. – Has QUIT--Anony-Mousse May 23 '19 at 19:31
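To make the Gower-versus-Hamming point concrete, here is a small Python sketch for the 1, 4, 5 example from the comment above. It assumes the feature is treated as quantitative, in which case Gower's per-feature dissimilarity is the absolute difference divided by the feature's range; how a particular software package handles ordinal variables may differ. The function names are made up for this illustration.

```python
def hamming(a, b):
    """Simple matching: any two different values are maximally dissimilar."""
    return 0.0 if a == b else 1.0

def range_normalized(a, b, lo, hi):
    """Gower-style dissimilarity for one quantitative feature."""
    return abs(a - b) / (hi - lo)

values = [1, 4, 5]
lo, hi = min(values), max(values)  # observed range of the feature

for a, b in [(1, 4), (4, 5), (1, 5)]:
    print(a, b, hamming(a, b), range_normalized(a, b, lo, hi))
```

Under this assumption, Hamming gives 1.0 for every pair, while the range-normalized version gives 0.75, 0.25 and 1.0, so the rows with 4 and 5 do come out as the closest pair.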