
Background

I have survey data on political views (from CSES) in which respondents answer on ranking scales, either 0:10 (0, 1, 2, ..., 10) or 0:3 (0, 1, 2, 3). I want to analyze this data using hierarchical cluster analysis.

To do this, I need to scale the features so that they contribute equally to the distance measure. For now I do not want to weight any feature more than the others; they should all count equally, which is why they need to be scaled.
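A minimal sketch of two common scaling options, on made-up responses (the array below is hypothetical, not CSES data). Z-score standardization gives every column a sample mean of 0 and a standard deviation of 1; min-max scaling here divides by the theoretical range of each scale (10 or 3), so every item lies in [0, 1].

```python
import numpy as np

# Hypothetical responses. Columns: like party X (0-10), like party Y (0-10),
# left-right placement (0-10), closeness to a party (0-3). Rows: respondents.
X = np.array([
    [8, 2, 7, 3],
    [1, 9, 2, 0],
    [5, 5, 5, 1],
    [9, 0, 8, 2],
    [2, 7, 3, 0],
], dtype=float)

# Option 1: z-score standardization (sample mean 0, sample SD 1 per column).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Option 2: min-max scaling with the *theoretical* scale limits, so the
# bottom of every scale maps to 0 and the top maps to 1.
scale_min = np.array([0, 0, 0, 0], dtype=float)
scale_max = np.array([10, 10, 10, 3], dtype=float)
M = (X - scale_min) / (scale_max - scale_min)

print(Z)
print(M)
```

One design difference worth noting: z-scores depend on the spread in the sample, so an item on which almost everyone answers alike gets stretched, while scaling by the theoretical range keeps each scale's step size fixed (one step counts as 1/10 on a 0-10 item and 1/3 on the 0-3 item).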

As the (dis)similarity measure, I am planning to use Euclidean distance, but I am not sure it is appropriate for ordinal features such as these. I know that similarity can also be measured with correlation-based measures, but those do not seem appropriate here either.
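A small illustration of how the two kinds of measure differ, using two hypothetical respondents (made-up values). Correlation-based distance (1 minus Pearson's r across the items) treats two respondents as identical when their answer profiles have the same shape, even if one answers every item 4 points higher; Euclidean distance registers that level difference.

```python
import numpy as np
from scipy.spatial.distance import correlation, euclidean

# Two hypothetical respondents answering four 0-10 items.
a = np.array([2.0, 4.0, 3.0, 5.0])
b = a + 4.0  # same profile shape, every answer 4 points higher

# Euclidean distance sees the level shift: sqrt(4 * 4**2) = 8.
print(euclidean(a, b))

# Correlation distance (1 - Pearson r) ignores level and scale: ~0 here.
print(correlation(a, b))
```

If the absolute position matters (for example, placing oneself at 2 versus 6 on the left-right scale), that insensitivity to level is one plausible reason a correlation-based measure would not suit this data.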

Questions

  1. Is standardization (scaling to a mean of 0 and a standard deviation of 1) appropriate for scaling features measured on these scales, or should I use some other scaling method?
  2. Are there any problems with using Euclidean distance as a similarity measure that I should be aware of in this case?
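For concreteness, here is a sketch of the planned pipeline (standardize, Euclidean distances, agglomerative clustering) with SciPy. The data are hypothetical, and Ward linkage and a 2-cluster cut are assumptions chosen only for illustration; Ward's criterion is defined for Euclidean distances, which is why it is paired with them here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from scipy.stats import zscore

# Hypothetical responses: like X (0-10), like Y (0-10), left-right (0-10), closeness (0-3).
X = np.array([
    [8, 2, 7, 3],
    [1, 9, 2, 0],
    [5, 5, 5, 1],
    [9, 0, 8, 2],
    [2, 7, 3, 0],
], dtype=float)

# Standardize each column (mean 0, sample SD 1).
Z = zscore(X, axis=0, ddof=1)

# Condensed vector of pairwise Euclidean distances between respondents.
D = pdist(Z, metric="euclidean")

# Agglomerative clustering with Ward linkage (an illustrative choice).
tree = linkage(D, method="ward")

# Cut the dendrogram into at most 2 clusters; labels start at 1.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Applying Euclidean distance to these columns implicitly treats each step of a scale as an equal interval, which is exactly the ordinal-versus-interval question raised in the comments.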

Additional info

Some examples of the survey questions (the exact wording of the questions was different; I do not have access to it at the moment):

  • On a scale from 0 to 10, how much do you dislike or like party X?
  • On a scale from 0 to 10, how much do you dislike or like party Y?
  • On a scale from 0 to 10, where would you place yourself on the left-right scale?
  • On a scale from 0 to 3, how close do you feel to a particular party?

Thanks! Have a nice day.

  • Nobody forces you to treat your scale as ordinal rather than interval or ratio. I don't see ranking in your example questions; it is rating. So? – ttnphns Mar 07 '19 at 14:03
  • Oh, I used the wrong word then. But survey respondents are asked to pick 0, 1, 2, ..., 10 in the questions. – Filip Sjöstrand Mar 07 '19 at 15:17
  • The way I see it, the scales are ordinal, though? I don't know the difference between 0 and 1 on the scale. I don't know if the distance from 0 to 1 is the same as the distance from 3 to 4 for the same respondent. – Filip Sjöstrand Mar 07 '19 at 15:20
  • Whether we know something or don't know it is a thing we decide ourselves. 1-10 or similar fine-grained rating scales are often considered interval scales. If you insist that the data remain ordinal, you might use Gower similarity. But mind that ranking ordinal data (as done there) is again inventing knowledge from nothing; in this instance it assumes uniformity of ranks. – ttnphns Mar 07 '19 at 16:38
  • https://stats.stackexchange.com/a/15313/3277 Gower – ttnphns Mar 07 '19 at 16:40
  • Ok thanks for the advice! I will probably treat it as interval then. But I will have a look at Gower's similarity. – Filip Sjöstrand Mar 07 '19 at 18:01
  • @ttnphns if I am not mistaken, psychologists have warned *not* to treat them as interval scales, because people don't pick all values. For example, some cultures consider 4, others 7, to be bad luck, and would never choose it. – Has QUIT--Anony-Mousse Apr 04 '19 at 05:48
  • Cf. https://stats.stackexchange.com/questions/10/under-what-conditions-should-likert-scales-be-used-as-ordinal-or-interval-data – Has QUIT--Anony-Mousse Apr 04 '19 at 05:49
  • @Anony, I agree with your observation; it is called the "focal effect", positive (the number is liked in the culture) or negative (the number is disliked). However, this effect is not a _direct_ objection against the intervality or continuity of a scale, because it pertains to the benchmarks themselves and not to the psychometric _distances_ between benchmarks. – ttnphns Apr 04 '19 at 11:34
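A rough sketch of the Gower option mentioned in the comments, written from the general definition rather than from the linked answer: for features treated as interval, each feature contributes |x_ik - x_jk| / R_k (R_k being the observed range of column k), and the contributions are averaged; Gower dissimilarity is 1 minus Gower similarity. The `use_ranks` switch is a hypothetical, simplified ordinal treatment (replace each column by average ranks first); it is not the exact tie-corrected ordinal formula. The data and the average-linkage choice are also assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def gower_distance(X, use_ranks=False):
    """Pairwise Gower dissimilarity for numeric/ordinal columns (no missing values).

    Each feature contributes |x_ik - x_jk| / R_k, where R_k is the observed range
    of column k; contributions are averaged over the features. With
    use_ranks=True, each column is first replaced by its average ranks
    (a simplified ordinal treatment).
    """
    X = np.asarray(X, dtype=float)
    if use_ranks:
        X = np.column_stack([rankdata(col) for col in X.T])
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant column: all its differences are 0 anyway
    # (n, n, p) array of range-scaled absolute differences, averaged over features.
    diffs = np.abs(X[:, None, :] - X[None, :, :]) / ranges
    return diffs.mean(axis=2)


# Hypothetical responses: like X (0-10), like Y (0-10), left-right (0-10), closeness (0-3).
X = np.array([[8, 2, 7, 3], [1, 9, 2, 0], [5, 5, 5, 1]], dtype=float)
D = gower_distance(X)

# Gower distances are not Euclidean, so average linkage is used instead of Ward.
tree = linkage(squareform(D), method="average")
print(D)
```

Because each feature is divided by its own range, Gower already puts the 0-10 and 0-3 items on a common footing, so no separate standardization step is needed.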
