I have a data set of about 40 subjects and 10 variables; some of the variables measure performance on a task, and the others measure a characteristic of the subject. Each performance variable is the proportion of correct responses (range 0 to 1.0) to a set of questions. The characteristic variables (such as verbal IQ) are ordinal, and may or may not follow a normal distribution.

The subjects have been split into two groups by cluster analysis.

I'd like to compare the two clusters on each performance and characteristic measure to see whether they differ significantly on that measure. However, the clustering process entails a violation of assumptions that need to be met for standard statistical tests. What sort of tests should I use to test whether the two clusters differ on the measures? Thank you.

winshade
  • What do you mean by `the clustering process entails a violation of assumptions...`? – ttnphns Jun 19 '13 at 19:13
  • Did you use those performance measures to identify the clusters? If so, testing for differences of performance doesn't make much sense. It's like dividing the sample in "tall" and "short" people, and then comparing the average height of those groups. – Nameless Jun 19 '13 at 19:44
  • @Nameless Some of the variables used in clustering may discriminate between the resulting groups while others may not. So group comparisons on the variables help identify which are which. – ttnphns Jun 19 '13 at 19:56
  • @ttnphns - an example of clustering violating an assumption of standard parametric tests would be that in the clustered groups, the subjects were not randomly selected from a population. – winshade Jun 19 '13 at 20:28
  • @Nameless - Yes, the performance vars were used for clustering. However, your conclusion may be off for at least these reasons: 1. That the clusters differ numerically on a variable doesn't mean that there is a statistically significant difference along that variable. 2. Multiple variables were used for clustering; it's possible that there is a statistically sig difference on some vars, but not on others. 3. Clustering analyses may produce errors by some criterion – winshade Jun 19 '13 at 20:29
  • @winshade, yes, though I'd rather say "not fully representative" or "biased" instead of "not randomly selected". This is because classic clustering methods produce sharply demarcated groups, while in reality (in the population) the groups are usually fuzzy and overlap to some extent. – ttnphns Jun 19 '13 at 21:00
  • "*variables [...] are ordinal, and may or may not follow a normal distribution*" --- since 'ordinal' and 'normal' are mutually contradictory, you can guarantee that ordinal variables are *never* normal. – Glen_b Jun 19 '13 at 23:26
  • @Glen_b - can you elaborate? Isn't IQ an example of a variable that's considered ordinal and follows a Gaussian distribution? – winshade Jun 20 '13 at 21:52
  • To get back to point, does anyone have a suggestion for which analyses to use in this situation? – winshade Jun 20 '13 at 21:53
  • @winshade I don't believe IQ as measured by the usual instruments is considered to be merely ordinal, no, but if it were, *it could not be Gaussian*, since a random variable with a Gaussian distribution is necessarily (by dint of its mathematical form) continuous with equal intervals (i.e. interval-scaled), and ordinal variables are *neither* continuous nor interval-scaled. The fact that some categorical variable has a bump in the middle doesn't of itself make it Gaussian any more than it makes it beta distributed or anything else. – Glen_b Jun 20 '13 at 22:36
  • @Glen_b - I checked a number of sources, and IQ does seem to be considered ordinal (e.g. https://en.wikipedia.org/wiki/Level_of_measurement). Good point about the var having to be continuous and interval-scaled. I guess in practice, I've seen plenty of cases where ordinal vars such as IQ essentially get treated as continuous vars following a Gaussian dist., so out of habit I learned to gloss over the details of what's actually going on. Here's a discussion on the matter: http://stats.stackexchange.com/questions/539/does-it-ever-make-sense-to-treat-categorical-data-as-continuous – winshade Jun 20 '13 at 22:49
  • @winshade If IQ is only ordinal, no mention of mean IQ can be made, let alone standard deviation. Since those are routinely discussed by people who measure and use IQs *including those who design the instruments to measure them*, either people actually think IQ is interval, or they're engaged in nonsense. In particular, the moment people add together two or more ordinal items to produce a scale, at that instant, *the items themselves have been treated as interval*, and as a consequence, the resulting scale is. – Glen_b Jun 21 '13 at 00:23
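
Nameless's tall/short analogy in the comments above can be made concrete with a small simulation. The sketch below is not from the thread. It assumes a single Gaussian variable with no real group structure, uses a median split as a stand-in for a two-cluster solution built on that variable, and uses 2.02 as an approximate two-sided 5% cutoff for a t statistic with about 38 degrees of freedom. The function names are made up for illustration.

```python
import random
import statistics
from math import sqrt


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(va / len(a) + vb / len(b))


def median_split_t(values):
    """Split at the median (assumed stand-in for clustering), then compare halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return welch_t(ordered[half:], ordered[:half])


def random_split_t(values, rng):
    """Split into two random halves that ignore the values, then compare."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return welch_t(shuffled[half:], shuffled[:half])


rng = random.Random(0)   # fixed seed so the run is repeatable
n_subjects = 40          # matches the sample size in the question
n_sims = 1000
cutoff = 2.02            # approximate 5% two-sided critical value (assumption)

split_ts, random_ts = [], []
for _ in range(n_sims):
    # One homogeneous population: there are no true groups to find.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    split_ts.append(median_split_t(sample))
    random_ts.append(random_split_t(sample, rng))

frac_split = sum(abs(t) > cutoff for t in split_ts) / n_sims
frac_random = sum(abs(t) > cutoff for t in random_ts) / n_sims
print(f"median split 'significant':  {frac_split:.3f}")
print(f"random split 'significant':  {frac_random:.3f}")
```

Under these assumptions the median split should exceed the cutoff in essentially every simulated sample, while a random split of the same data should do so roughly 5% of the time. That is the sense in which a standard test on a variable that was itself used to form the clusters is hard to interpret. It says nothing about variables that were not used for clustering.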

0 Answers