
I have 6 categorical variables that can take the values -1, 0 and +1. Each extreme is assigned a semantic label. During rating, the rater could select either one of the labels (-1, +1) or neither (0). For instance, one variable consists of the classes dark (-1), < neither > (0) and bright (+1). This scale assumes that the semantic labels are perfectly opposing, and thus one-dimensional. However, the labels might not be perfectly opposing, which would mean that a single dimension is hiding 2 separate variables spanning 2 dimensions.

The question is: how can I show that this is the case? Or how can I show that the variable is indeed one-dimensional?

One idea I had was to use the correlation between one variable $V$ and another variable $R$ as a reference. The approach would be as follows: I split my data set into two subsets $A$ and $B$, so that subset $A$ contains the part $V_A$ of $V$ that only takes the values {-1, 0}, and subset $B$ contains the part $V_B$ of $V$ that only takes the values {0, +1}. I then use something like Spearman's rho to evaluate the correlation between $V_A$ and $R_A$ and between $V_B$ and $R_B$. If $V$ is one-dimensional, I should expect the correlation coefficient to be similar for both subsets. If not, the coefficients should differ. As a note: nearly all of my 6 variables are significantly correlated, some even very strongly (rho = 0.4).
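A minimal sketch of the split described above, using only the Python standard library. The data here are synthetic (an assumption standing in for the real ratings), the helper names `ranks`, `spearman` and `split_correlations` are made up for illustration, and rows with $V = 0$ are assumed to belong to both subsets:

```python
import random

def ranks(xs):
    """1-based ranks; tied values receive their average rank."""
    return [sum(x < v for x in xs) + (sum(x == v for x in xs) + 1) / 2 for v in xs]

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

def split_correlations(v, r):
    """Rho of V vs R on the {-1, 0} subset (A) and on the {0, +1} subset (B)."""
    a = [(vi, ri) for vi, ri in zip(v, r) if vi in (-1, 0)]
    b = [(vi, ri) for vi, ri in zip(v, r) if vi in (0, 1)]
    rho_a = spearman([p[0] for p in a], [p[1] for p in a])
    rho_b = spearman([p[0] for p in b], [p[1] for p in b])
    return rho_a, rho_b

# Synthetic stand-in data: R follows V linearly plus noise (not the real ratings).
random.seed(0)
v = [random.choice([-1, 0, 1]) for _ in range(300)]
r = [vi + random.gauss(0, 1) for vi in v]

rho_a, rho_b = split_correlations(v, r)
print(f"rho_A = {rho_a:.2f}, rho_B = {rho_b:.2f}")
```

Because the synthetic $R$ depends on $V$ in the same way on both halves, the two coefficients should come out similar here. Whether a difference on real data indicates two dimensions is exactly the open question.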

Is this approach valid? If not, is there a common approach for this kind of issue?

UPDATE: Context

Here is a bit of context for the question: The variables I described are response variables in a machine learning problem. I have a bunch of other variables, mostly continuous, that I use as predictors. I want to evaluate whether or not the response variables are one-dimensional, to look for reasons behind the poor performance of my ML classification.

  • Your approach ("if correlations on the subranges of the variable are similar the subranges measure the same trait") is sound only if the other variable (R) was already assumed likewise unidimensional. If you have such R (which could be some 7th variable) - that's nice. – ttnphns Jul 24 '19 at 12:56
  • Among other methods there is the more sophisticated Categorical PCA (search the site for CatPCA), which can "check" whether each variable, which is associated with other variables of a set, can reasonably be seen as scale (smooth unidimensional), ordinal (not smooth so) or nominal (not unidimensional). – ttnphns Jul 24 '19 at 13:03
  • @ttnphns I've added a bit of context. I have a whole number of additional variables that I use as predictors. I suppose I could compare $V$, $V_A$ and $V_B$ against my predictors? I'll have a look at CatPCA, thanks! – ruhig brauner Jul 24 '19 at 14:09
  • "Nearly all of my 6 variables are significantly correlated" Are any of your variables known to be cardinal or ordinal? This observed correlation would then constitute evidence of cardinality/ordinality. Else, I highly suspect the answer is "not possible". The reason being that there is no a priori reason to assume a different prior on the distribution of -1,0,+1 for cardinal, ordinal or categorical data. Therefore, the observed distribution gives you no information as to which of these three types your data actually are. – Him Jul 24 '19 at 14:28
  • @Scott I guess they are ordinal by design. The two labels are intended to span a linear scale with 0 being used as the neutral middle ground. Ordinal variables would already assume that the variable is one-dimensional... – ruhig brauner Jul 24 '19 at 14:47
  • On just the unidimensionality front, this is probably a "not possible" unless you have other variables that are continuous. For continuous variables, data suggestive of complex surfaces *might* be considered as evidence that the data are actually a projection of a "simpler" surface from a higher-dimensional space. For example, the [concentric circles](https://images.app.goo.gl/Jv6pFmpP2MSM2XTf8) pattern suggests that your 2 variables are *really* 3 totally different variables... r, $\theta$ (polar from rectangular) + one unobserved "which circle" variable. – Him Jul 24 '19 at 14:48
  • @Scott as mentioned in my update, I have a bunch of continuous variables that I use to classify the variable described above. I could also calculate the correlations between my categorical data ($V_A$, $V_B$,..) and my continuous predictors. – ruhig brauner Jul 24 '19 at 14:55
  • I would argue that a high correlation between your ordinal variable and another variable known to be continuous provides some evidence that the ordinal variable is, in fact, ordinal. Note that this need not hold. For example, a categorical variable of "plane=1" vs "car=0" vs "submarine=-1" correlates with the continuous variable "altitude", but is neither cardinal nor ordinal. However, if you hypothesize "variable X is ordinal", then observe a correlation with some other continuous variable, I should think that that observation would support the ordinality hypothesis. – Him Jul 24 '19 at 15:00
  • Note that the correlation above hinges on a very particular arrangement of the values 1,0,-1 to "plane", "car" and "submarine". In fact, only 2 of the 6 possible arrangements should produce a reasonably strong correlation. Thus, if your data were, in fact, categorical, then if you assign labels 1,0,-1 arbitrarily, there is a good chance you'll not see a correlation, even on the odd chance that there is one. – Him Jul 24 '19 at 15:04
  • @Scott, the variables used are called semantic differentials in this context. The idea is to create a (one-dimensional) continuous scale by using two opposing semantic labels as references. In my case, the scale is quantized into three discrete classes and I assign -1 / +1 to the extremes and 0 to the neutral in-between class. – ruhig brauner Jul 24 '19 at 15:20
  • "If V is one dimensional, I should expect the correlation coefficient to be similar for both sub classes. If not, the coefficients should differ." Not only does this not hold, but it might very well hold for categorical variables. – Him Jul 24 '19 at 20:30
  • Counterexample (that one-dimensional does not imply similar rho): # of people in household is cardinal, but the various sub-ranges (1-2, 2-3, 3-4) do not have similar correlations with, say, household income. The reasoning is that, from 1 to 2, there is a great increase in income, because the second person is likely an adult. From 2 to 3 there is not as much increase, because the third person is likely a minor. – Him Jul 24 '19 at 20:31
  • You might be able to test ordinality with a separate experiment using something that is generally agreed to be ordinal, such as a [Likert scale](https://en.wikipedia.org/wiki/Likert_scale). The experiment would be: for each image, have three scales for the subject to fill in: 1) 'this image is "dark"' (0 (strongly disagree) ... 5 (strongly agree)) 2) 'this image is "bright"' (0-5) 3) 'this image is neutral' (0-5)... or some such. If there is a strong correlation between the Likert responses and the -1,0,1 assignments, then that would be very good evidence of ordinality, I think. – Him Jul 24 '19 at 20:40
  • "The idea is to create a (onedimensional) continuous scale..." For reference, I am using terms in the following way: [continuous](https://en.wikipedia.org/wiki/Continuous_or_discrete_variable), [cardinal](https://en.wikipedia.org/wiki/Natural_number), [ordinal](https://en.wikipedia.org/wiki/Ordinal_data), categorical := [discrete data](https://en.wikipedia.org/wiki/Continuous_or_discrete_variable) that is not ordinal nor cardinal. – Him Jul 24 '19 at 20:53
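The comment about the plane / car / submarine coding can be illustrated with a small sketch. The altitude values are invented toy data (an assumption, not measurements), and `ranks` and `spearman` are hypothetical helpers written for this illustration. The script tries all 6 ways of assigning -1, 0, +1 to the three categories and reports Spearman's rho against altitude for each:

```python
from itertools import permutations

def ranks(xs):
    """1-based ranks; tied values receive their average rank."""
    return [sum(x < v for x in xs) + (sum(x == v for x in xs) + 1) / 2 for v in xs]

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Invented toy altitudes in metres, three observations per vehicle type.
altitude = {
    "submarine": [-300, -200, -100],
    "car": [0, 50, 100],
    "plane": [9000, 10000, 11000],
}
vehicles = [name for name, alts in altitude.items() for _ in alts]
heights = [a for alts in altitude.values() for a in alts]

# Try every assignment of the codes -1, 0, +1 to (submarine, car, plane).
results = {}
for codes in permutations([-1, 0, 1]):
    coding = dict(zip(altitude, codes))
    labels = [coding[name] for name in vehicles]
    results[codes] = spearman(labels, heights)

# Print codings from strongest to weakest absolute correlation.
for codes, rho in sorted(results.items(), key=lambda kv: -abs(kv[1])):
    print(dict(zip(altitude, codes)), f"rho = {rho:+.2f}")
```

In this balanced toy data, the two codings that respect the altitude order (or its reverse) give the largest |rho|, and the other four give about half of that, so an arbitrary coding weakens the correlation here rather than removing it entirely.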
