
I have a data set with a categorical variable $C$ (approx. 8 levels) and a quantitative variable $X$. I suspect that $X$ and $C$ are strongly dependent. How can I verify this hypothesis?

My approach would be to 'bin' $X$ in order to obtain about the same number of categories as for $C$ and then use a test for contingency tables.
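A minimal sketch of this binning approach in Python. It assumes equal-frequency (quantile) bins and a Pearson chi-square statistic. To stay dependency-free, it swaps the usual chi-square approximation for a permutation p-value obtained by shuffling the labels of $C$, and it also reports Cramér's V as an effect size. The function names and defaults (`quantile_bins`, `binned_independence_test`, `n_bins=8`, `n_perm=999`) are illustrative assumptions, not an established API.

```python
import random
from collections import Counter
from math import sqrt


def quantile_bins(x, n_bins):
    """Map each value of x to an equal-frequency bin 0..n_bins-1 (tied values may be split)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    bins = [0] * len(x)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(x)
    return bins


def chi_square_stat(rows, cols):
    """Pearson chi-square statistic for the contingency table of two label lists."""
    n = len(rows)
    joint = Counter(zip(rows, cols))
    row_tot, col_tot = Counter(rows), Counter(cols)
    stat = 0.0
    for a, na in row_tot.items():
        for b, nb in col_tot.items():
            expected = na * nb / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def binned_independence_test(c, x, n_bins=8, n_perm=999, seed=0):
    """Chi-square test of C vs. binned X with a permutation p-value and Cramér's V."""
    bins = quantile_bins(x, n_bins)
    observed = chi_square_stat(c, bins)
    rng = random.Random(seed)
    labels = list(c)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling C's labels simulates independence from X
        if chi_square_stat(labels, bins) >= observed - 1e-9:
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    k = min(len(set(c)), len(set(bins)))
    if k < 2:
        raise ValueError("need at least two levels and two bins")
    cramers_v = sqrt(observed / (len(c) * (k - 1)))
    return observed, p_value, cramers_v


if __name__ == "__main__":
    # Simulated data: 8 levels, X shifted by the level index plus noise.
    gen = random.Random(1)
    levels = "ABCDEFGH"
    c = [levels[i % 8] for i in range(400)]
    x = [levels.index(lab) + gen.gauss(0, 0.5) for lab in c]
    print(binned_independence_test(c, x))
```

Equal-frequency bins avoid nearly empty columns. With an 8 × 8 table, the common rule of thumb of at least 5 expected counts per cell needs roughly 320 or more observations for the chi-square approximation; the permutation p-value sidesteps that approximation.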

Once I have established dependency, I would like to find 'the natural order' of the categories of $C$ implied by $X$, or even to assign distances between the levels of $C$ based on the numeric scale of $X$. Can this idea be made rigorous? Does it make sense?
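One simple, heuristic way to make the 'natural order' idea concrete (a sketch, not a rigorous method) is to summarize $X$ within each level of $C$, sort the levels by that summary, and take the absolute difference of the summaries as the distance between two levels. The choice of the median as the summary and the helper names `level_centers`, `natural_order`, and `level_distances` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def level_centers(c, x, center=median):
    """Summarize X within each level of C (median by default; pass statistics.mean for means)."""
    groups = defaultdict(list)
    for level, value in zip(c, x):
        groups[level].append(value)
    return {level: center(values) for level, values in groups.items()}


def natural_order(centers):
    """Levels of C sorted by their center on the X scale."""
    return sorted(centers, key=centers.get)


def level_distances(centers):
    """Distance between two levels = absolute difference of their centers on the X scale."""
    return {(a, b): abs(centers[a] - centers[b]) for a in centers for b in centers}


if __name__ == "__main__":
    centers = level_centers(["lo", "hi", "mid", "lo", "hi", "mid"], [1, 10, 5, 2, 11, 6])
    print(natural_order(centers))             # ['lo', 'mid', 'hi']
    print(level_distances(centers)[("lo", "hi")])  # 9.0
```

The median is used because it is robust to outliers. The resulting order is only meaningful where adjacent centers are clearly separated relative to the spread of $X$ within each level; two levels with overlapping distributions could easily swap places in another sample.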

MKR
  • A *poor man's* approach to this question would be to treat *X* as the dependent variable and *C* as the independent variable, which makes them amenable to a one-way ANOVA. Taking the square root of the R-squared value from that simple model gives a correlation-type measure of their association (the correlation ratio, eta). – Mike Hunter Jun 08 '18 at 13:50



EDIT: the whole answer has been revised.

The poster has made clear in the comments below that they are looking for a test of association that doesn't assume that one variable is independent and the other dependent.

This is probably a duplicate of this Cross Validated question, where there is some helpful discussion. There are also some good options at this other CV question.

To meet the poster's criteria, one approach might be to use an extended version of the Cochran-Armitage test that allows for more than two categories. In this test, one variable is ordinal and the other is nominal, arranged in a contingency table, so it would still require reducing the quantitative variable to an ordinal one, for example by binning. But I think this approach stays closer to the data than reducing the continuous variable to a purely nominal one, because the order of the bins is retained. From what I have read, it is a test of association that doesn't assume that one variable is dependent.
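A minimal sketch of one common form of such an extension, assuming the ordinal variable is $X$ binned into equal-frequency categories scored $1, \dots, B$. The statistic is the "mean score" form $Q = (N-1)\,SS_\text{between}/SS_\text{total}$ of those scores across the levels of $C$; when $C$ has two levels this is the $(N-1)r^2$ form of the Cochran-Armitage trend statistic. $Q$ is usually compared with a chi-square distribution on (number of levels of $C$) $- 1$ degrees of freedom; here a permutation p-value is used instead. The names and defaults are hypothetical, and this is not claimed to be exactly the variant the answer has in mind.

```python
import random
from collections import defaultdict


def ordinal_scores(x, n_bins):
    """Bin x into equal-frequency ordered categories scored 1..n_bins."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    scores = [0] * len(x)
    for rank, i in enumerate(order):
        scores[i] = rank * n_bins // len(x) + 1
    return scores


def mean_score_statistic(c, scores):
    """(N - 1) * SS_between / SS_total of the ordinal scores across the levels of C."""
    n = len(scores)
    grand = sum(scores) / n
    ss_total = sum((s - grand) ** 2 for s in scores)
    if ss_total == 0:
        raise ValueError("ordinal scores are constant; the statistic is undefined")
    sums, counts = defaultdict(float), defaultdict(int)
    for level, s in zip(c, scores):
        sums[level] += s
        counts[level] += 1
    ss_between = sum(counts[g] * (sums[g] / counts[g] - grand) ** 2 for g in counts)
    return (n - 1) * ss_between / ss_total


def ordinal_nominal_test(c, x, n_bins=8, n_perm=999, seed=0):
    """Mean-score test of binned X vs. C, with a permutation p-value."""
    scores = ordinal_scores(x, n_bins)
    observed = mean_score_statistic(c, scores)
    rng = random.Random(seed)
    labels = list(c)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permuting C's labels breaks any association with X
        if mean_score_statistic(labels, scores) >= observed - 1e-9:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

Because only the ranks of $X$ enter through the bin scores, the result does not depend on monotone transformations of $X$.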

An appropriate measure of association for the ordinal/nominal case may be Freeman's theta.
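A minimal sketch of Freeman's theta, assuming the definition $\theta = \sum |f_a - f_b| / T_2$, where, for each pair of levels of $C$, $f_a$ and $f_b$ count the cross-level pairs of observations in which one level's value is larger or smaller than the other's, and $T_2$ is taken here to be $\sum n_a n_b$, the total number of cross-level pairs. Sources may treat ties in the denominator differently, so that choice is an assumption. Since only the order of the values matters, the function can be applied to binned $X$ or to raw $X$.

```python
from collections import defaultdict
from itertools import combinations


def freeman_theta(c, x):
    """Freeman's theta for an ordinal (or numeric) x and a nominal c.

    For every pair of levels (a, b), count the cross-level pairs in which the
    value from a is larger (f_a) or smaller (f_b) than the value from b; ties
    count for neither. theta = sum |f_a - f_b| / sum n_a * n_b (assumed T_2).
    """
    groups = defaultdict(list)
    for level, value in zip(c, x):
        groups[level].append(value)
    if len(groups) < 2:
        raise ValueError("need at least two levels of c")
    numerator = 0
    denominator = 0
    for a, b in combinations(groups, 2):
        larger = smaller = 0
        for u in groups[a]:
            for v in groups[b]:
                if u > v:
                    larger += 1
                elif u < v:
                    smaller += 1
        numerator += abs(larger - smaller)
        denominator += len(groups[a]) * len(groups[b])
    return numerator / denominator
```

With this denominator, theta is 1 when the levels occupy completely separate ranges of $X$ and 0 when every pair of levels is balanced; for two levels it reduces to the absolute value of Cliff's delta. The nested loops are $O(n^2)$, which is fine for moderate sample sizes.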

An appropriate measure of association for the interval/nominal case may be eta-squared.
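A minimal sketch of eta-squared, $\eta^2 = SS_\text{between} / SS_\text{total}$ of $X$ across the levels of $C$. This is the same number as the $R^2$ of a one-way ANOVA of $X$ on $C$, and its square root $\eta$ (the correlation ratio) is the quantity described in the comment under the question. The function names are illustrative.

```python
from collections import defaultdict
from math import sqrt


def eta_squared(c, x):
    """Eta-squared = SS_between / SS_total of x across the levels of c."""
    n = len(x)
    grand = sum(x) / n
    ss_total = sum((v - grand) ** 2 for v in x)
    if ss_total == 0:
        raise ValueError("x is constant; eta-squared is undefined")
    groups = defaultdict(list)
    for level, value in zip(c, x):
        groups[level].append(value)
    ss_between = sum(len(vals) * (sum(vals) / len(vals) - grand) ** 2
                     for vals in groups.values())
    return ss_between / ss_total


def correlation_ratio(c, x):
    """eta = sqrt(eta-squared), the square root of the one-way ANOVA R-squared."""
    return sqrt(eta_squared(c, x))


if __name__ == "__main__":
    print(eta_squared(list("aabb"), [1, 3, 5, 7]))  # 0.8
```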

Sal Mangiafico
  • Thank you for your answer. From my understanding, all the proposed tests would treat C as the independent and X as the dependent variable, i.e., try to explain X by observations of C. Are there any ‘symmetric’ measures of association between X and C, or is this too much to ask for? – MKR Jun 08 '18 at 12:46
  • I updated my answer. – Sal Mangiafico Jun 08 '18 at 13:44
  • Thanks again. I think your answer was informative even before the edit. Maybe you should combine the old and new answers into a single one. – MKR Jun 11 '18 at 09:02