
My understanding is that Cramér's V is a measure of the association between two discrete variables and can be used as an effect size measure for a chi-squared test of independence on observations in a two-way contingency table.
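For reference, Cramér's V rescales the Pearson chi-squared statistic of an $r \times c$ table with $n$ observations as $V = \sqrt{\chi^2 / (n(\min(r,c)-1))}$, so that it lies between 0 and 1. A minimal NumPy sketch, assuming a table of raw counts with no empty rows or columns (the function name `cramers_v` is hypothetical, and no continuity correction is applied):

```python
import numpy as np

def cramers_v(table):
    """Cramér's V for a two-way contingency table of observed counts."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    # Pearson chi-squared statistic (no Yates correction)
    chi2 = ((obs - expected) ** 2 / expected).sum()
    # Divide by n * (min(rows, cols) - 1) so that V lies in [0, 1]
    k = min(obs.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# A perfectly associated 2x2 table gives V = 1; a table with identical rows gives V = 0.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```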

Following the answer given to this question, I used a log-linear model when trying to carry out such a test for a three-way contingency table. Is there a measure of association among three discrete variables, analogous to Cramér's V for a chi-squared test, that can be used alongside log-linear models?
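To make the log-linear route concrete, here is a sketch for the simplest log-linear model of a three-way table, mutual independence [X][Y][Z], whose fitted counts have the closed form $\hat m_{ijk} = n_{i++}\,n_{+j+}\,n_{++k}/n^2$. It returns the likelihood-ratio statistic $G^2$, Pearson's $X^2$ and the degrees of freedom $IJK - I - J - K + 2$. Note that these are test statistics, not an effect size like Cramér's V. NumPy is assumed, the function name is hypothetical, and every one-way margin is assumed to be non-zero.

```python
import numpy as np

def mutual_independence_fit(table):
    """Fit the log-linear mutual-independence model [X][Y][Z] to an
    I x J x K table of counts and return (G2, X2, df)."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    # One-way margins n_{i++}, n_{+j+}, n_{++k}
    mx = obs.sum(axis=(1, 2))
    my = obs.sum(axis=(0, 2))
    mz = obs.sum(axis=(0, 1))
    # Closed-form fitted counts: m_ijk = n_{i++} n_{+j+} n_{++k} / n^2
    fitted = np.einsum("i,j,k->ijk", mx, my, mz) / n**2
    # Likelihood-ratio statistic; cells with zero observed count contribute 0
    pos = obs > 0
    g2 = 2.0 * float(np.sum(obs[pos] * np.log(obs[pos] / fitted[pos])))
    # Pearson chi-squared statistic
    x2 = float(np.sum((obs - fitted) ** 2 / fitted))
    # Cells minus free parameters (1 + (I-1) + (J-1) + (K-1))
    I, J, K = obs.shape
    df = I * J * K - (I + J + K) + 2
    return g2, x2, df

# Two 2x2 layers in which all three variables move together
t = np.zeros((2, 2, 2))
t[0, 0, 0] = 10
t[1, 1, 1] = 10
print(mutual_independence_fit(t))
```

Models with interaction terms, such as no-three-way-interaction [XY][XZ][YZ], have no closed-form fitted values and are normally fitted by iterative proportional fitting or Newton-Raphson, which is what a log-linear routine in a statistics package handles for you.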

Joshua Rosenberg
  • What exactly would you want it to mean? – gung - Reinstate Monica Dec 17 '16 at 23:25
  • How strong the correlation between three discrete / categorical variables is. Though I'm having trouble thinking through what that means, precisely. – Joshua Rosenberg Dec 18 '16 at 15:08
  • Do you want a measure of global association? Do you want to know the strength of association between X & Y after controlling for Z? Would you want the same output value if X & Y are strongly related & Z is unrelated, as if Y & Z are strongly related & X is unrelated? – gung - Reinstate Monica Dec 18 '16 at 15:18

1 Answer


I don't know of a measure of association among three discrete variables, but you could use $R^2$, or something like it (e.g. a pseudo-$R^2$), from a regression model. For pairs of binary variables $(x,y)$ there is an alternative measure called the Tanimoto coefficient (the Jaccard index when applied to binary vectors), which is

\begin{equation} d(x,y)=\frac{n(x \cap y)}{n(x) + n(y) - n(x \cap y)}, \end{equation}

where $n(x \cap y)$ is the number of records with a one in both vectors, and $n(x)$ and $n(y)$ are the numbers of ones in each vector. This is a similarity coefficient with range $0 \leq d(x,y) \leq 1$; the corresponding distance is $1 - d(x,y)$. It is not designed for hypothesis testing, but it is worth being aware of.
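The formula translates directly into code. A minimal sketch, assuming two equal-length sequences of 0/1 values (the function name `tanimoto` is hypothetical); when neither vector contains a one the ratio is 0/0, and the sketch returns NaN rather than picking a convention:

```python
import math

def tanimoto(x, y):
    """Tanimoto (Jaccard) similarity d(x, y) between two equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # n(x ∩ y)
    nx = sum(x)  # n(x): number of ones in x
    ny = sum(y)  # n(y): number of ones in y
    denom = nx + ny - both
    if denom == 0:
        return math.nan  # undefined: no ones in either vector
    return both / denom

x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
s = tanimoto(x, y)
print(s, 1 - s)  # similarity and the corresponding distance
```

Here the vectors share two ones out of four positions where at least one of them is 1, so the similarity and the distance are both 0.5.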

Log-linear models are not as well supported across statistical packages as one might assume. Statistica (StatSoft/Dell) has quite a strong implementation, and SAS has PROC CATMOD, among others. There is also a family of categorical regression models known as Grizzle-Starmer-Koch (GSK), which mostly came out of UNC-Chapel Hill and works well for count data; CATMOD can fit some of these.

I never liked log-linear models and always ran GSK, or "linear categorical regression", since GSK lets you do logistic, log-linear and survival analysis, and essentially any multiway contingency table analysis. GSK is like a SWAK (Swiss Army knife).
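A minimal sketch of the GSK idea in its simplest form, assuming a binary response observed in several independent subpopulations: the response functions $F$ are the observed success proportions, their covariance matrix $V$ is estimated as diagonal with entries $p(1-p)/n$, and a linear model $F = X\beta$ is fitted by weighted least squares, $b = (X^\top V^{-1} X)^{-1} X^\top V^{-1} F$. The residual Wald statistic $(F - Xb)^\top V^{-1} (F - Xb)$ checks model fit on (subpopulations minus rank of $X$) degrees of freedom. Full GSK, and CATMOD, also handle multinomial responses and nonlinear response functions such as logits; this sketch does not. The function name is hypothetical, and any observed proportion of exactly 0 or 1 makes $V$ singular.

```python
import numpy as np

def gsk_linear_proportions(successes, totals, X):
    """GSK-style weighted least squares fit of F = X @ beta, where F holds
    the observed success proportion in each subpopulation (binary response)."""
    y = np.asarray(successes, dtype=float)
    n = np.asarray(totals, dtype=float)
    X = np.asarray(X, dtype=float)
    F = y / n  # response functions: observed proportions
    # Estimated covariance of F: independent binomials, so diagonal p(1 - p)/n
    Vinv = np.diag(n / (F * (1.0 - F)))
    # WLS estimate b = (X' V^-1 X)^-1 X' V^-1 F and its estimated covariance
    XtVinv = X.T @ Vinv
    cov_b = np.linalg.inv(XtVinv @ X)
    b = cov_b @ XtVinv @ F
    # Residual Wald (goodness-of-fit) statistic and its degrees of freedom
    r = F - X @ b
    q_resid = float(r @ Vinv @ r)
    df = len(F) - int(np.linalg.matrix_rank(X))
    return b, cov_b, q_resid, df

# Intercept-only model for two groups: tests whether the proportions are equal
b, cov_b, q, df = gsk_linear_proportions([30, 20], [100, 100], [[1.0], [1.0]])
print(b, q, df)
```

With an intercept-only design and two groups, the residual statistic reduces to $(p_1 - p_2)^2 / (v_1 + v_2)$, the familiar Wald test for equal proportions; richer designs simply add columns to `X`.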