10

What is the best statistical test for investigating if there is any correlation between 2 categorical variables?

Both are satisfaction scores:

1st variable is:

Overall satisfaction with the service.

1: Not at all satisfied; 10: Completely satisfied

2nd variable is:

Satisfaction with the availability of information for the service.

1: Not at all satisfied; 10: Completely satisfied.

soshelp
  • The question concerns *ordinal* variables, rather than nominal categorical ones - I think that ought to be made clear in the question. – Silverfish Jan 16 '15 at 19:26
  • Other relevant questions: [How does the Goodman-Kruskal gamma test and the Kendall tau or Spearman rho test compare?](http://stats.stackexchange.com/q/18112/22228) and [Kendall Tau or Spearman's rho?](http://stats.stackexchange.com/q/3943/22228) – Silverfish Jan 16 '15 at 19:28
  • "Ordinal" added by me to the title. (Note that nobody forces you to regard these variables as ordinal and not interval.) – ttnphns Jan 16 '15 at 19:34
  • @ttnphns Thanks - in that case I will tag it also. – Silverfish Jan 16 '15 at 23:50
  • For categorical variables, you apply polychoric correlation. LISREL program and FACTOR software could do the polychoric correlation. – Emma Nov 15 '18 at 01:44

3 Answers

9

I would go with Spearman's rho and/or Kendall's tau for ordinal categorical variables.

Spearman's correlation coefficient (rho) is closely related to Pearson's correlation coefficient: it measures the strength of the monotonic relationship between two variables. It can be understood as a rank-based version of Pearson's coefficient, that is, Pearson's correlation computed on the ranks of the data rather than on the raw values.

Like Spearman's rho, Kendall's tau measures the degree of a monotone relationship between variables. Roughly speaking, Kendall's tau distinguishes itself from Spearman's rho by stronger penalization of non-sequential (in context of the ranked variables) dislocations.
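To make the two rank-based measures concrete, here is a minimal pure-Python sketch. The function names and the sample scores are hypothetical illustrations, not part of the answer. Because 1-10 ratings produce many ties, the sketch uses average ranks for Spearman's rho and the tie-adjusted tau-b variant for Kendall's tau. In practice a library routine such as `scipy.stats.spearmanr` or `scipy.stats.kendalltau` would also report a p-value.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, adjusted for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: counted in neither tie total
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / ((pairs + ties_x) * (pairs + ties_y)) ** 0.5


# Hypothetical 1-10 ratings from eight respondents
overall = [9, 7, 8, 3, 10, 5, 7, 2]
information = [8, 7, 9, 4, 9, 3, 6, 2]
print("Spearman rho:", round(spearman_rho(overall, information), 3))
print("Kendall tau-b:", round(kendall_tau_b(overall, information), 3))
```

Both functions return a value between -1 and 1, and both depend only on the ordering of the scores, not on the spacing between levels, which is why they suit ordinal ratings.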

2

Both of these have enough levels that you could just treat them as continuous variables, and use Pearson or Spearman correlation. You can then calculate a significance (p) value based on your correlation and sample size.
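As a sketch of how the p-value follows from the correlation and the sample size: for Pearson's $r$ computed from $n$ pairs, one standard approach (assuming approximately bivariate normal data, which is an assumption added here and not stated in the answer) converts $r$ into a $t$ statistic with $n-2$ degrees of freedom.

$$
t = r\,\sqrt{\frac{n-2}{1-r^{2}}}, \qquad t \sim t_{n-2} \ \text{ under } H_0:\ \rho = 0
$$

For example, with hypothetical values $r = 0.30$ and $n = 100$, $t = 0.30\sqrt{98/0.91} \approx 3.11$, which corresponds to a two-sided p-value well below 0.01. The same $r$ with a much smaller sample would give a smaller $t$, which is why the sample size matters.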

If you really want to treat the data as categorical, you want to run a chi-squared test on the 10x10 contingency table of overall satisfaction vs. availability satisfaction. You will need a decent amount of data for this (~thousands), since most of the cells (a common rule of thumb is at least 80%) should have an expected count of at least 5 for the test to be valid. This would allow for more general types of dependence between the two measures, in which even nearby levels show different relationships (e.g. rating1=9 tends to predict rating2=4, rating1=8 tends to predict rating2=10), which is probably unlikely in your data.
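The chi-squared approach and its sample-size requirement can be sketched as follows. This is a minimal pure-Python illustration with a hypothetical function name and toy data. It computes the Pearson chi-squared statistic, the degrees of freedom, and the share of cells whose expected count falls below 5, but not the p-value. A library routine such as `scipy.stats.chi2_contingency` would return the p-value as well.

```python
from collections import Counter


def chi_squared_table(x, y, levels=range(1, 11)):
    """Return (statistic, degrees of freedom, share of cells with expected < 5).

    x and y are equal-length sequences of ratings drawn from `levels`.
    """
    levels = list(levels)
    n = len(x)
    observed = Counter(zip(x, y))  # cell counts; missing cells count as 0
    row_total = Counter(x)
    col_total = Counter(y)

    # Drop levels nobody used: they would give expected counts of zero
    rows = [r for r in levels if row_total[r] > 0]
    cols = [c for c in levels if col_total[c] > 0]

    statistic = 0.0
    small_cells = 0
    for r in rows:
        for c in cols:
            # Expected count under independence of the two ratings
            expected = row_total[r] * col_total[c] / n
            statistic += (observed[(r, c)] - expected) ** 2 / expected
            if expected < 5:
                small_cells += 1

    dof = (len(rows) - 1) * (len(cols) - 1)
    return statistic, dof, small_cells / (len(rows) * len(cols))


# Hypothetical toy data: only two rating levels are used, perfectly associated
overall = [1] * 10 + [2] * 10
information = [1] * 10 + [2] * 10
stat, dof, share_small = chi_squared_table(overall, information)
print(stat, dof, share_small)
```

With a full 10x10 table there are 100 cells and up to 81 degrees of freedom, so the share of small expected counts is the number to watch before trusting the test.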

Chris
0

I searched for this and found the following page by John Uebersax: http://www.john-uebersax.com/stat/tetra.htm

There are also some papers:

https://link.springer.com/article/10.1007/s11135-008-9190-y

https://escholarship.org/content/qt583610fv/qt583610fv.pdf

leon
  • Welcome to CV, thank you for your contribution. Please add the full references of your links in case they die in the future. – Antoine Nov 27 '18 at 12:01