Correlation of bivariate grouped data?

Question

Which test should I use to test the correlation between two grouped (dichotomized) variables?

The situation: I asked several hotel owners how they feel about the occupancy rate of their hotel and about the propagation of the region where the hotel is located. Both questions used ordinal scales ranging from "low" to "high". Because the survey had few participants, I had to collapse the answers to both questions into two categories: "low" and "high".

My hypothesis is that there is a correlation between the occupancy rate and the propagation of the region; for example, an owner who feels that his hotel has a high occupancy rate also feels that the propagation of the region is high.

My question is: which test could I use to test this hypothesis?

I would appreciate any kind of help! Thank you!

I can't follow your question. Can you provide more detail about what your situation and goals are? You may want to read our [FAQ](http://stats.stackexchange.com/faq), & this blog post: [how-to-ask-a-statistics-question](http://www.statisticalanalysisconsulting.com/how-to-ask-a-statistics-question/), to help w/ asking your Q. I would suggest, however, that you *do not* categorize your variables if they are continuous. I wrote about that issue here: [How to choose between ANOVA & ANCOVA in a designed experiment](http://stats.stackexchange.com/questions/24077//24080#24080). — gung - Reinstate Monica, Oct 16 '12 at 13:03
Thanks, but it's not that case. My two variables are: 1. the occupation of hotel - the answer can be after categorization "low" or "hight" 2. the propagation of the region - again "low" or "hight". I have a hypothesis, that if the propagation of the region is hight, the occupation of the hotels of the region is hight too. — Marlene, Oct 16 '12 at 13:08
I really can't follow what you're talking about here. What is "the occupation of hotel", is that the occupancy rate (ie how many rooms are booked)? What are "low" & "height"? Are they related to how many floors the hotel has (& thus a rough indicator of the number of rooms), or a dichotomization of how many rooms are currently booked? etc. — gung - Reinstate Monica, Oct 16 '12 at 13:13
It's how the owners feel about the occupancy rate of their hotel. — Marlene, Oct 18 '12 at 13:15
That helps a little, thanks. Can you edit your Q to give an explanation of the background, situation, questions, goals, etc.? The blog post linked above will help w/ what I mean. — gung - Reinstate Monica, Oct 16 '12 at 13:18
As @gung says, you shouldn't categorize continuous variables. — Peter Flom, Oct 16 '12 at 13:20
Thanks. I'm assuming that by "hight" you mean *high*, & that by "propagation" you mean *population*. — gung - Reinstate Monica, Oct 16 '12 at 16:13
Yes, it is high, sorry, but propagation, not population: that is, whether the region is propagated (advertised) or not. The chi-square test could work, yes, but the problem is that I have a very tiny sample, and the chi-square condition of at least 5 cases in each category is not fulfilled. Any idea for a correlation test? — Marlene, Oct 17 '12 at 07:00

score 1 · Answer 1 · answered Oct 16 '12 at 16:18

I'm still not certain that you need to group your ordinal responses into two categories, but once you have done so you simply have a 2x2 contingency table with the counts for the number of observations that fall into each of the four possible combinations. If you want to see if these two variables are associated, you can use the $\chi^2$ test for independence.
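As a rough illustration of that test, here is a minimal sketch in plain Python. The counts are hypothetical (they are not the asker's survey data), the function name `chi_square_2x2` is made up for this example, and no continuity correction is applied. It also reports the expected counts, since the usual rule of thumb about small cells concerns those.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p_value, expected_counts). No continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    expected = [[row1 * col1 / n, row1 * col2 / n],
                [row2 * col1 / n, row2 * col2 / n]]
    return chi2, p_value, expected

# Hypothetical counts: rows = occupancy (high, low), columns = propagation (high, low).
chi2, p_value, expected = chi_square_2x2(8, 2, 3, 7)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
print("smallest expected count:", min(min(row) for row in expected))
```

If the smallest expected count comes out below 5, the asymptotic p-value should be read with caution.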

score 1 · Answer 2 · answered Jan 15 '13 at 08:47

There are a number of ways to assess the correlation between two binary variables. The most common in my experience is the Phi coefficient. Notably, this coefficient for a 2x2 table has the same value as a regular correlation coefficient (Pearson's product moment) computed on the 0/1-coded data, and it bears a direct relationship with the $\chi^2$ test mentioned by gung: $\phi^2 = \chi^2 / n$, where $n$ is the total number of observations.
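To make the Phi coefficient concrete, the following sketch computes it from a 2x2 table and compares it with Pearson's r on the same data coded as 0/1. The counts are hypothetical, and the helper names `phi_coefficient` and `pearson_r` are invented for this example.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical counts: rows = occupancy (high, low), columns = propagation (high, low).
a, b, c, d = 8, 2, 3, 7
n = a + b + c + d

# Expand the table into one (occupancy, propagation) pair per owner, coded 1 = high, 0 = low.
pairs = [(1, 1)] * a + [(1, 0)] * b + [(0, 1)] * c + [(0, 0)] * d
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]

phi = phi_coefficient(a, b, c, d)
r = pearson_r(xs, ys)
chi2_from_phi = n * phi ** 2  # uncorrected chi-square statistic recovered from phi

print(f"phi = {phi:.4f}, Pearson r = {r:.4f}, n * phi^2 = {chi2_from_phi:.4f}")
```

The two correlation values agree up to floating-point rounding, and `n * phi ** 2` reproduces the uncorrected chi-square statistic for the same table.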

With a 2x2 table of counts, you can use Fisher's exact test instead of the $\chi^2$ test; it does not require that all cells have N >= 5. I haven't vetted it, but a quick Google search shows this calculator available online.
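If you would rather not rely on an online calculator, Fisher's exact test for a 2x2 table is short enough to sketch with the Python standard library. This is only an illustration: the counts are hypothetical, the function name `fisher_exact_2x2` is made up, and the two-sided p-value follows the common convention of summing the probabilities of all tables (with the same margins) that are no more probable than the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
                  if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical counts: rows = occupancy (high, low), columns = propagation (high, low).
p_value = fisher_exact_2x2(8, 2, 3, 7)
print(f"Fisher exact two-sided p = {p_value:.4f}")
```

For this hypothetical table the exact two-sided p-value is about 0.07. Because the test conditions on the observed margins and enumerates every possible table, it does not depend on a large-sample approximation.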

All of this being said... given that you have a cell in your design where N < 5, I'd recommend caution in interpreting your correlation coefficient.
