
In a discussion, someone claimed that because gender has only two categories, he can correlate it with a continuous variable. Is it acceptable to use the Pearson correlation between one continuous variable and one binary variable?

Mohamed Nabil
  • You can, but why not use a test that is actually appropriate, such as z- or t-tests for means? Also, it's questionable if there are truly precisely two genders (or sexes). – jona Jan 15 '16 at 19:21
  • Thanks for your reply. What confused me was that he said that as long as we have only two categories (male and female) we can use the Pearson correlation coefficient, but if there are more categories we cannot. My understanding is that the Pearson correlation coefficient is only used with two approximately normal continuous variables. Am I right? – Mohamed Nabil Jan 15 '16 at 19:36
  • Have a look at this post (and my answer there): http://stats.stackexchange.com/questions/131065/non-transitivity-of-correlation-correlations-between-gender-and-brain-size-and/131069#131069 – kjetil b halvorsen Jan 15 '16 at 19:40
  • See also [Proof of Point-Biserial Correlation being a special case of Pearson Correlation](http://stats.stackexchange.com/q/105542/17230) & [Can binary data be ordinal?](http://stats.stackexchange.com/q/169604/17230). – Scortchi - Reinstate Monica Jan 15 '16 at 20:22



Sex is a nominal variable: there is no origin, no ordering, etc. However, most dichotomous nominal variables can be coded numerically (for example as 0 and 1) and treated as if they were continuous. When you do that, the different analyses (two-group $t$-test, regression, correlation) all degenerate to the same result.
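Written out for a 0/1 coding, Pearson's $r$ becomes what is usually called the point-biserial correlation. As a sketch of the standard identity, let $\bar{y}_0$ and $\bar{y}_1$ be the outcome means of the groups coded 0 and 1, $n_0$ and $n_1$ their sizes, $n = n_0 + n_1$, and $s_y$ the standard deviation of all $n$ outcomes computed with divisor $n$ (these symbols are introduced here for illustration):

$$
r = \frac{\bar{y}_1 - \bar{y}_0}{s_y}\,\frac{\sqrt{n_0 n_1}}{n},
\qquad
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},
$$

where $t$, with $n-2$ degrees of freedom, is the usual statistic for testing correlation $= 0$. So $r$ is just a rescaled difference in group means.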

Consider a simple regression in your case of two sexes. Code males as zero and females as one.

The regression fits two parameters: the "intercept" will be the male mean, and the "slope" will be the difference between the female mean and the male mean (the intercept).
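A minimal Python sketch of this claim, using invented outcome values (the lists `male` and `female` and the helper `ols` are hypothetical, for illustration only). It fits ordinary least squares by hand with males coded 0 and females coded 1 and compares the fitted coefficients with the group means:

```python
# Invented outcome values for illustration only (not real data).
male = [5.0, 6.0, 7.0, 6.5]
female = [7.5, 8.0, 9.0, 8.5, 7.0]


def mean(xs):
    return sum(xs) / len(xs)


def ols(x, y):
    """Hypothetical helper: least-squares fit of y = a + b*x, returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Dummy coding: males = 0, females = 1.
x = [0.0] * len(male) + [1.0] * len(female)
y = male + female
intercept, slope = ols(x, y)

print(intercept, mean(male))             # intercept equals the male mean
print(slope, mean(female) - mean(male))  # slope equals female mean minus male mean
```

The two printed pairs agree up to floating-point rounding, whatever values you put in the lists.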

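The $F$-test for this regression will degenerate to the (pooled-variance, Student's) $t$-test for the two groups (literally, $F=t^2$). The same $F$-test is also the test of slope $= 0$ and of correlation $= 0$, because under these conditions all these tests degenerate to the same test. The Pearson correlation computed this way is the point-biserial correlation.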
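To check the equivalence numerically, here is a sketch on invented data (all names are hypothetical). It computes the pooled-variance two-sample $t$ statistic, the $t$ statistic for testing correlation $= 0$ from Pearson's $r$ on the 0/1 codes, and the simple-regression $F$ statistic $r^2 / [(1-r^2)/(n-2)]$. The assumption here is the equal-variance (Student) $t$-test; Welch's unequal-variance version will generally give a different number.

```python
import math

# Invented outcome values for illustration only (not real data).
male = [5.0, 6.0, 7.0, 6.5]
female = [7.5, 8.0, 9.0, 8.5, 7.0]


def mean(xs):
    return sum(xs) / len(xs)


def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    sp2 = ss / (na + nb - 2)
    return (mb - ma) / math.sqrt(sp2 * (1 / na + 1 / nb))


x = [0.0] * len(male) + [1.0] * len(female)  # males = 0, females = 1
y = male + female
n = len(y)

r = pearson_r(x, y)                             # point-biserial correlation
t_groups = pooled_t(male, female)               # two-group t-test
t_corr = r * math.sqrt((n - 2) / (1 - r ** 2))  # t-test of correlation = 0
F = r ** 2 / ((1 - r ** 2) / (n - 2))           # regression F on 1 and n - 2 df

print(t_groups, t_corr)  # the same number
print(F, t_groups ** 2)  # F = t^2
```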

By the way, the coding does not matter in this simple case either. You could code males as $-1$ and females as $+1$, and the correlation, test statistics, and $p$-values will be the same; only the intercept and slope are rescaled.
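A sketch comparing the two codings on invented data (the lists and the helper `fit` are hypothetical). The correlation, and therefore every test statistic computed from it, is unchanged; the slope halves because the two codes are now 2 units apart, and the intercept moves to the midpoint of the two group means.

```python
import math

# Invented outcome values for illustration only (not real data).
male = [5.0, 6.0, 7.0, 6.5]
female = [7.5, 8.0, 9.0, 8.5, 7.0]
y = male + female


def mean(xs):
    return sum(xs) / len(xs)


def fit(x, y):
    """Return (intercept, slope, r) for a least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


a01, b01, r01 = fit([0.0] * len(male) + [1.0] * len(female), y)   # 0/1 coding
apm, bpm, rpm = fit([-1.0] * len(male) + [1.0] * len(female), y)  # -1/+1 coding

print(r01, rpm)                              # same correlation, hence same t, F, p-value
print(bpm, b01 / 2)                          # slope halves: codes are now 2 units apart
print(apm, (mean(male) + mean(female)) / 2)  # intercept is the midpoint of the two means
```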

When you have more than two categories, though, where you place the third and additional codes on the numeric scale matters, and the analyses no longer degenerate to the same result.
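A sketch of why placement matters, using three invented groups (the labels `A`, `B`, `C`, the values, and the helper `codes` are hypothetical). Swapping which group gets code 1 and which gets code 2 changes Pearson's $r$, because any numeric coding imposes an order and a spacing that the nominal variable does not have.

```python
import math

# Invented three-group data; group labels and values are hypothetical.
groups = {
    "A": [1.0, 2.0, 1.5],
    "B": [5.0, 6.0, 5.5],
    "C": [3.0, 3.5, 2.5],
}
order = ["A", "B", "C"]
y = [v for g in order for v in groups[g]]


def mean(xs):
    return sum(xs) / len(xs)


def pearson_r(x, y):
    """Pearson correlation coefficient computed from sums of squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def codes(mapping):
    """Replace each observation's group label by the numeric code in `mapping`."""
    return [mapping[g] for g in order for _ in groups[g]]


r1 = pearson_r(codes({"A": 0, "B": 1, "C": 2}), y)
r2 = pearson_r(codes({"A": 0, "B": 2, "C": 1}), y)
print(r1, r2)  # different values for the same data
```

With these numbers the first coding gives $r \approx 0.36$ and the second $r \approx 0.96$, even though the data are identical.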

Does this help?

StatNoodle