Testing correlation between a Boolean and integer variable

Question

I have data that looks like this:

Is there a standard test that I can use to evaluate the correlation between the rank and Boolean attribute?

Are those really ranks? If so, why would ranks appear to be scaled like that? Are your data lognormal? — gung - Reinstate Monica, Sep 12 '15 at 01:20
Data is not lognormal, this is a sampled subset of the whole dataset for which I have evaluated the attribute. — OregonTrail, Sep 12 '15 at 01:37
Just by looking at this plot, I can say that they are almost not correlated at all. So it might be an option to quit/change what you are about to do. Don't you agree? — jeff, Sep 12 '15 at 02:22
@halilpazarlama you've assumed that my aim is to show that they're correlated — OregonTrail, Sep 12 '15 at 16:44
In fact, the reverse of that. If your purpose were a hypothesis test of whether they are correlated, then you would probably get a very confident result that they are not. But I assumed you were going to predict one from the other; if that's the case, then this data looks noisy. That's why I proposed changing your independent variable(s) if possible. Anyway, as for an answer to your question, a standard test could be the Pearson correlation coefficient, treating your Boolean variable as a numeric variable (0-1). But personally I would just calculate and compare the means of the two sets. — jeff, Sep 12 '15 at 21:23
Why not perform a Kolmogorov-Smirnov test on the two samples induced by the Boolean value? If they follow the same distribution, then the Boolean feature is irrelevant. — RUser4512, Sep 13 '15 at 17:25
Related (maybe a dup): https://stats.stackexchange.com/questions/102778/correlations-between-continuous-and-categorical-nominal-variables/102800#102800 — kjetil b halvorsen, Dec 21 '18 at 09:27

score 0 · Answer 1 · answered Jul 20 '16 at 19:49

I'm 10 months late, but on the off chance you or someone else finds this useful: what you're looking for is the point-biserial correlation.
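For reference, the point-biserial correlation is numerically the same as the Pearson correlation computed with the Boolean coded as 0/1. Here is a minimal sketch in plain Python; the function name `point_biserial` and the sample values are illustrative assumptions, not the data from the question.

```python
import math


def point_biserial(flags, values):
    """Point-biserial correlation between a Boolean and a numeric variable.

    Uses r = (M1 - M0) / s * sqrt(n1 * n0 / n^2), where s is the
    population standard deviation of all values. This equals the Pearson
    correlation with the Boolean coded as 0/1.
    """
    if len(flags) != len(values):
        raise ValueError("flags and values must have the same length")
    n = len(values)
    group1 = [v for f, v in zip(flags, values) if f]
    group0 = [v for f, v in zip(flags, values) if not f]
    n1, n0 = len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both Boolean groups must be non-empty")

    mean_all = sum(values) / n
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd_all == 0:
        raise ValueError("values have zero variance")

    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    return (mean1 - mean0) / sd_all * math.sqrt(n1 * n0 / n ** 2)


# Hypothetical example data: attribute flag and an integer score.
example_flags = [False, False, True, True]
example_values = [1, 2, 3, 4]
print(point_biserial(example_flags, example_values))
```

A value near 0 suggests little linear association between the attribute and the numeric variable; the sign shows which group has the higher mean.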

Brett Mitchell

This is rather brief by our standards. Can you elaborate? What is the point biserial correlation & why would it resolve the OP's issue? – gung - Reinstate Monica Jul 20 '16 at 19:58

score 0 · Answer 2 · answered Jan 07 '17 at 16:53

First off, we know the groups aren't independent. In fact, once you know the ranks in the second group, you know the ranks in the first (and vice versa); that is, each rank is in one group or the other, but not both.

Given this, you might want to perform a permutation test to see whether the average rank (or some other statistic) of the first group is extreme compared to the sampling distribution of the average rank under the assumption that the ranks are randomly assigned to each group.
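A permutation test along these lines could be sketched as follows. This is only an illustration under stated assumptions: the function name `permutation_test_mean_rank`, the two-sided comparison against the overall mean rank, the number of permutations, and the example data are all choices made here, not part of the question.

```python
import random


def permutation_test_mean_rank(ranks, flags, n_permutations=10000, seed=0):
    """Two-sided permutation test for the mean rank of the flagged group.

    Under the null hypothesis the flags are assigned to ranks at random,
    so we repeatedly shuffle the ranks, take the first n_true of them as
    the flagged group, and count how often the group mean is at least as
    far from the overall mean as the observed group mean.
    """
    if len(ranks) != len(flags):
        raise ValueError("ranks and flags must have the same length")
    flagged = [r for r, f in zip(ranks, flags) if f]
    n_true = len(flagged)
    if n_true == 0 or n_true == len(ranks):
        raise ValueError("both groups must be non-empty")

    overall_mean = sum(ranks) / len(ranks)
    observed_mean = sum(flagged) / n_true
    observed_dev = abs(observed_mean - overall_mean)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(ranks)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pool)
        perm_mean = sum(pool[:n_true]) / n_true
        # Small tolerance guards against floating-point ties.
        if abs(perm_mean - overall_mean) >= observed_dev - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed_mean, p_value


# Hypothetical example: ranks 1..20, attribute true for the ten lowest ranks.
example_ranks = list(range(1, 21))
example_flags = [r <= 10 for r in example_ranks]
print(permutation_test_mean_rank(example_ranks, example_flags))
```

Shuffling the ranks and taking a fixed-size slice respects the dependence noted above: every rank lands in exactly one group in each permutation.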

Richard Redding

The question is not clear. Most of the questions in the comments were not answered by the OP. The OP's graph is confusing. I agree with gung that the data do not look like ranks, judging by the scale of the y-axis. It is labeled as ranks, but the OP also calls it data. I don't think an answer here is helpful. – Michael R. Chernick Jan 07 '17 at 18:36
