0

I have data that looks like this:

[image: plot of the data, not reproduced here]

Is there a standard test that I can use to evaluate the correlation between the rank and Boolean attribute?

OregonTrail
  • Are those really ranks? If so, why would ranks appear to be scaled like that? Are your data lognormal? – gung - Reinstate Monica Sep 12 '15 at 01:20
  • Data is not lognormal, this is a sampled subset of the whole dataset for which I have evaluated the attribute. – OregonTrail Sep 12 '15 at 01:37
  • Just by looking at this plot, I can say that they are almost not correlated at all. So it might be an option to quit/change what you are about to do. Don't you agree? – jeff Sep 12 '15 at 02:22
  • @halilpazarlama you've assumed that my aim is to show that they're correlated – OregonTrail Sep 12 '15 at 16:44
  • In fact, the reverse of that. If your purpose were to test the hypothesis that they are correlated, you would probably get a very confident result that they are not. But I assumed you were going to predict one from the other; if that's the case, then this data looks noisy. That's why I proposed changing your independent variable(s) if possible. Anyway, as for an answer to your question, a standard test could be the Pearson correlation coefficient, treating your Boolean variable as a numeric variable (0-1). But personally I would just calculate and compare the means of the two sets. – jeff Sep 12 '15 at 21:23
  • Why not perform a Kolmogorov-Smirnov test on the two samples induced by the Boolean value? If they follow the same distribution, then the Boolean feature is irrelevant. – RUser4512 Sep 13 '15 at 17:25
  • Related (maybe a dup): https://stats.stackexchange.com/questions/102778/correlations-between-continuous-and-categorical-nominal-variables/102800#102800 – kjetil b halvorsen Dec 21 '18 at 09:27

2 Answers

0

I'm 10 months late, but on the off chance you or someone else finds this useful: what you're looking for is the point-biserial correlation, which is the Pearson correlation computed with the Boolean attribute coded as 0/1.
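As a rough illustration, here is a minimal sketch of the point-biserial coefficient in plain Python. The function name `point_biserial` and the example numbers are my own assumptions for illustration, not the OP's data. It uses the usual formula r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the population standard deviation of all values.

```python
import math


def point_biserial(values, flags):
    """Point-biserial correlation between a numeric variable and a 0/1 variable.

    `values` and `flags` are hypothetical parallel lists: flags[i] is truthy
    when observation i has the Boolean attribute.
    """
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")

    group1 = [v for v, f in zip(values, flags) if f]
    group0 = [v for v, f in zip(values, flags) if not f]
    n, n1, n0 = len(values), len(group1), len(group0)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must be non-empty")

    mean_all = sum(values) / n
    # Population standard deviation (divide by n), as the formula requires.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("values are constant; correlation is undefined")

    m1 = sum(group1) / n1
    m0 = sum(group0) / n0
    return (m1 - m0) / sd * math.sqrt(n1 * n0 / n ** 2)


# Made-up example: six ranks, the last three carry the attribute.
example_ranks = [1, 2, 3, 4, 5, 6]
example_flags = [0, 0, 0, 1, 1, 1]
print(point_biserial(example_ranks, example_flags))
```

The result is identical to the ordinary Pearson coefficient on the 0/1-coded attribute, so any Pearson routine should give the same number.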

0

First off, we know the groups aren't independent. In fact, once you know the ranks in the second group, you know the ranks in the first (and vice versa)--i.e. each rank is either in one group or the other but not both.

Given this, you might want to perform a permutation test to see whether the average rank (or some other statistic) of the first group is extreme compared to the sampling distribution of the average rank under the assumption that the ranks are randomly assigned to each group.
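To make that concrete, here is a minimal sketch of such a permutation test in plain Python. The function name `permutation_test_mean`, the two-sided choice of statistic (distance of the group's mean rank from the overall mean), the number of permutations, and the example data are all assumptions on my part, not something taken from the question.

```python
import random


def permutation_test_mean(values, flags, n_perm=10000, seed=0):
    """Two-sided permutation test for the mean of the flagged group.

    Returns (observed_group_mean, p_value). The null hypothesis is that the
    values are assigned to the flagged group at random, keeping the group
    size fixed.
    """
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    n1 = sum(1 for f in flags if f)
    if n1 == 0 or n1 == len(values):
        raise ValueError("both groups must be non-empty")

    overall_mean = sum(values) / len(values)
    observed = sum(v for v, f in zip(values, flags) if f) / n1
    observed_dev = abs(observed - overall_mean)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pool = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        # The first n1 shuffled values play the role of the flagged group.
        perm_mean = sum(pool[:n1]) / n1
        # Small tolerance guards against floating-point ties.
        if abs(perm_mean - overall_mean) >= observed_dev - 1e-12:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Made-up example: ranks 1..20, with the attribute on the ten lowest ranks.
example_ranks = list(range(1, 21))
example_flags = [1] * 10 + [0] * 10
print(permutation_test_mean(example_ranks, example_flags, n_perm=2000))
```

A small p-value says the group's average rank would be unusual under random assignment. Swapping in a different statistic (median, rank sum, and so on) only changes the two lines that compute `observed` and `perm_mean`.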

Richard Redding
  • The question is not clear. Most of the questions in the comments were not answered by the OP. The OP's graph is confusing. I agree with gung that the data do not look like ranks, judging by the scale of the y-axis. It is labeled as ranks, but the OP also calls it data. I don't think an answer here is helpful. – Michael R. Chernick Jan 07 '17 at 18:36