I have two dummy variables and calculated the correlation between them using Stata.

X = Gender ($1$ = female, $0$ = male); Y = Quiz solved ($1$ = yes, $0$ = no)

I was told by my professor to use the `pwcorr` command. I got a significant correlation of $-0.08383$. Is it correct to conclude that the women in the dataset were less likely to get the quiz right?

Nick Cox
xxgaryxx
  • The correlation may be significant (at what level?) but it looks small to me. To get a better idea of what is going on, calculate the proportions of males and females who get the quiz right. In Stata, `prtest Y, by(X)` gets you more interesting and useful summaries and does a test too. (Hint: `X` and `Y` are lousy variable names. If your professor is insisting on them, shame. In fact, the advice to use correlation here is a little puzzling.) – Nick Cox Feb 20 '21 at 17:53
  • Could you comment on why it is puzzling to use correlations? Because the data are binary, maybe? – Thomas Feb 20 '21 at 19:47
  • In my opinion, -0.08 is very small, essentially 0. Of course, it depends on your sample size. If the sample is small or moderate, its nonzero value is likely a fluctuation. If the sample is big, the correlation can be significant, in the sense that we are sure that it exists, but it is nonetheless still very small. In other words, the answer to the question "what's the probability that, choosing a random man and a random woman, the man performs better?" would be very close to 0.5, i.e. the correlation is so weak that its predictive power is negligible. – rasmodius Feb 20 '21 at 22:14
  • Thank you very much for your answers. Those are really helpful. According to `pwcorr` the correlation is significant at the 5% level. Regardless of the significance, I am not quite sure about the interpretation. Assuming the correlation were significant (and stronger), would it then be correct to interpret the regression coefficient as saying that women solved the quiz correctly less often than men? – xxgaryxx Feb 21 '21 at 08:04
  • Yes, that's how a regression can and should be interpreted. In fact the regression coefficient **is** the difference in means, as follows from the algebra. Note: the regression coefficient is not the same as the correlation coefficient. – Nick Cox Feb 22 '21 at 09:42
  • See https://stats.stackexchange.com/questions/103801/is-it-meaningful-to-calculate-pearson-or-spearman-correlation-between-two-boolea for more on correlations between binary variables. I am a participant there, but suggest that there is less disagreement in the thread than may appear. My own stance is that the correlation is meaningful, but not often especially useful. In this question the correlation is an indirect way to look at a relationship better thought of as a comparison of two means. – Nick Cox Feb 22 '21 at 09:47
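
To make the points in the comments concrete, here is a minimal sketch in Python rather than Stata. The counts are hypothetical and are not the asker's data; the names `female` and `solved` are illustrative stand-ins for `X` and `Y`. It compares the two proportions directly, computes the Pearson correlation of the two dummies, and checks that the slope from regressing `solved` on `female` equals the difference in proportions.

```python
from math import sqrt

# Hypothetical counts (NOT the asker's data): (female, solved) -> number of people
counts = {(0, 0): 80, (0, 1): 120, (1, 0): 96, (1, 1): 104}

# Expand the 2x2 table to one observation per person
female = [f for (f, s), n in counts.items() for _ in range(n)]
solved = [s for (f, s), n in counts.items() for _ in range(n)]


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Proportion solving the quiz in each group, and their difference
p_male = mean([s for f, s in zip(female, solved) if f == 0])
p_female = mean([s for f, s in zip(female, solved) if f == 1])
diff = p_female - p_male

# Pearson correlation between the two dummies (the phi coefficient)
r = cov(female, solved) / sqrt(cov(female, female) * cov(solved, solved))

# OLS slope from regressing solved on female
slope = cov(female, solved) / cov(female, female)

print(f"P(solved | male)   = {p_male:.3f}")
print(f"P(solved | female) = {p_female:.3f}")
print(f"difference         = {diff:.3f}")
print(f"correlation        = {r:.4f}")
print(f"regression slope   = {slope:.3f}")
```

With these made-up counts the difference in proportions and the slope are both -0.08, and the correlation is also about -0.08. The negative sign says that women solved the quiz less often than men in this hypothetical sample, while the two proportions (0.52 versus 0.60) show how large that gap actually is, which the correlation alone does not convey.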

0 Answers