I'm having some trouble fully understanding partial correlation, and I was wondering if some of you could shed some light on my confusion.
Let's consider the following scenario: it is a known fact that heart disease is related to social and economic status. However, I want to understand whether anger is also a factor. So the obvious next step is to find the correlation between anger and heart disease while controlling for social and economic status.
There are a couple of ways I can do this. The most popular one I found online is partial correlation (ppcor in R).
However, when I looked into how partial correlation is computed, it didn't make a lot of sense to me mathematically. Say we have three variables ($X$, $Y$, $Z$) and we want to correlate $X$ and $Y$ while taking $Z$ into consideration: the procedure takes the residuals from regressing $X$ on $Z$, then the residuals from regressing $Y$ on $Z$, and then correlates the two sets of residuals to get the result.
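To make the residual procedure concrete, here is a minimal sketch in plain Python (not R, and not ppcor's actual implementation). It assumes the usual definition, in which $X$ and $Y$ are each regressed on $Z$ and the two sets of residuals are correlated. The data are simulated, and the helper names `pearson`, `residuals`, and `partial_corr` are made up for illustration.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb)

def residuals(target, predictor):
    """Residuals from a simple least-squares regression of target on predictor."""
    mt, mp = mean(target), mean(predictor)
    sxy = sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
    sxx = sum((p - mp) ** 2 for p in predictor)
    slope = sxy / sxx
    intercept = mt - slope * mp
    return [t - (intercept + slope * p) for t, p in zip(target, predictor)]

def partial_corr(x, y, z):
    """Correlate what is left of x and of y after removing z from each."""
    return pearson(residuals(x, z), residuals(y, z))

# Simulated data (an assumption for illustration): z drives both x and y,
# and x has no effect of its own on y.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(500)]   # status
x = [zi + random.gauss(0, 1) for zi in z]      # anger
y = [zi + random.gauss(0, 1) for zi in z]      # heart disease score

print("raw correlation of x and y:", pearson(x, y))
print("partial correlation given z:", partial_corr(x, y, z))
```

In this simulated setup the raw correlation is clearly positive, while the partial correlation should come out close to zero, because $Z$ is the only thing linking $X$ and $Y$.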
This doesn't make a lot of sense to me. If residuals are the variance that is not explained by the correlation, wouldn't it make more sense to take only the residuals from $X$ and $Z$, and then correlate those residuals with $Y$? That way we could see whether $Y$ can explain the variance that is not explained by $X$ and $Z$, thereby "controlling" for $Z$.
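The alternative proposed above can be sketched in the same way and compared with the two-sided version. This is a rough illustration on simulated data, with hypothetical helper names. Removing $Z$ from $X$ only and correlating the result with the raw $Y$ gives a quantity usually called the semipartial (or part) correlation.

```python
import random
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb)

def residuals(target, predictor):
    """Residuals from a simple least-squares regression of target on predictor."""
    mt, mp = mean(target), mean(predictor)
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    return [t - mt - slope * (p - mp) for t, p in zip(target, predictor)]

def semipartial_corr(x, y, z):
    """The proposal: remove z from x only, then correlate with the raw y."""
    return pearson(residuals(x, z), y)

def partial_corr(x, y, z):
    """The two-sided version: remove z from both x and y."""
    return pearson(residuals(x, z), residuals(y, z))

# Simulated data (an assumption for illustration): here x does have its own
# effect on y, in addition to the shared influence of z.
random.seed(1)
z = [random.gauss(0, 1) for _ in range(500)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + 0.5 * xi + random.gauss(0, 1) for zi, xi in zip(z, x)]

print("semipartial (x residualized only):", semipartial_corr(x, y, z))
print("partial (both residualized):", partial_corr(x, y, z))
```

The two values differ only in the denominator: in the one-sided version, $Y$ still carries the variance due to $Z$, so the result is never larger in magnitude than the partial correlation.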