My goal is to check if two variables $X$ and $Y$ are conditionally independent given $Z$.
For simplicity, let's assume the joint distribution is multivariate normal. In this case, we can compute the partial correlation of $X$ and $Y$ given $Z$ by regressing $X \sim Z$ (keeping the residuals $r_X$), regressing $Y \sim Z$ (keeping the residuals $r_Y$), and computing the correlation between the residuals $r_X$ and $r_Y$. Conditional independence then boils down to testing whether this correlation is 0.
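To make the first method concrete, here is a minimal sketch using plain numpy least squares (the `partial_corr` helper is just my own illustration, not a library function):

```python
import numpy as np

def partial_corr(x, y, Z):
    """Partial correlation of x and y given Z: regress each on Z
    (with an intercept) and correlate the residuals."""
    Z1 = np.column_stack([np.ones(len(x)), Z])          # design matrix with intercept
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]  # residuals of X ~ Z
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]  # residuals of Y ~ Z
    return np.corrcoef(rx, ry)[0, 1]

# Toy example: X and Y are both driven by Z, so they are marginally
# correlated but conditionally independent given Z.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)

print(np.corrcoef(x, y)[0, 1])  # clearly nonzero marginal correlation
print(partial_corr(x, y, z))    # close to 0 after conditioning on Z
```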
However, another way that seems intuitive (at least to me) is to use the interpretation of conditional independence directly and test whether "knowing $Y$ helps predict $X$ any better than knowing $Z$ alone."
That is, I can regress $X \sim Z$ (with residuals $r_X$), regress $X \sim Y + Z$ (with residuals $r_{Y,Z}$), and test whether $r_X$ and $r_{Y,Z}$ differ (using some appropriate statistical test, or by bootstrapping the distribution of the residuals).
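A sketch of what I mean by the second procedure, again with plain numpy (the `residuals` helper is made up for illustration); one natural summary to compare is the residual sum of squares of the two fits:

```python
import numpy as np

def residuals(target, *predictors):
    """Residuals of an OLS fit of target on the predictors plus an intercept."""
    A = np.column_stack([np.ones(len(target)), *predictors])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ beta

# Toy data where X is conditionally independent of Y given Z
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
y = z + rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)

r_x  = residuals(x, z)     # residuals of X ~ Z
r_yz = residuals(x, y, z)  # residuals of X ~ Y + Z

# Under conditional independence, adding Y should barely reduce the
# residual sum of squares (it can never increase it, since the models nest).
print(np.sum(r_x**2), np.sum(r_yz**2))
```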
Now, my questions are:
- Is the second method even right?
- If yes, what are the pros/cons of using the second method instead of the first?
- If no, could you tell me why it's wrong?