My goal is to check whether two variables $X$ and $Y$ are conditionally independent given $Z$.

For simplicity, let's assume the joint distribution is multivariate normal. In this case, we can compute the partial correlation of $X$ and $Y$ given $Z$ by regressing $X \sim Z$ (keeping the residuals $r_X$), regressing $Y \sim Z$ (keeping the residuals $r_Y$), and computing the correlation between the residuals $r_X$ and $r_Y$. Conditional independence then boils down to testing whether this correlation is 0.
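For concreteness, here is a minimal sketch of this first procedure (hypothetical function and variable names; it assumes `X` and `Y` are 1-D arrays and `Z` is an $n \times k$ matrix), using the standard Fisher $z$-test for a partial correlation:

```python
import numpy as np
from scipy import stats

def partial_corr_test(X, Y, Z):
    """Test H0: the partial correlation of X and Y given Z is zero."""
    n, k = Z.shape
    Zc = np.column_stack([np.ones(n), Z])        # add an intercept column
    # Residuals from the least-squares regressions X ~ Z and Y ~ Z
    r_X = X - Zc @ np.linalg.lstsq(Zc, X, rcond=None)[0]
    r_Y = Y - Zc @ np.linalg.lstsq(Zc, Y, rcond=None)[0]
    r = np.corrcoef(r_X, r_Y)[0, 1]              # sample partial correlation
    # Fisher z-transform: approximately N(0, 1) under H0,
    # with k conditioning variables
    z = np.arctanh(r) * np.sqrt(n - k - 3)
    p = 2 * stats.norm.sf(abs(z))
    return r, p
```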

However, another way that seems intuitive (at least to me) is to use the interpretation of conditional independence directly and test whether "knowing $Y$ helps predict $X$ any better than knowing $Z$ alone."

That is, I can regress $X \sim Z$ (with residuals $r_X$) and regress $X \sim Y + Z$ (with residuals $r_{Y,Z}$), and then test whether the residuals $r_X$ and $r_{Y,Z}$ differ (using some appropriate statistical test, or by bootstrapping the distribution of the residuals).
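For illustration, here is one standard way to formalize "$Y$ adds no predictive value beyond $Z$": a partial $F$-test comparing the nested models $X \sim Z$ and $X \sim Y + Z$. This is a sketch under the same naming assumptions as above, not the bootstrap test contemplated here; with a single added regressor it is equivalent to the usual $t$-test on the coefficient of $Y$, which in the Gaussian linear setting coincides with the partial-correlation test of the first method.

```python
import numpy as np
from scipy import stats

def nested_f_test(X, Y, Z):
    """Partial F-test of the reduced model X ~ Z vs. the full X ~ Y + Z."""
    n = len(X)
    Z0 = np.column_stack([np.ones(n), Z])        # reduced model: X ~ Z
    Z1 = np.column_stack([Z0, Y])                # full model: X ~ Y + Z
    rss0 = np.sum((X - Z0 @ np.linalg.lstsq(Z0, X, rcond=None)[0]) ** 2)
    rss1 = np.sum((X - Z1 @ np.linalg.lstsq(Z1, X, rcond=None)[0]) ** 2)
    df1, df2 = 1, n - Z1.shape[1]                # one extra parameter: Y
    F = ((rss0 - rss1) / df1) / (rss1 / df2)
    p = stats.f.sf(F, df1, df2)
    return F, p
```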

Now, my questions are:

  • Is the second method even right?
  • If yes, what are the pros/cons of using the second method instead of the first?
  • If no, could you tell me why it's wrong?
  • I don't think your first procedure is correct unless you assume $(X,Y)\,|\,Z$ has a bivariate normal distribution. In the second procedure, exactly what "appropriate statistical test" would you use, given that $r_X$ and $r_{Y,Z}$ appear not to be independent? – whuber Aug 19 '15 at 15:38
  • Yes, you are right. I meant to say "let's assume the variables are all jointly normally distributed." Let me fix the question. In the second procedure, I could bootstrap the distributions of $r_X$ and $r_{Y,Z}$ and see if the distributions are different from each other at some confidence level, perhaps using a two-sample KS test. – Vimal Aug 19 '15 at 15:48
  • I don't see how that KS test would apply. Why should $r_X$ and $r_{Y,Z}$ have identical distributions under the null hypothesis? (In fact, they won't.) – whuber Aug 19 '15 at 15:50
  • Yes, you're right. I overlooked that detail. I am looking for some test by which I can say $X \sim Y + Z$ doesn't do any better than $X \sim Z$. So, perhaps I should look at the variance of the residuals, but then, I do not know its distribution. If we set aside this technicality, does the question have some meaning? – Vimal Aug 19 '15 at 20:37
  • I think another way to phrase the second method is to ask: "When does (as a necessary and sufficient condition) independence in expectation ($E[X \mid Y, Z] = E[X \mid Z]$) imply independence in probability ($X \perp Y \mid Z$)?" The reason being, the solution to the regression $X \sim Z$ under the least-squares objective approximates the function $E[X \mid Z]$. – Vimal Aug 21 '15 at 14:20
  • I worked this out and deduced that if $X, Y, Z$ are jointly normally distributed, $E[X \mid Y, Z] = E[X \mid Z]$ is equivalent to saying $\operatorname{cov}(X, Y \mid Z) = 0$; thus, it does follow that $X \perp Y \mid Z$. (A sketch of this equivalence appears after the comments.) – Vimal Aug 31 '15 at 20:16
  • 3
    If you worked it out, can you please post that as an answer? – kjetil b halvorsen Dec 09 '18 at 16:47
  • 2
    @kjetilbhalvorsen - sorry I just saw this message. You can check Appendix B in this paper for a proof: https://arxiv.org/abs/1903.08132. – Vimal Jun 18 '19 at 17:43
  • Thanks for the link to the paper. It looks like conditional independence is proved by making sure the independent and dependent variables are orthogonal. – surlac Aug 17 '21 at 17:09
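
For reference, a minimal sketch of the equivalence claimed in the comment above (the standard Gaussian-conditioning argument, not the paper's Appendix B proof):

```latex
% Sketch, assuming (X, Y, Z) jointly Gaussian.
% Conditioning on Z, (X, Y) is bivariate normal, so
\[
  E[X \mid Y, Z]
  = E[X \mid Z]
  + \frac{\operatorname{cov}(X, Y \mid Z)}{\operatorname{var}(Y \mid Z)}
    \bigl( Y - E[Y \mid Z] \bigr).
\]
% Hence E[X | Y, Z] = E[X | Z] holds iff cov(X, Y | Z) = 0, and for
% jointly Gaussian variables a zero conditional covariance is exactly
% conditional independence: X \perp Y \mid Z.
```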

0 Answers