I am confused about the following theorem on d-separation from Judea Pearl's textbook Causality, which reads as follows:
"If sets X and Y are d-separated by Z in a DAG G, then X is independent of Y conditional on Z in every distribution compatible with G. Conversely, if X and Y are not d-separated by Z in a DAG G, then X and Y are dependent conditional on Z in at least one distribution compatible with G."
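To make the first half of the theorem concrete, here is a toy sketch. It assumes (my own choice, not from Pearl's text) a three-node chain A → B → C with binary variables. Any joint built by multiplying P(A), P(B | A) and P(C | B) is compatible with the chain, and {B} d-separates A from C. The helper names are hypothetical, and checking 1000 randomly drawn sets of conditional tables is a numerical illustration, not a proof.

```python
import itertools
import random

def chain_joint(p_a, p_b_given_a, p_c_given_b):
    """Joint P(a, b, c) for the chain A -> B -> C (binary variables).

    p_a: P(A=1); p_b_given_a[a]: P(B=1 | A=a); p_c_given_b[b]: P(C=1 | B=b).
    """
    def bern(p, x):
        # Probability that a Bernoulli(p) variable takes the value x (0 or 1)
        return p if x == 1 else 1.0 - p

    joint = {}
    for a, b, c in itertools.product((0, 1), repeat=3):
        joint[(a, b, c)] = bern(p_a, a) * bern(p_b_given_a[a], b) * bern(p_c_given_b[b], c)
    return joint

def a_indep_c_given_b(joint, tol=1e-9):
    """Check P(a, c | b) == P(a | b) * P(c | b) for every a, b, c."""
    for b in (0, 1):
        p_b = sum(p for (_, b_, _), p in joint.items() if b_ == b)
        for a, c in itertools.product((0, 1), repeat=2):
            p_ac_b = joint[(a, b, c)] / p_b
            p_a_b = sum(joint[(a, b, cc)] for cc in (0, 1)) / p_b
            p_c_b = sum(joint[(aa, b, c)] for aa in (0, 1)) / p_b
            if abs(p_ac_b - p_a_b * p_c_b) > tol:
                return False
    return True

def a_indep_c(joint, tol=1e-9):
    """Check marginal independence P(a, c) == P(a) * P(c) (no conditioning)."""
    for a, c in itertools.product((0, 1), repeat=2):
        p_ac = sum(joint[(a, b, c)] for b in (0, 1))
        p_a = sum(p for (a_, _, _), p in joint.items() if a_ == a)
        p_c = sum(p for (_, _, c_), p in joint.items() if c_ == c)
        if abs(p_ac - p_a * p_c) > tol:
            return False
    return True

rng = random.Random(0)
all_cond_indep = True
for _ in range(1000):
    # Random tables kept inside (0.01, 0.99) so every conditioning event has positive probability
    joint = chain_joint(rng.uniform(0.01, 0.99),
                        [rng.uniform(0.01, 0.99) for _ in range(2)],
                        [rng.uniform(0.01, 0.99) for _ in range(2)])
    all_cond_indep = all_cond_indep and a_indep_c_given_b(joint)
print(all_cond_indep)  # True: A and C are independent given B in every sampled joint

# Without conditioning on B, A and C are not d-separated, and this joint shows the dependence
example = chain_joint(0.5, [0.1, 0.9], [0.2, 0.8])
print(a_indep_c(example))  # False
```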
Question 1: By "distribution", is the theorem referring to the joint probability distribution (i.e., the LHS of equation [1]) or to the conditional probability distributions (i.e., the individual factors on the RHS of equation [1])?
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \text{parent}_G(X_i)\right) \qquad [1]$$
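On Question 1, it may help to see the two sides of [1] as data. A minimal sketch, assuming binary variables and a hypothetical collider DAG X1 → X3 ← X2 with made-up numbers (`joint_from_factors` is an illustrative helper, not a library function): the LHS of [1] is a single table over all $2^n$ assignments, the RHS is a collection of local tables $P(X_i \mid \text{parent}_G(X_i))$, and multiplying the local tables yields the full joint, from which each factor can be read back.

```python
import itertools

def joint_from_factors(order, parents, cpt):
    """Build the joint P(X_1, ..., X_n) as the product of factors P(X_i | parent_G(X_i)).

    Assumes binary variables. order: variable names; parents: name -> tuple of
    parent names; cpt: name -> function(parent values) -> P(name = 1 | parents).
    """
    joint = {}
    for values in itertools.product((0, 1), repeat=len(order)):
        assignment = dict(zip(order, values))
        prob = 1.0
        for var in order:
            pa_vals = tuple(assignment[p] for p in parents[var])
            p1 = cpt[var](pa_vals)
            # Multiply in the factor P(X_i = x_i | parent_G(X_i) = pa_vals)
            prob *= p1 if assignment[var] == 1 else 1.0 - p1
        joint[values] = prob
    return joint

# Hypothetical collider DAG X1 -> X3 <- X2 with made-up numbers
order = ("X1", "X2", "X3")
parents = {"X1": (), "X2": (), "X3": ("X1", "X2")}
table_x3 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}
cpt = {
    "X1": lambda pa: 0.3,
    "X2": lambda pa: 0.6,
    "X3": lambda pa: table_x3[pa],
}

joint = joint_from_factors(order, parents, cpt)
total = sum(joint.values())
print(round(total, 10))  # 1.0: the product of the factors is a proper joint distribution

# Recover one factor from the joint: P(X3 = 1 | X1 = 1, X2 = 0)
p_x3 = joint[(1, 0, 1)] / (joint[(1, 0, 0)] + joint[(1, 0, 1)])
print(round(p_x3, 10))  # 0.5, matching table_x3[(1, 0)]
```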
Question 2: I don't quite understand the intuition behind the converse of the statement above. Is it saying that more than one probability function P can satisfy the condition in [1] for a given DAG G? If so, could someone please give me an example to help build the intuition? I've spent hours trying to understand the theorem and I'm not getting anywhere. Any help would be greatly appreciated.
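One way to see that many P can satisfy [1] for the same G is to fix the graph and vary the factors. A toy sketch, assuming the two-node DAG X → Y with binary variables and made-up numbers (the helper names are hypothetical): both `p1` and `p2` factorize as $P(x)\,P(y \mid x)$, so both satisfy [1] for this G, yet only `p1` shows X and Y as dependent.

```python
import itertools

def joint_xy(p_x, p_y_given_x):
    """P(x, y) = P(x) * P(y | x) for the two-node DAG X -> Y (binary variables).

    p_x: P(X=1); p_y_given_x[x]: P(Y=1 | X=x).
    """
    joint = {}
    for x, y in itertools.product((0, 1), repeat=2):
        px = p_x if x == 1 else 1.0 - p_x
        py = p_y_given_x[x] if y == 1 else 1.0 - p_y_given_x[x]
        joint[(x, y)] = px * py
    return joint

def independent(joint, tol=1e-9):
    """True if P(x, y) == P(x) * P(y) for all x, y."""
    for x, y in itertools.product((0, 1), repeat=2):
        p_x = joint[(x, 0)] + joint[(x, 1)]
        p_y = joint[(0, y)] + joint[(1, y)]
        if abs(joint[(x, y)] - p_x * p_y) > tol:
            return False
    return True

# Two different distributions, both factorizing as in [1] for the same DAG X -> Y
p1 = joint_xy(0.5, [0.2, 0.8])  # P(Y=1 | X) changes with X
p2 = joint_xy(0.5, [0.4, 0.4])  # P(Y=1 | X) happens not to depend on X
print(independent(p1))  # False
print(independent(p2))  # True
```

Here X and Y are not d-separated by the empty set. On this reading, the converse only promises that some compatible distribution, such as `p1`, exhibits the dependence; it does not rule out compatible distributions like `p2` in which the dependence happens to vanish.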