
I understand d-separation and how v-structures in graphical models work. What I don't understand is how they relate to real-world multivariate data. I don't see how v-structures can be distinguished from other junctions by looking at the data.

Take the following example of a v-structure: A → B ← C.

A: Student IQ, B: Test score, C: Test difficulty.

Student IQ cannot influence test difficulty, and vice-versa. But once the test result is known, learning either one allows us to make an educated guess about the other.

Say I obtain a dataset with student IQ, test results, and some measure of test difficulty. How does the multivariate distribution in this dataset reflect the v-structure?

For my own understanding, I would like to generate two datasets (for example using R's MASS package and mvrnorm): one that fits the v-structure model, and another where the arrow from C to B points the other way.
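To see what those two datasets could look like, here is a minimal sketch; the coefficients and noise levels are my own illustrative choices, nothing canonical. Rather than handing a covariance matrix to MASS::mvrnorm, it is easier to simulate the structural equations directly with rnorm (you could equivalently compute the implied covariance matrices and draw from MASS::mvrnorm, since both models are linear-Gaussian):

    set.seed(1)
    n <- 10000

    ## Dataset 1: v-structure A -> B <- C
    A1 <- rnorm(n)                       # student IQ
    C1 <- rnorm(n)                       # test difficulty, generated independently of A
    B1 <- A1 - C1 + rnorm(n, sd = 0.5)   # score: rises with IQ, falls with difficulty

    ## Dataset 2: arrow reversed, giving the chain A -> B -> C
    A2 <- rnorm(n)
    B2 <- A2 + rnorm(n, sd = 0.5)
    C2 <- B2 + rnorm(n, sd = 0.5)

    ## Marginal association between A and C
    cor(A1, C1)   # ~ 0  : the parents of a collider are independent
    cor(A2, C2)   # != 0 : in the chain, A and C are dependent

    ## Association between A and C given B (partial correlation via residuals)
    cor(resid(lm(A1 ~ B1)), resid(lm(C1 ~ B1)))   # != 0 : conditioning on a collider induces dependence
    cor(resid(lm(A2 ~ B2)), resid(lm(C2 ~ B2)))   # ~ 0  : conditioning on a mediator blocks it

In other words, the data reflect the structure through the pattern of (conditional) independences: the v-structure forces the marginal correlation between A and C to be zero while the partial correlation given B is not, and the reversed arrow gives exactly the opposite pattern.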

Related question: Understanding d-separation theory in causal Bayesian networks

Ivana
  • **Nope:** "Student IQ cannot influence test difficulty, and vice-versa. But once the test result is known, learning either one allows us to make an educated guess about the other." This is the point of considering colliders: they *block* association (by contrast with common causes, e.g. "simple confounders", which *induce* association between descendants). – Alexis Mar 13 '15 at 17:29
  • A and C are marginally independent because the association is blocked by B. But once B is known they become dependent, i.e. B induces the association. (See Judea Pearl's Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, par. 3.1.3, p. 93.) – Ivana Mar 17 '15 at 11:28
  • Please see my explanation [here](https://stats.stackexchange.com/a/389240/103153) – Lerner Zhang Jan 26 '19 at 06:46

1 Answer


You understand v-structures, but let's recall formally what they mean. Applied to your example, the v-structure encodes that A: Student IQ and C: Test difficulty are marginally independent, $$I(A,C): \quad P(A|C) = P(A),$$ but dependent given B: Test score, $$D(A,C|B): \quad P(A|C,B) \neq P(A|C).$$
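(To spell out where that marginal independence comes from: the v-structure factorizes the joint as $P(A,B,C) = P(A)\,P(C)\,P(B|A,C)$, and summing over $B$ gives $$P(A,C) = \sum_{B} P(A)\,P(C)\,P(B|A,C) = P(A)\,P(C),$$ which is exactly $I(A,C)$. Once you condition on $B$, the factor $P(B|A,C)$ ties $A$ and $C$ together, so no such factorization of $P(A,C|B)$ holds in general.)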

How do they relate to real-world multivariate data?

Student IQ and Test difficulty are independent; that much is clear. Now suppose you know the test score $B$ of a student, and suppose it is very high, e.g. the maximum grade. Now I ask you the following question:

If you then learned that the Test difficulty, $C$, was very high, wouldn't you think the student has a higher probability of being very smart (a high Student IQ, $A$)? There you have it: $P(A|C,B) \neq P(A|C)$, because knowing both $B$ and $C$ changes the probability of the event $A$.
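If it helps to see this explaining-away effect numerically, here is a small sketch with invented probabilities (none of these numbers come from the post, they are just chosen to make the effect visible):

    set.seed(1)
    n <- 1e5
    A <- rbinom(n, 1, 0.5)   # 1 = high IQ
    C <- rbinom(n, 1, 0.5)   # 1 = hard test
    # a high score is likely for smart students and unlikely on hard tests
    p_high <- ifelse(A == 1, ifelse(C == 1, 0.5, 0.9),
                             ifelse(C == 1, 0.1, 0.4))
    B <- rbinom(n, 1, p_high)   # 1 = high score

    mean(A[B == 1])            # P(A high | B high)          ~ 0.74
    mean(A[B == 1 & C == 1])   # P(A high | B high, C hard)  ~ 0.83, higher

Knowing the test was hard makes a high IQ the more plausible explanation for the high score, so the conditional probability moves, exactly as $P(A|C,B) \neq P(A|C)$ says.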

How does the multivariate distribution in this dataset reflect the v-structure?

You can think of a toy dataset that explains this. (For the sake of simplicity I will use discrete variables with states $+$ (high) and $-$ (low), plus a third state $=$ (medium) used only for $B$.)

A C B
+ + =  
+ - + 
- + - 
- - =
- - =
- + -
+ + -
+ + =
+ - +
+ - +

Look at the dataset; hopefully you see $I(A,C)$: for either value of $A$, half the rows have $C="+"$ and half have $C="-"$. Now, if you know $B="="$, does knowing $C="+"$ tell you anything about $A$?


Note, in response to the question above: you estimate the real probabilities from the dataset. If we assume the probabilities are estimated directly from the dataset (maximum likelihood; no need to worry about this, just being formal), you would estimate $P(A = "+" | B = "=", C = "+") = 1$, because all students who obtained $B="="$ (a medium score) when $C="+"$ (the exam was hard) had $A = "+"$ (were smart, i.e. had a high IQ).
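For completeness, the same estimate can be read off in R by transcribing the ten toy rows above (nothing here beyond the table itself):

    d <- data.frame(
      A = c("+", "+", "-", "-", "-", "-", "+", "+", "+", "+"),
      C = c("+", "-", "+", "-", "-", "+", "+", "+", "-", "-"),
      B = c("=", "+", "-", "=", "=", "-", "-", "=", "+", "+")
    )

    # marginal independence I(A, C): each row of this table is (0.5, 0.5)
    prop.table(table(d$A, d$C), margin = 1)

    # maximum-likelihood estimate of P(A = "+" | B = "=", C = "+")
    mean(d$A[d$B == "=" & d$C == "+"] == "+")   # 1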

D1X