I am learning the basics of text mining. I am trying to find syntagmatic relations in text, for example the word "technology" tends to occur whenever the word "information" occurs, i.e. the co-occurrence in "Information Technology".
One measure used to quantify this relationship is conditional entropy.
$H(X_1 | X_2)$ is the conditional entropy of the occurrence of word $X_1$ given that word $X_2$ occurred in the document.
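As I understand it, with binary occurrence indicators ($X_i = 1$ if word $w_i$ appears in a document, $0$ otherwise), this expands to the standard definition:

$$H(X_1 | X_2) = \sum_{v \in \{0,1\}} p(X_2 = v)\, H(X_1 | X_2 = v) = -\sum_{u,v \in \{0,1\}} p(X_1 = u, X_2 = v) \log p(X_1 = u | X_2 = v)$$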
$H(X_1 | X_2)$ and $H(X_1 | X_3)$ give the remaining randomness of word $X_1$ when $X_2$ occurs and when $X_3$ occurs, respectively. The words co-occur strongly if little randomness is left, so we select the pairs whose conditional entropy falls below a chosen threshold (see the sketch below).
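To check my understanding, here is a minimal sketch of how I would compute this from document-level binary occurrences and apply the threshold. The toy corpus, the vocabulary, the threshold value, and the helper name are just illustrative assumptions, not from any particular library:

```python
import math
from itertools import combinations

def conditional_entropy(docs, w1, w2):
    """H(X1 | X2): remaining uncertainty about w1 occurring, given w2's occurrence status."""
    n = len(docs)
    h = 0.0
    for v in (True, False):                      # X2 = present / absent
        docs_v = [d for d in docs if (w2 in d) == v]
        p_v = len(docs_v) / n                    # p(X2 = v)
        if p_v == 0:
            continue
        for u in (True, False):                  # X1 = present / absent
            p_u_given_v = sum((w1 in d) == u for d in docs_v) / len(docs_v)
            if p_u_given_v > 0:
                # accumulate -p(u, v) * log p(u | v), since p(u, v) = p(v) * p(u | v)
                h -= p_v * p_u_given_v * math.log2(p_u_given_v)
    return h

# toy corpus: each document is represented as a set of words
docs = [{"information", "technology", "news"},
        {"information", "technology"},
        {"sports", "news"},
        {"information", "retrieval"}]

vocab = ["information", "technology", "news"]
threshold = 0.9   # arbitrary cutoff for this toy example
for w1, w2 in combinations(vocab, 2):
    h = conditional_entropy(docs, w1, w2)
    if h < threshold:
        print(f"H({w1} | {w2}) = {h:.3f}  -> candidate syntagmatic pair")
```

Note that the direction matters in this sketch: $H(X_1 | X_2)$ is generally not equal to $H(X_2 | X_1)$.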
So what do $H(X_1 | X_2)$ and $H(X_3 | X_2)$ capture about the relation of $X_2$ with the words $X_1$ and $X_3$?
Also, why are $H(X_1 | X_2)$ and $H(X_3 | X_2)$ not comparable?