
Here (page 415 of https://www.stat.cmu.edu/~larry/=sml/DAGs.pdf) I found this definition:

$$\rho_{X,Y\mid Z}=\frac{E[XY\mid Z]-E[X\mid Z]\,E[Y\mid Z]}{\sqrt{\operatorname{Var}(X\mid Z)}\sqrt{\operatorname{Var}(Y\mid Z)}}$$

which confuses me. I am used to seeing $E[\,\cdot\mid Z]$ as a ($Z$-measurable) random variable, whereas as far as I know the partial correlation is a deterministic scalar, as I understand from the Wikipedia article, for example:

https://en.wikipedia.org/wiki/Partial_correlation

What is the meaning intended by the author? Is this definition intended to be the same as the one used by Wikipedia? (The text in the definition would suggest so...)

(In order to simplify things, let's consider $X$, $Y$ and $Z$ to be scalars for the moment.)

Thomas
  • If $\rho_{X,Y\mid Z=z}$ is meaningful (possibly changing with $z$, i.e. a function of $z$) then $\rho_{X,Y\mid Z}$ is also meaningful (a function of $Z$). – Henry Dec 08 '21 at 23:25
  • Correlation does not have to be a scalar. There are *[correlation matrices](https://en.wikipedia.org/wiki/Correlation#Correlation_matrices)* if $X$ and $Y$ are vectors. But this is unaffected by whether $Z$ is a scalar or not. – Henry Dec 08 '21 at 23:28
  • Thanks @Henry. I updated my question. In the Wikipedia definition, to my understanding, the partial correlation is a deterministic scalar if $X$, $Y$ and $Z$ are scalar r.v.s. Let's consider this simple case for the moment. But $E[X\mid Z]$ in this case is a scalar r.v. and not a deterministic scalar in my notation... I cannot reconcile these definitions at the moment... – Thomas Dec 09 '21 at 00:18
  • $E[X\mid Z]$ is presumably a function of $Z$. So if $Z$ is a random variable, then so too is $E[X\mid Z]$ – Henry Dec 09 '21 at 00:42
  • Exactly; also to my understanding, $E(X\mid Z)$ is a $Z$-measurable r.v., therefore a function of $Z$. But doesn't this contradict the definition in Wikipedia? – Thomas Dec 09 '21 at 08:53
  • The numerator is simply the formula for the covariance between the residuals: residuals from regressing $X$ on $Z$ (expressed as "X|Z") and residuals from regressing $Y$ on $Z$ (expressed as "Y|Z"). – ttnphns Dec 11 '21 at 17:46
  • @ttnphns Thanks, as you say "expressed as X|Z". Do you find that notation standard/statistically correct and compatible with https://en.wikipedia.org/wiki/Conditional_expectation ? – Thomas Dec 11 '21 at 18:01
  • Well, I can't say how standard it is, but it is conceivable. Residuals or, more strictly speaking, errors, are a random variable: what is left of r.v. $X$ after conditioning it on r.v. $Z$. "Given" $Z$ (i.e. letting $X'$ be the predicted $X$, aka the "image of $Z$ within $X$"), what is left of specifically $X$ in $X$? It is the errors. Here, the notation expresses not the conditional expectation, but the conditional variation. – ttnphns Dec 11 '21 at 18:51
  • @ttnphns Your reasoning makes sense. So what they do is borrow the symbol to imply another meaning. What is the definition of "conditional variation"? Does it have a definition, or is it a loosely/unofficially defined term? – Thomas Dec 11 '21 at 18:59
  • It is transparent: r.v. = expectation + variation (about it). Just add the word "conditional", which in our context means "knowing the value of another r.v., which correlates with our r.v.". – ttnphns Dec 11 '21 at 19:17
  • I see, but then in this case $E(XY\mid Z)$ would suggest regressing $XY$ on $Z$ and taking the expectation of the residuals? (Sounds strange, since in the standard definition we only regress $X$ and $Y$ on $Z$, but not their product...) – Thomas Dec 11 '21 at 20:19
  • See the series of equivalent formulas for the covariance between $X$ and $Y$ on Wikipedia. You'll find the product $XY$ there. In our case, our variables, per the notation chosen by the authors of the book, are X|Z and Y|Z, rather than $X$ and $Y$. – ttnphns Dec 12 '21 at 00:54
  • Yes, of course. Now that I think about it, $E(X\mid Z)$ in this interpretation would be zero, because after a linear regression the sum of the residuals is zero. Further, I would then use the notation E((X|Z)(Y|Z)) if X|Z were used to indicate the residuals after regression. Anyway, thanks for your contribution, which probably comes very close to what the author had in mind, even if I still have to get it 100%... – Thomas Dec 12 '21 at 04:10
  • 1
    @ttnphns The definition above in the original post is in the context of jointly normal $(X,Y,Z)$ and not in general (as I found in the linked document). In this special case, these are all formal conditional expectations and variances and the 'conditional correlation' between $X$ and $Y$, given $Z$, equals the usual partial correlation. This is what I understood anyway. – StubbornAtom Dec 12 '21 at 10:01

1 Answer


Suppose you have a random vector $\boldsymbol X=(X_1,X_2,\ldots,X_p)$.

Consider the linear regression models

$$X_1=X_{1\cdot 34\ldots p}+ \varepsilon_{1\cdot 34\ldots p}$$

and

$$X_2=X_{2\cdot 34\ldots p}+ \varepsilon_{2\cdot 34\ldots p}$$

Here $X_{i\cdot 34\ldots p}$ is the part of $X_i$ explained by $(X_3,\ldots,X_p)$ and $\varepsilon_{i\cdot 34\ldots p}$ is the unexplained error, $i=1,2$. Unknown parameters in $X_{i\cdot 34\ldots p}$ are found subject to minimization of $E(\varepsilon_{i\cdot 34\ldots p}^2)$.

If $e_{i\cdot 34\ldots p}$ are the residuals corresponding to the models above, the (population) partial correlation between $X_1$ and $X_2$, eliminating the linear effect of $X_3,\ldots,X_p$, is defined to be

$$\rho_{12\cdot 34\ldots p}=\operatorname{Corr}(e_{1\cdot 34\ldots p},e_{2\cdot 34\ldots p}) $$
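As a sanity check, this residual-based definition can be computed directly on simulated data. The toy model below (coefficients, sample size, seed) is a hypothetical example, not from the linked text; it takes $p=3$, with $X_3$ driving both $X_1$ and $X_2$ so that the ordinary correlation is nonzero but the partial correlation is (approximately) zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: X3 drives both X1 and X2, with independent noise.
n = 100_000
x3 = rng.normal(size=n)
x1 = 0.8 * x3 + rng.normal(size=n)
x2 = -0.5 * x3 + rng.normal(size=n)

def residuals(y, Z):
    """Residuals of the least-squares regression of y on Z (with intercept)."""
    A = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ beta

e1 = residuals(x1, x3)
e2 = residuals(x2, x3)

# Sample partial correlation = ordinary correlation of the two residual series.
partial_corr = np.corrcoef(e1, e2)[0, 1]
raw_corr = np.corrcoef(x1, x2)[0, 1]

print(raw_corr)      # clearly nonzero (both variables load on x3)
print(partial_corr)  # near zero: x3 explains the entire association
```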

For some distributions like the multivariate normal, this correlation coincides with the correlation between $X_1$ and $X_2$, conditioned on $X_3,\ldots,X_p$. In fact, your linked document does assume multivariate normality of $\boldsymbol X$. To quote Wikipedia, "The partial correlation coincides with the conditional correlation if the random variables are jointly distributed as the multivariate normal, other elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial or Dirichlet distribution, but not in general otherwise."

Hence in these specific situations only, one can say

$$ \rho_{12\cdot 34\ldots p}=\rho_{(X_1,X_2)\mid X_3,\ldots,X_p}=\operatorname{Corr}((X_1,X_2) \mid X_3,\ldots,X_p) \tag{$\star$} $$

When $\boldsymbol X$ is multivariate normal, this conditional correlation (i.e. the conditional covariance and the conditional variances) does not depend on $X_3,\ldots,X_p$. Note that the conditional distribution of $(X_1,X_2)$ given $X_3,\ldots,X_p$ is bivariate normal. And you can see here that the dispersion matrix of this conditional distribution is free of $X_3,\ldots,X_p$ (hence non-random). Hence in this case, there is no ambiguity in the formula in your post.
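Concretely, in the multivariate normal case the conditional dispersion matrix is the Schur complement $\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, a fixed matrix into which no *value* of the conditioning variables enters. A minimal sketch with a made-up covariance matrix (the entries are illustrative assumptions, not from the linked document):

```python
import numpy as np

# Hypothetical covariance (here also correlation) matrix of (X1, X2, X3).
Sigma = np.array([[1.0, 0.3, 0.6],
                  [0.3, 1.0, 0.4],
                  [0.6, 0.4, 1.0]])

# Partition: the (X1, X2) block vs. the conditioning block X3.
S11 = Sigma[:2, :2]
S12 = Sigma[:2, 2:]
S22 = Sigma[2:, 2:]

# Conditional dispersion matrix of (X1, X2) given X3 is the Schur complement.
# Note that no value of X3 appears in the formula: the result is non-random.
cond_Sigma = S11 - S12 @ np.linalg.inv(S22) @ S12.T

# Conditional correlation = partial correlation in the normal case.
cond_corr = cond_Sigma[0, 1] / np.sqrt(cond_Sigma[0, 0] * cond_Sigma[1, 1])
print(cond_corr)
```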

The partial correlation is of course a scalar by definition. In fact, it is entirely based on the entries of the correlation matrix (or equivalently, the dispersion matrix) of $\boldsymbol X$.

Specifically, if $R=((\rho_{ij}))$ is the correlation matrix of $\boldsymbol X$, one can show that

$$\rho_{12\cdot 34\ldots p}=-\frac{R_{12}}{\sqrt{R_{11}}\sqrt{R_{22}}}\,,$$

where $R_{ij}$ is the cofactor of $\rho_{ij}$.

For $p=3$ (say), this reduces to

$$\rho_{12\cdot 3}=\frac{\rho_{12}-\rho_{13}\rho_{23}}{\sqrt{1-\rho_{13}^2}\sqrt{1-\rho_{23}^2}}$$
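The cofactor formula and the $p=3$ closed form can be checked against each other numerically; the correlation values below are arbitrary illustrative choices:

```python
import numpy as np

# Hypothetical 3x3 correlation matrix R (p = 3).
r12, r13, r23 = 0.5, 0.4, 0.3
R = np.array([[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]])

def cofactor(M, i, j):
    """Signed cofactor of entry (i, j): (-1)^(i+j) times the minor."""
    minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

# General cofactor formula: rho_{12.3} = -R12 / sqrt(R11 * R22).
rho_cof = -cofactor(R, 0, 1) / np.sqrt(cofactor(R, 0, 0) * cofactor(R, 1, 1))

# Closed form for p = 3.
rho_direct = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

print(rho_cof, rho_direct)  # the two expressions agree
```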

Related: Derivation of the formula for partial correlation coefficient of second order.

StubbornAtom
  • Thank you (+1). You say: "This is also sometimes called the correlation between X1 and X2, keeping X3,…,Xp 'fixed'". Isn't it better to say that it is the correlation between X1 and X2 after subtracting from both X1 and X2 the part that can be predicted from X3,…,Xp? What sense does it make to say that X3,…,Xp are "fixed"? (They vary, no?) – Thomas Dec 09 '21 at 18:18
  • I am not following here: "I can see (⋆) as a formal notation only if the entries of the dispersion matrix of the conditional distribution of (X1,X2) given X3,…,Xp are constants." "constants" with respect to what? You mean when the conditional distribution of (X1,X2) does not depend on the values of "X3,…,Xp"? (i.e. when (X1,X2) is independent of "X3,…,Xp"?) – Thomas Dec 09 '21 at 18:21
  • I only mean when the conditional variances and conditional covariance (entries of the conditional dispersion matrix) do not depend on $X_3,\ldots,X_p$. Regarding your first comment, you are right and by 'fixed' I mean we are conditioning on them. – StubbornAtom Dec 09 '21 at 19:30
  • Ok, so this would mean that in these cases $E[XY\mid Z]-E[X\mid Z]E[Y\mid Z]=E[(X-E[X\mid Z])(Y-E[Y\mid Z])\mid Z]$ does not depend on $Z$, in conclusion? This is interesting (or should it be trivial?), and it is exactly the numerator of the formula in my original post. But at least $\operatorname{Var}(X\mid Z)$ (in the denominator) will depend on $Z$ even in the Gaussian case, or not? – Thomas Dec 09 '21 at 20:14
  • Ah no, maybe what does not depend on $Z$ is the whole original formula, denominators included (so the correlation, not the covariance), as you wrote in your post... – Thomas Dec 09 '21 at 20:17
  • @Thomas Neither $E[XY\mid Z]-E[X\mid Z]E[Y\mid Z]=\operatorname{Cov}((X,Y)\mid Z)$ (the "conditional covariance") nor the conditional variances in the denominator depend on $Z$ when $(X,Y,Z)$ is jointly normal. So in the normal case, the formula in your post is correct. I would try to clarify in an edit later. – StubbornAtom Dec 09 '21 at 20:25
  • Perfect thanks! – Thomas Dec 09 '21 at 20:33