
In principal component analysis, the correlation between the $i$-th principal component ($y_i$) and the random vector of observed variables $x$ (of dimension $p$) is defined as:

$\mathrm{corr}(y_i, x) = \lambda_i^{-0.5} \, \mathrm{cov}(y_i, x) \, \Delta^{-0.5}$ (#)

with $\lambda_i$ the $i$-th eigenvalue of the covariance matrix of the observed variables ($\Sigma$), and $\Delta$ the $p \times p$ diagonal matrix with the variances of $x$ on its main diagonal:

$\Delta = \mathrm{diag}(\Sigma)$
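
To make sure I am reading the formula correctly, I ran the quick numerical check below (just a sketch in NumPy with made-up data, doing the PCA via an eigendecomposition of the sample covariance matrix); the right-hand side of (#) does seem to reproduce the element-by-element correlations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: n observations of a p-dimensional random vector x.
n, p = 1000, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # induce some correlation

Sigma = np.cov(X, rowvar=False)            # covariance matrix of x
eigval, eigvec = np.linalg.eigh(Sigma)     # eigendecomposition of Sigma
order = np.argsort(eigval)[::-1]           # decreasing eigenvalue order
eigval, eigvec = eigval[order], eigvec[:, order]

Y = X @ eigvec                             # principal component scores y_1, ..., y_p
Delta_inv_sqrt = np.diag(np.diag(Sigma) ** -0.5)

i = 0                                      # look at the first principal component
# Right-hand side of (#): lambda_i^{-0.5} * cov(y_i, x) * Delta^{-0.5}
cov_yi_x = np.cov(np.column_stack([Y[:, i], X]), rowvar=False)[0, 1:]
rhs = eigval[i] ** -0.5 * cov_yi_x @ Delta_inv_sqrt

# Element-by-element correlations corr(y_i, x_k), computed directly.
lhs = np.corrcoef(np.column_stack([Y[:, i], X]), rowvar=False)[0, 1:]

print(np.allclose(lhs, rhs))               # True
```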

Where does this formula come from? Moreover, I only know of two ways of computing a correlation:

1) correlation between two scalar random variables: $\mathrm{corr}(x, y) = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)^{0.5} \, \mathrm{var}(y)^{0.5}}$

2) correlation "within" a random vector (of dimension $p$): $corr$(x) = $\Delta^{-0.5}$ * $cov$(x) * $\Delta^{-0.5}$

How can I extend these definitions to the case of two random vectors (or, as in (#), of a scalar random variable and a random vector)? Is the latter case perhaps what is called "cross-correlation"?
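
To make the question concrete, here is the extension I would guess by analogy with definition 2): $\mathrm{corr}(x, y) = \Delta_x^{-0.5} \, \mathrm{cov}(x, y) \, \Delta_y^{-0.5}$ (a $p \times q$ matrix), of which (#) would be the special case with a one-dimensional $y$. A sketch of this guess, again in NumPy with made-up data (the names $\Delta_x$, $\Delta_y$, and `cov_xy` are mine, not from any textbook):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: two jointly observed random vectors x (dim p) and y (dim q).
n, p, q = 1000, 3, 2
X = rng.normal(size=(n, p))
Y = X @ rng.normal(size=(p, q)) + rng.normal(size=(n, q))   # y correlated with x

# Sample cross-covariance cov(x, y): the off-diagonal p x q block of the
# covariance matrix of the stacked vector (x, y).
C = np.cov(np.hstack([X, Y]), rowvar=False)
cov_xy = C[:p, p:]

# Guessed generalization of definition 2):
#   corr(x, y) = Delta_x^{-0.5} * cov(x, y) * Delta_y^{-0.5}   (a p x q matrix)
Dx_inv_sqrt = np.diag(np.diag(C)[:p] ** -0.5)
Dy_inv_sqrt = np.diag(np.diag(C)[p:] ** -0.5)
corr_xy = Dx_inv_sqrt @ cov_xy @ Dy_inv_sqrt

# Check element by element against the scalar definition 1).
check = np.corrcoef(np.hstack([X, Y]), rowvar=False)[:p, p:]
print(np.allclose(corr_xy, check))          # True
```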

Thank you in advance.

user208557
  • Which vectors do you mean as "cross-correlated" in your last paragraph? In [this](https://stats.stackexchange.com/a/119758/3277) answer, proper PCA terminology is introduced. Loading is the covariance between a variable (one of the PCA-analyzed ones) and a unit-scaled component, while their correlation is called "rescaled loading". – ttnphns Jun 11 '18 at 12:08
  • You're right, sorry, I've just edited my question. Hope it's more precise now. – user208557 Jun 12 '18 at 07:41

0 Answers