In my machine learning course we have been taught that given a new axis $\mathbf{u}_j$ and a datapoint $\mathbf{x}_n$, the projection is $z_j = \mathbf{u}_j^T\mathbf{x}_n$. The variance of $z_j$ can then be shown to be $\mathbf{u}_j^T\mathbf{S}\mathbf{u}_j=\lambda_j$. So far so good.
But then my professor said that "the variance of the projected data" is:
$$ \sum_{j=1}^M\mathbf{u}_j^T\mathbf{S}\mathbf{u}_j=\sum_{j=1}^M\lambda_j $$
I fail to understand why the variance of the projected data is the sum of the variances along each new axis. Shouldn't the variance be a matrix, like $\operatorname{diag}(\lambda_1,\dots,\lambda_M)$?
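For what it's worth, here is a small numerical sketch I put together myself (a toy example in numpy, not from the lecture; all variable names are my own). The covariance of the projected data does come out as $\operatorname{diag}(\lambda_1,\dots,\lambda_M)$, and its trace matches $\sum_{j=1}^M\lambda_j$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N samples in D dimensions, projected onto the top M principal axes.
N, D, M = 500, 5, 3
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))  # correlated features
Xc = X - X.mean(axis=0)                                 # centre the data

S = Xc.T @ Xc / N                                       # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)                    # eigenvalues in ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]      # sort descending

U = eigvecs[:, :M]                                      # top-M axes u_1, ..., u_M
Z = Xc @ U                                              # projections z_j = u_j^T x_n

# Covariance of the projected data: an M x M matrix, numerically diag(lambda_1..lambda_M)
S_z = Z.T @ Z / N
print(np.round(S_z, 4))

# What my professor calls "the variance of the projected data": the trace of S_z,
# i.e. the sum of the per-axis variances, which equals the sum of the top-M eigenvalues.
print(np.trace(S_z), eigvals[:M].sum())
```

So the numbers agree; what I don't see is why the trace (the sum of the per-axis variances) is the right scalar to call "the variance of the projected data".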
In max-variance PCA, why is the variance of the projected data equal to $\sum_{j=1}^M\mathbf{u}^T_j\mathbf{S}\mathbf{u}_j$?