
I would like to implement the closed-form solution of PPCA (Tipping & Bishop, 1999, Appendix A). In the paper they calculate $W$ with formula (15):

$W=U_q(K_q-\sigma^2I)^{1/2}R$

where $K_q$ is the diagonal matrix of the eigenvalues $\lambda_i$ of the covariance matrix defined in formula (5).

On the one hand, I read here that it is possible to calculate the eigenvalues as

$\lambda_i = \frac{s_i^2}{n-1}$

Is it true that the principal values depend on the number of samples $n$?
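To check this relation concretely, here is a minimal sketch (the random data is purely illustrative) comparing the eigenvalues of the sample covariance matrix, defined with Bessel's correction $1/(n-1)$, against the singular values of the centered data matrix:

```python
import numpy as np

# Illustrative data: n samples, d features
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
X -= X.mean(axis=0)                  # center the data
n = X.shape[0]

# Eigenvalues of the sample covariance matrix (1/(n-1) normalization)
cov = X.T @ X / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Singular values of the centered data matrix
s = np.linalg.svd(X, compute_uv=False)

# lambda_i = s_i^2 / (n - 1) holds for this normalization
print(np.allclose(eigvals, s**2 / (n - 1)))   # True
```

So the relation $\lambda_i = s_i^2/(n-1)$ holds exactly when the covariance matrix itself is defined with the $1/(n-1)$ factor.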

On the other hand, I found a Python implementation of closed-form PPCA on GitHub, in a function `__fit_ml`, where it is written:

    mu = np.mean(self.y, 1)[:, np.newaxis]
    [u, s, v] = np.linalg.svd(self.y - mu)
    ...
    else:
        ss = s[:self.q]
    ss = np.sqrt(np.maximum(0, ss**2 - self.prior_sigma))
    w = u[:, :self.q].dot(np.diag(ss))

where the author is apparently calculating with

$\lambda_i = s_i^2$

which is completely different.
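For comparison, here is a minimal sketch of formula (15) using the covariance-based normalization $\lambda_i = s_i^2/n$ (matching formula (5) of the paper, and taking $R = I$). The function name `ppca_closed_form` is my own; this is not the GitHub author's code:

```python
import numpy as np

def ppca_closed_form(Y, q):
    """Closed-form ML estimate of W (Tipping & Bishop 1999, eq. 15), with R = I.

    Y: (d, n) data matrix with observations in columns, as in the paper.
    q: number of principal components to retain.
    """
    d, n = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)          # center each row
    u, s, _ = np.linalg.svd(Yc, full_matrices=False)
    lam = s**2 / n                                  # eigenvalues of S = (1/n) Yc Yc^T
    sigma2 = lam[q:].mean()                         # ML noise variance (eq. 8): mean of discarded eigenvalues
    W = u[:, :q] @ np.diag(np.sqrt(lam[:q] - sigma2))
    return W, sigma2
```

Note that the columns of the resulting $W$ are orthogonal by construction, since $W^T W = K_q - \sigma^2 I$ is diagonal.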

UPDATE

Here $s_i$ is a diagonal element of the matrix of singular values.

UPDATE 2

I have tried a numeric example:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.float64(np.array([
        [-1, -1],
        [-2, -1],
        [-3, -2],
        [1, 1],
        [2, 1],
        [3, 2]]))

    n_samples = np.shape(X)[0]

    X -= np.mean(X, axis=0)

    pca = PCA(n_components=2)
    pca.fit(X)

    cov_matrix = np.dot(X.T, X) / n_samples

    singularvalues = pca.singular_values_

    for i in range(len(pca.components_)):
        eigenvector = pca.components_[i]
        eigenvalue = np.dot(eigenvector.T, np.dot(cov_matrix, eigenvector))
        singularvalue = singularvalues[i]
        eigenvalue2 = singularvalue**2 / n_samples
        print("%f\t%f;\t%f" % (eigenvalue, singularvalue, eigenvalue2))

Output:

    6.616286    6.300612;   6.616286
    0.050381    0.549804;   0.050381

and found that the answer depends on the normalization of the covariance matrix.

UPDATE 3

On the other hand, NumPy's SVD documentation says:

> The rows of `v` are the eigenvectors of `a.H a`. The columns of `u` are the eigenvectors of `a a.H`. For row `i` in `v` and column `i` in `u`, the corresponding eigenvalue is `s[i]**2`.

UPDATE 4

Ah, sure, `a.H a` is the scatter matrix, i.e. the covariance matrix without normalization.
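A quick sketch confirming this, using the same data as the numeric example above: the eigenvalues of the unnormalized scatter matrix $X^T X$ are exactly $s_i^2$.

```python
import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [ 1.,  1.], [ 2.,  1.], [ 3.,  2.]])
X -= X.mean(axis=0)

# Singular values of the centered data matrix
s = np.linalg.svd(X, compute_uv=False)

# Eigenvalues of the scatter matrix X^T X (no 1/n factor)
scatter_eigs = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]

print(np.allclose(scatter_eigs, s**2))   # True
```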

  • In your reference at https://stats.stackexchange.com/questions/134282 you have misunderstood what $S$ is: it is not a covariance matrix. It is the singular part of the model matrix. – whuber Sep 05 '17 at 20:44
  • I was using notation from the paper. I see that `S` means different things in these 2 sources. – Dims Sep 05 '17 at 21:30
  • 1
    Covariance matrix has $1/n$ or $1/(n-1)$ factor right in the definition, see formula 5 in the linked paper. SVD does not. So clearly the formula has to be $\lambda = s^2/n$ or $s^2/(n-1)$ depending on whether one is using Bessel's correction. I don't know what is going on in the `__fit_ml` code, but $\lambda=s^2$ cannot possibly be correct. – amoeba Sep 06 '17 at 07:33
  • @amoeba what if $S$ was defined without $1/N$? – Dims Sep 06 '17 at 08:25
  • I don't understand the question. Covariance already has a definition. See https://en.wikipedia.org/wiki/Covariance#Calculating_the_sample_covariance. Do you mean what if $S$ were NOT a covariance matrix, but a scatter matrix (https://en.wikipedia.org/wiki/Scatter_matrix)? Well, eigenvalues of the scatter matrix are of course $s^2$. – amoeba Sep 06 '17 at 08:26
  • @amoeba I didn't say "covariance" in the question, I said $S$. – Dims Sep 06 '17 at 08:28
  • 1
    OK, eigenvalues of the scatter matrix are of course $s^2$. (I added this to my previous comment.) – amoeba Sep 06 '17 at 08:30
  • @amoeba sure, but I am trying to find an origins of possible error; what if somebody mixed covariance and scatter matrices (as you said)? – Dims Sep 06 '17 at 08:31
  • I don't know if it's really an error, maybe this github code has 1/n normalization done in some other place. In any case, your main question seems to be resolved by now, isn't it? – amoeba Sep 06 '17 at 08:33
  • @amoeba yes, true – Dims Sep 06 '17 at 08:35
