I would like to implement the closed form of PPCA (Tipping & Bishop, 1999, Appendix A). In this paper they calculate $W$ in formula (15):
$W = U_q (K_q - \sigma^2 I)^{1/2} R$
where $K_q$ is a diagonal matrix of the eigenvalues $\lambda_i$ of the covariance matrix, as defined in formula (5).
On the one hand, I read here that it is possible to calculate the eigenvalues as
$\lambda_i = \frac{s_i^2}{n-1}$
Is it true that the principal values depend on the number of samples $n$?
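A quick check of this relation on synthetic data (a minimal sketch; note that `np.cov` normalizes by $n-1$ by default, which is what makes the formula hold here):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 3))
Xc = X - X.mean(axis=0)          # center the data

# Singular values of the centered data matrix
s = np.linalg.svd(Xc, compute_uv=False)

# Eigenvalues of the sample covariance matrix (np.cov divides by n - 1),
# sorted in descending order to match the singular values
lam = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# lambda_i = s_i**2 / (n - 1) holds for this normalization
print(np.allclose(lam, s**2 / (n - 1)))   # True
```

So the eigenvalues do scale with $n$, but only through the normalization chosen for the covariance matrix.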
On the other hand, I found a Python implementation of closed-form PPCA on GitHub, in a function `__fit_ml`, where it is written:
```python
mu = np.mean(self.y, 1)[:, np.newaxis]
[u, s, v] = np.linalg.svd(self.y - mu)
...
else:
    ss = s[:self.q]
    ss = np.sqrt(np.maximum(0, ss**2 - self.prior_sigma))
    w = u[:, :self.q].dot(np.diag(ss))
```
where the author is apparently calculating with
$\lambda_i = s_i^2$
which is completely different.
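For comparison, here is a minimal sketch of how I understand the closed-form ML fit from formulas (8) and (15), assuming the paper's $1/n$ normalization of the sample covariance, $R = I$, and data stored with samples as rows (unlike the GitHub snippet, which keeps samples as columns). The function name is my own:

```python
import numpy as np

def ppca_closed_form(Y, q):
    """Sketch of the closed-form ML solution of PPCA.

    Assumes S = (1/n) * scatter matrix, i.e. lambda_i = s_i**2 / n,
    and takes the rotation R = I in formula (15).
    """
    n, d = Y.shape
    Yc = Y - Y.mean(axis=0)
    u, s, vt = np.linalg.svd(Yc, full_matrices=False)
    lam = s**2 / n                               # eigenvalues of S
    sigma2 = lam[q:].mean() if q < d else 0.0    # formula (8): mean of discarded eigenvalues
    Uq = vt[:q].T                                # principal eigenvectors of S (d x q)
    W = Uq @ np.diag(np.sqrt(lam[:q] - sigma2))  # formula (15) with R = I
    return W, sigma2
```

With this convention the eigenvalues are explicitly divided by $n$ before $\sigma^2$ is subtracted, which is the step the GitHub snippet appears to skip.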
UPDATE

Here $s_i$ is the $i$-th diagonal element of the matrix of singular values.
UPDATE 2

I have tried a numeric example:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.float64(np.array([
    [-1, -1],
    [-2, -1],
    [-3, -2],
    [1, 1],
    [2, 1],
    [3, 2]]))
n_samples = np.shape(X)[0]
X -= np.mean(X, axis=0)

pca = PCA(n_components=2)
pca.fit(X)

# Covariance matrix normalized by n (not n - 1)
cov_matrix = np.dot(X.T, X) / n_samples
singularvalues = pca.singular_values_
for i in range(len(pca.components_)):
    eigenvector = pca.components_[i]
    eigenvalue = np.dot(eigenvector.T, np.dot(cov_matrix, eigenvector))
    singularvalue = singularvalues[i]
    eigenvalue2 = singularvalue**2 / n_samples
    print("%f\t%f;\t%f" % (eigenvalue, singularvalue, eigenvalue2))
```
output:

```
6.616286	6.300612;	6.616286
0.050381	0.549804;	0.050381
```
and found that the answer depends on the normalization of the covariance matrix.
UPDATE 3

On the other hand, NumPy's SVD documentation says:

> The rows of v are the eigenvectors of a.H a. The columns of u are the eigenvectors of a a.H. For row i in v and column i in u, the corresponding eigenvalue is s[i]**2.
UPDATE 4

Ah, sure: $a^H a$ is the "scatter matrix", i.e. the covariance matrix without normalization.
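To confirm this on the same toy data: the eigenvalues of the unnormalized scatter matrix equal the squared singular values, while dividing by $n$ recovers the covariance-matrix eigenvalues from UPDATE 2:

```python
import numpy as np

X = np.array([[-1., -1.], [-2., -1.], [-3., -2.],
              [1., 1.], [2., 1.], [3., 2.]])
Xc = X - X.mean(axis=0)

s = np.linalg.svd(Xc, compute_uv=False)

# Eigenvalues of the (unnormalized) scatter matrix Xc.T @ Xc,
# sorted in descending order
scatter_eig = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]

print(np.allclose(scatter_eig, s**2))   # True: scatter eigenvalues = s_i**2
print(s**2 / len(X))                    # eigenvalues of the 1/n-normalized covariance
```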