
I'm currently taking a Machine Learning course and have successfully submitted my implementation of PCA, which uses SVD. Here it is in Octave:

function [U, S] = pca(X)
%PCA Run principal component analysis on the dataset X
%   [U, S] = pca(X) computes eigenvectors of the covariance matrix of X
%   Returns the eigenvectors U, the eigenvalues (on diagonal) in S
%

% Useful values
[m, n] = size(X);

% You need to return the following variables correctly.
U = zeros(n);
S = zeros(n);

% ====================== YOUR CODE HERE ======================
% Instructions: You should first compute the covariance matrix. Then, you
%               should use the "svd" function to compute the eigenvectors
%               and eigenvalues of the covariance matrix. 
%
% Note: When computing the covariance matrix, remember to divide by m (the
%       number of examples).
%

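% Covariance matrix (assumes the columns of X are already mean-centered)
covarianceMatrix = (X' * X) ./ m;
% For a symmetric positive semi-definite matrix, the singular values
% returned in S coincide with its eigenvalues
[U, S, V] = svd(covarianceMatrix);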


% =========================================================================

end

I thought the point of using SVD in PCA implementations was to improve computational efficiency by not computing the covariance matrix at all (forming it can cause loss of precision).

How come I was expected to create the covariance matrix? How can I implement PCA without creating a covariance matrix?

  • Yep. You need to read amoeba's answer [on how to use SVD to perform PCA](http://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca) carefully. You should also ponder whether the eigendecomposition of the covariance matrix is really different from its singular value decomposition. (I also suspect that you have centred $X$ beforehand, right? In fairness, the wording in that script is a bit lax: SVD gives you singular values, not eigenvalues.) – usεr11852 Jun 24 '16 at 07:29
  • Yes, the data is centered. I read [Why PCA of data by means of SVD of the data?](http://stats.stackexchange.com/questions/79043/why-pca-of-data-by-means-of-svd-of-the-data) and I believe that if I used eig instead of svd in my code above, the two would produce identical results. After reading that post, I guess another way of putting my question is: how would I implement the author's step 1, the case where no covariance matrix is required and svd is applied directly to the data to get the same result? – Andrew Pham Jun 24 '16 at 08:21
  • Please read the link on how to use SVD to perform PCA provided in my previous comment. User amoeba describes how to get PCA without the covariance matrix. To get the PCA you either do the SVD of the original data $X_0$ or the eigen-decomposition of the data's covariance matrix. – usεr11852 Jun 24 '16 at 18:17
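For reference, the route described in the last comment can be sketched as follows. If the centered data has the SVD $X_0 = W D V^\top$, then $X_0^\top X_0 / m = V (D^2 / m) V^\top$, so the right singular vectors $V$ are the principal directions and the squared singular values divided by $m$ are the eigenvalues of the covariance matrix. This is only a minimal sketch, not the course's reference solution: the function name `pca_svd` is hypothetical, it assumes the rows of `X` are examples, and it divides by `m` to match the course convention.

```octave
1;  % script-file marker so the function below can be defined inline

% Hypothetical helper: PCA from the SVD of the data, without forming X' * X.
function [U, S] = pca_svd(X)
  m = size(X, 1);
  Xc = X - mean(X, 1);             % center columns (no effect if already centered)
  [~, D, V] = svd(Xc, 'econ');     % Xc = W * D * V'
  U = V;                           % principal directions (columns)
  S = diag(diag(D) .^ 2 / m);      % eigenvalues of the covariance matrix
  % Note: if m < n, only the first m directions are returned.
end

% Compare with the covariance-matrix route on small random data
randn('state', 1);
X = randn(50, 3);
Xc = X - mean(X, 1);
[U, S] = pca_svd(X);
[U2, S2, ~] = svd((Xc' * Xc) / size(X, 1));

disp(max(abs(diag(S) - diag(S2))));       % difference in eigenvalues
disp(max(max(abs(abs(U) - abs(U2)))));    % difference in directions, ignoring sign
```

Both printed differences should be at rounding-error level. The columns of `U` are only determined up to sign, which is why the comparison uses absolute values.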
