Questions tagged [pca]

Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables that preserve as much information (as much variance) as possible. These variables, called principal components, are linear combinations of the input variables.

Principal component analysis is a technique that decomposes an array of numerical data into a set of orthogonal vectors, called principal components, which define uncorrelated linear combinations of the variables. The first few principal components often capture most of the multivariate variability of the data, which is why PCA is used as a data reduction / dimensionality reduction method.
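
As a rough illustration of the description above, the sketch below computes principal components in plain Python: it centers the data, forms the sample covariance matrix, and extracts eigenvectors one at a time with power iteration and deflation. This is only one simple way to obtain the components, not the method any particular package uses. The function names (`center`, `covariance`, `leading_eigenpair`, `pca`), the iteration count, the seeded random start vector, and the sample data are all assumptions made for the example.

```python
import math
import random


def center(rows):
    """Subtract the column means so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (divisor n - 1) of already-centered rows."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Approximate the largest eigenvalue and its unit eigenvector
    of a symmetric matrix by power iteration.

    Assumption: the seeded random start vector is not orthogonal to the
    leading eigenvector; a fixed iteration count stands in for a real
    convergence check.
    """
    p = len(matrix)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: nothing left to extract
            break
        v = [x / norm for x in w]
    # Normalize once more so v is a unit vector even if the loop exited early.
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    # Rayleigh quotient v' M v gives the eigenvalue estimate.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, v


def pca(rows, n_components):
    """Return (components, variances, scores) for the first n_components."""
    centered = center(rows)
    cov = covariance(centered)
    p = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the direction just found from the matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    # Scores are the centered data projected onto each component.
    scores = [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]
    return components, variances, scores


if __name__ == "__main__":
    data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
            [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
    components, variances, scores = pca(data, 2)
    total = sum(variances)
    for comp, var in zip(components, variances):
        print(comp, var, var / total)  # direction, variance, share of variance
```

Each returned variance is the eigenvalue of the covariance matrix for that component, so dividing it by the total gives the proportion of variance that component accounts for. In practice a library routine based on an eigendecomposition or SVD would normally replace the power iteration used here.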

3190 questions
1229
votes
27 answers

Making sense of principal component analysis, eigenvectors & eigenvalues

In today's pattern recognition class my professor talked about PCA, eigenvectors and eigenvalues. I understood the mathematics of it. If I'm asked to find eigenvalues etc. I'll do it correctly like a machine. But I didn't understand it. I didn't…
claws
516
votes
3 answers

Relationship between SVD and PCA. How to use SVD to perform PCA?

Principal component analysis (PCA) is usually explained via an eigen-decomposition of the covariance matrix. However, it can also be performed via singular value decomposition (SVD) of the data matrix $\mathbf X$. How does it work? What is the…
amoeba
252
votes
15 answers

What are the differences between Factor Analysis and Principal Component Analysis?

It seems that a number of the statistical packages that I use wrap these two concepts together. However, I'm wondering if there are different assumptions or data 'formalities' that must be true to use one over the other. A real example would be…
Brandon Bertelsen
201
votes
6 answers

Can principal component analysis be applied to datasets containing a mix of continuous and categorical variables?

I have a dataset that has both continuous and categorical data. I am analyzing it using PCA and am wondering if it is fine to include the categorical variables as part of the analysis. My understanding is that PCA can only be applied to continuous…
196
votes
7 answers

PCA on correlation or covariance?

What are the main differences between performing principal component analysis (PCA) on the correlation matrix and on the covariance matrix? Do they give the same results?
Random
159
votes
5 answers

What's the difference between principal component analysis and multidimensional scaling?

How are PCA and classical MDS different? How about MDS versus nonmetric MDS? Is there a time when you would prefer one over the other? How do the interpretations differ?
Stephen Turner
156
votes
1 answer

How to reverse PCA and reconstruct original variables from several principal components?

Principal component analysis (PCA) can be used for dimensionality reduction. After such dimensionality reduction is performed, how can one approximately reconstruct the original variables/features from a small number of principal…
amoeba
136
votes
6 answers

Should one remove highly correlated variables before doing PCA?

I'm reading a paper where the author discards several variables due to high correlation with other variables before doing PCA. The total number of variables is around 20. Does this give any benefits? It looks like an overhead to me as PCA should handle…
type2
118
votes
4 answers

PCA and proportion of variance explained

In general, what is meant by saying that the fraction $x$ of the variance in an analysis like PCA is explained by the first principal component? Can someone explain this intuitively but also give a precise mathematical definition of what "variance…
user9097
103
votes
2 answers

Why do we need to normalize data before principal component analysis (PCA)?

I'm doing principal component analysis on my dataset and my professor told me that I should normalize the data before doing the analysis. Why? What would happen if I did PCA without normalization? Why do we normalize data in general? Could…
jjepsuomi
99
votes
5 answers

What is the relation between k-means clustering and PCA?

It is common practice to apply PCA (principal component analysis) before a clustering algorithm (such as k-means). It is believed that this improves the clustering results in practice (noise reduction). However, I am interested in a comparative and…
mic
95
votes
5 answers

Loadings vs eigenvectors in PCA: when to use one or another?

In principal component analysis (PCA), we get eigenvectors (unit vectors) and eigenvalues. Now, let us define loadings as $$\text{Loadings} = \text{Eigenvectors} \cdot \sqrt{\text{Eigenvalues}}.$$ I know that eigenvectors are just directions and…
user2696565
87
votes
7 answers

What are principal component scores?

What are principal component scores (PC scores, PCA scores)?
vrish88
84
votes
4 answers

What're the differences between PCA and autoencoder?

Both PCA and autoencoders can do dimension reduction, so what are the differences between them? In what situations should I use one over the other?
RockTheStar
83
votes
4 answers

How to visualize what canonical correlation analysis does (in comparison to what principal component analysis does)?

Canonical correlation analysis (CCA) is a technique related to principal component analysis (PCA). While it is easy to teach PCA or linear regression using a scatter plot (see a few thousand examples on Google image search), I have not seen a…