Correspondence between PCA principal components and the original variables

Question

I want to apply PCA to Kaggle's Titanic dataset.

For now I'm just taking the columns that have numeric values and dropping the NaN values, so I have five variables (actually four if we ignore the dependent variable, 'Survived').
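A minimal pandas sketch of that preprocessing, using a small hypothetical DataFrame `raw` (the rows and values are made up, not taken from the Kaggle file): keep only numeric columns, drop rows containing NaN, then set the dependent variable aside.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking a few Titanic columns (values are made up).
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Pclass": [3, 1, 3, 1],
})

# Keep only numeric columns ("Name" is dropped), then drop rows with any NaN.
numeric = raw.select_dtypes(include="number").dropna()

# Separate the dependent variable from the features used for PCA.
y = numeric["Survived"]
features = numeric.drop(columns="Survived")
print(features.columns.tolist())
print(len(features))
```

Whether 'Survived' is kept in the matrix passed to PCA is a modelling choice; the question's own code appears to fit all five columns.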

I have loaded this into a DataFrame `df`. If I take five components using PCA:

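```python
from sklearn.decomposition import PCA

# df holds the numeric Titanic columns with NaN rows dropped
pca_model = PCA(n_components=5)
pca_model.fit(df)
pca_model.explained_variance_ratio_
```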

```
[  9.30197643e-01   6.93699966e-02   2.24377672e-04   1.49076254e-04
   5.89069784e-05]
```
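To see where such ratios come from, here is a minimal sketch assuming scikit-learn's `PCA` and synthetic data `X` standing in for the Titanic columns: `explained_variance_` matches the eigenvalues of the sample covariance matrix, and `explained_variance_ratio_` is each eigenvalue divided by their sum (the total variance).

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the numeric columns (hypothetical data):
# four independent features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([1.0, 5.0, 14.0, 50.0])

pca_model = PCA(n_components=4).fit(X)

# Eigenvalues of the sample covariance matrix (n - 1 denominator),
# sorted from largest to smallest.
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Each ratio is an eigenvalue divided by the total variance (trace of cov).
ratio_by_hand = eigvals / np.trace(cov)

print(np.allclose(pca_model.explained_variance_, eigvals))
print(np.allclose(pca_model.explained_variance_ratio_, ratio_by_hand))
print(np.round(ratio_by_hand, 3))
```

On unscaled data like this, the column with by far the largest variance tends to dominate the first component, which is one common reason for a single very large ratio.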

I found that 93 percent of the variance comes from the first component. Is it possible to get the same kind of values in terms of the original variables? E.g., Age -> 0.3 of the variance, Fare -> 0.6.

Can I know what percentage of each principal component is given by each of the original variables?

What you may be speaking of is called PCA _loadings_ (please search this site for `PCA loadings`). A loading is the covariance between the unit-standardized component and a variable in its original scale (or the correlation, if the variable is standardized too). Therefore a squared loading is the amount of the variance in a variable accounted for by the component, and the variance of the component (its eigenvalue) is the sum of its squared loadings. — ttnphns, Mar 02 '17 at 21:27
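A minimal sketch of these relationships, assuming scikit-learn's `PCA`, where `components_` stores unit-length eigenvectors as rows and `explained_variance_` stores the eigenvalues (with an n - 1 denominator). `X` is random stand-in data rather than the Titanic columns, and names such as `loadings` and `share` are illustrative. The last step is one way to read the question's "Age -> 0.3" idea: a component's squared loadings divided by its eigenvalue, i.e. the squared entries of its eigenvector, which sum to 1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 300 rows, 4 correlated columns on different scales.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4)) * np.array([1.0, 3.0, 10.0, 30.0])

pca = PCA().fit(X)  # keep all components

# Loadings (variables x components): unit eigenvector scaled by sqrt(eigenvalue).
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Check: a loading is the covariance between a centered variable and the
# component scores rescaled to unit variance.
Xc = X - X.mean(axis=0)
unit_scores = pca.transform(X) / np.sqrt(pca.explained_variance_)
cov_xz = Xc.T @ unit_scores / (len(X) - 1)
print(np.allclose(cov_xz, loadings))

# Dividing each row by the variable's standard deviation gives correlations.
correlations = loadings / X.std(axis=0, ddof=1)[:, None]

# Squared loading = variance of variable j accounted for by component k.
sq = loadings ** 2
print(np.allclose(sq.sum(axis=0), pca.explained_variance_))  # eigenvalues
print(np.allclose(sq.sum(axis=1), X.var(axis=0, ddof=1)))    # variable variances

# Share of component k's variance contributed by variable j; columns sum to 1.
share = sq / pca.explained_variance_
print(np.round(share[:, 0], 3))
```

The row sums only recover each variable's full variance when all components are kept; with fewer components they give the variance explained so far.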
Read e.g. this: http://stats.stackexchange.com/q/143905/3277 — ttnphns, Mar 02 '17 at 21:39
