I work in bioinformatics and have data that looks like this:
H3K18Ac H3K27me3 H3K36me3 H3K4me1 H3K4me2 H3K4me3 H3K9Ac H4K12Ac PolII
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0
...
308792 0 1 0 0 1 0 0 0 0
So I have about 300,000 observations of 9 variables. They are binarized (i.e., each variable can only take the value 0 or 1). My professor told me to run PCA on this. The goal is to determine whether the variables are independent. After doing some research, it seems that PCA is only for continuous variables and that what I am looking for is MCA/MFA, because (and I am guessing here) my data are categorical rather than continuous. Is this correct?
My question has two parts. Beyond the above, I am trying to get an intuition for what PCA/MCA/MFA mean and how their results are interpreted statistically.
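To make the setup concrete, here is a minimal base-R sketch of PCA on 0/1 columns. It uses simulated data, not my real data: `b_sim`, the sample size of 1,000, and the probability 0.3 are all made up for illustration. It only shows that PCA on standardized columns gives the same eigenvalues as the correlation matrix of the binary variables. It does not settle whether PCA or MCA is the better choice.

```r
# Hypothetical stand-in for the real data: 1,000 rows of 9 binary marks.
# The row count and prob = 0.3 are assumptions for illustration only.
set.seed(1)
marks <- c("H3K18Ac", "H3K27me3", "H3K36me3", "H3K4me1", "H3K4me2",
           "H3K4me3", "H3K9Ac", "H4K12Ac", "PolII")
b_sim <- matrix(rbinom(1000 * length(marks), size = 1, prob = 0.3),
                ncol = length(marks), dimnames = list(NULL, marks))

# PCA on centered, unit-variance columns (base R).
pca <- prcomp(b_sim, center = TRUE, scale. = TRUE)
eig_pca <- pca$sdev^2  # variance carried by each component

# The same eigenvalues come from the correlation matrix of the 0/1 columns.
eig_cor <- eigen(cor(b_sim), symmetric = TRUE, only.values = TRUE)$values

all.equal(eig_pca, eig_cor)  # compares the two sets of eigenvalues
sum(eig_pca)                 # total variance of the standardized columns
```

With standardized columns, each variable contributes one unit of variance, so the eigenvalues sum to the number of variables.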
EDIT: Based on the answer below, I ran PCA on my data. This is the summary I got back. Could someone explain what it all means?
Call:
PCA(b)
Eigenvalues
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7 Dim.8 Dim.9
Variance 3.604 1.115 1.045 0.938 0.713 0.539 0.456 0.300 0.290
% of var. 40.050 12.390 11.612 10.426 7.925 5.986 5.061 3.332 3.217
Cumulative % of var. 40.050 52.440 64.052 74.478 82.403 88.390 93.451 96.783 100.000
Individuals (the 10 first)
Dist Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
1 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
2 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
3 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
4 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
5 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
6 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
7 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
8 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
9 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
10 | 0.699 | -0.542 0.000 0.602 | -0.137 0.000 0.038 | -0.101 0.000 0.021 |
Variables
Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr cos2
H3K18Ac | 0.500 6.943 0.250 | 0.651 38.002 0.424 | 0.271 7.036 0.074 |
H3K27me3 | -0.084 0.195 0.007 | -0.372 12.421 0.139 | 0.584 32.635 0.341 |
H3K36me3 | 0.115 0.365 0.013 | 0.126 1.421 0.016 | -0.763 55.679 0.582 |
H3K4me1 | 0.555 8.543 0.308 | 0.593 31.535 0.352 | 0.175 2.933 0.031 |
H3K4me2 | 0.835 19.327 0.697 | -0.167 2.512 0.028 | 0.041 0.157 0.002 |
H3K4me3 | 0.796 17.592 0.634 | -0.253 5.760 0.064 | -0.046 0.207 0.002 |
H3K9Ac | 0.845 19.828 0.715 | -0.190 3.249 0.036 | 0.001 0.000 0.000 |
H4K12Ac | 0.800 17.736 0.639 | -0.085 0.643 0.007 | -0.035 0.116 0.001 |
PolII | 0.584 9.472 0.341 | -0.223 4.457 0.050 | -0.114 1.236 0.013 |
> res$eig
eigenvalue percentage of variance cumulative percentage of variance
comp 1 3.6044591 40.049546 40.04955
comp 2 1.1151142 12.390158 52.43970
comp 3 1.0451003 11.612225 64.05193
comp 4 0.9383536 10.426151 74.47808
comp 5 0.7132522 7.925025 82.40311
comp 6 0.5387807 5.986452 88.38956
comp 7 0.4555308 5.061453 93.45101
comp 8 0.2998385 3.331539 96.78255
comp 9 0.2895705 3.217450 100.00000
> res$var
$coord
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
H3K18Ac 0.50024249 0.65096914 0.271178606 0.082130491 0.179898889
H3K27me3 -0.08393274 -0.37216436 0.584012215 0.716196755 -0.004918553
H3K36me3 0.11467656 0.12589380 -0.762825333 0.613452477 -0.066302036
H3K4me1 0.55491452 0.59300598 0.175082970 0.171327973 -0.090682110
H3K4me2 0.83463853 -0.16735597 0.040521897 -0.068620027 -0.192035432
H3K4me3 0.79629230 -0.25344621 -0.046489758 -0.054375461 -0.220167476
H3K9Ac 0.84540049 -0.19034355 0.001356118 -0.070513987 -0.087435840
H4K12Ac 0.79955158 -0.08468939 -0.034841216 0.003741303 -0.079640892
PolII 0.58429734 -0.22292558 -0.113651084 0.018482152 0.754258704
$cor
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
H3K18Ac 0.50024249 0.65096914 0.271178606 0.082130491 0.179898889
H3K27me3 -0.08393274 -0.37216436 0.584012215 0.716196755 -0.004918553
H3K36me3 0.11467656 0.12589380 -0.762825333 0.613452477 -0.066302036
H3K4me1 0.55491452 0.59300598 0.175082970 0.171327973 -0.090682110
H3K4me2 0.83463853 -0.16735597 0.040521897 -0.068620027 -0.192035432
H3K4me3 0.79629230 -0.25344621 -0.046489758 -0.054375461 -0.220167476
H3K9Ac 0.84540049 -0.19034355 0.001356118 -0.070513987 -0.087435840
H4K12Ac 0.79955158 -0.08468939 -0.034841216 0.003741303 -0.079640892
PolII 0.58429734 -0.22292558 -0.113651084 0.018482152 0.754258704
$cos2
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
H3K18Ac 0.250242546 0.423760821 7.353784e-02 6.745418e-03 3.236361e-02
H3K27me3 0.007044705 0.138506309 3.410703e-01 5.129378e-01 2.419216e-05
H3K36me3 0.013150713 0.015849249 5.819025e-01 3.763239e-01 4.395960e-03
H3K4me1 0.307930129 0.351656088 3.065405e-02 2.935327e-02 8.223245e-03
H3K4me2 0.696621476 0.028008020 1.642024e-03 4.708708e-03 3.687761e-02
H3K4me3 0.634081430 0.064234980 2.161298e-03 2.956691e-03 4.847372e-02
H3K9Ac 0.714701983 0.036230668 1.839057e-06 4.972222e-03 7.645026e-03
H4K12Ac 0.639282736 0.007172292 1.213910e-03 1.399735e-05 6.342672e-03
PolII 0.341403380 0.049695815 1.291657e-02 3.415899e-04 5.689062e-01
$contrib
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
H3K18Ac 6.9425825 38.001561 7.036438e+00 0.718856660 4.53747063
H3K27me3 0.1954442 12.420818 3.263517e+01 54.663590856 0.00339181
H3K36me3 0.3648457 1.421312 5.567911e+01 40.104703429 0.61632615
H3K4me1 8.5430330 31.535431 2.933120e+00 3.128167613 1.15292246
H3K4me2 19.3266578 2.511673 1.571164e-01 0.501805286 5.17034591
H3K4me3 17.5915835 5.760395 2.068029e-01 0.315093446 6.79615372
H3K9Ac 19.8282728 3.249054 1.759695e-04 0.529887903 1.07185451
H4K12Ac 17.7358854 0.643189 1.161525e-01 0.001491692 0.88926070
PolII 9.4716952 4.456567 1.235917e+00 0.036403115 79.76227412
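As a sanity check on how the columns above relate to each other, the following base-R sketch recomputes some of them from the posted numbers. The relationships it assumes are inferred from the output itself, not taken from documentation: percentage of variance is the eigenvalue divided by the sum of eigenvalues, `cos2` is the squared coordinate, and `contrib` is `cos2` divided by the component's eigenvalue, times 100. The variable names (`eig`, `coord1`, and so on) are just for this example.

```r
# Eigenvalues copied from the posted res$eig output.
eig <- c(3.6044591, 1.1151142, 1.0451003, 0.9383536, 0.7132522,
         0.5387807, 0.4555308, 0.2998385, 0.2895705)

# Percentage of variance per component and its running total.
pct <- 100 * eig / sum(eig)
cum_pct <- cumsum(pct)

# Dim.1 coordinates copied from the posted res$var$coord output.
coord1 <- c(H3K18Ac = 0.50024249, H3K27me3 = -0.08393274,
            H3K36me3 = 0.11467656, H3K4me1 = 0.55491452,
            H3K4me2 = 0.83463853, H3K4me3 = 0.79629230,
            H3K9Ac = 0.84540049, H4K12Ac = 0.79955158,
            PolII = 0.58429734)

# Assumed relationships: cos2 = coord^2, contrib = 100 * cos2 / eigenvalue.
cos2_1 <- coord1^2
contrib1 <- 100 * cos2_1 / eig[1]

round(pct, 3)       # compare with the "% of var." row
round(cum_pct, 3)   # compare with the "Cumulative % of var." row
round(cos2_1, 3)    # compare with the Dim.1 column of $cos2
round(contrib1, 3)  # compare with the Dim.1 column of $contrib
sum(cos2_1)         # compare with the first eigenvalue
```

If these relationships hold, the eigenvalues sum to 9 (one unit of variance per standardized variable), and the squared Dim.1 coordinates sum to the first eigenvalue.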