Correspondence analysis: how are row principal and supplementary coordinates calculated?

Question

How are row principal and supplementary coordinates calculated in correspondence analysis (CA)? Specifically, I am looking for a simple example of how to derive them using linear combinations of the row profiles, masses, and/or contribution statistics. Everything that I have read has been in matrix algebra, which I do not understand. The matrix notation can be found at the bottom of the second page here.

For example, in principal components analysis, it's easy to see that the first principal component of a set of features $X_1, X_2, \ldots, X_p$ is the normalized linear combination of the features $Z_1 = a_{11}X_1 + a_{21}X_2 + \ldots + a_{p1}X_p$. Normalization means that the sum of the squared coefficients is equal to $1$, that is, $\sum_{j=1}^{p} a_{j1}^2 = 1$.

I believe these $a_{p1}$ may be analogous to relative or absolute contributions, and the $X_p$ are the centered (demeaned) row profiles weighted by their masses in CA. But I am not sure.

I ran correspondence analysis on the matrix below using the ca package in R. Rows 1-9 are supplementary points. Rows 10-12 are the contingency table.

        Image1  Image2  Image3  Image4  Image5
1_B1    1       0       1       1       0
1_B2    1       1       0       0       1
1_B3    1       0       0       1       1
2_B1    1       0       1       1       1
2_B2    1       1       1       1       1
2_B3    0       0       1       1       0
3_B1    1       1       1       0       0
3_B2    0       0       1       1       1
3_B3    0       0       1       0       0
B1      3       1       3       2       1
B2      2       2       2       2       3
B3      1       0       2       2       1

which produced the results below:

Principal inertias (eigenvalues):

 dim    value      %   cum%   scree plot               
 1      0.088763  71.6  71.6  ******************       
 2      0.035277  28.4 100.0  *******                  
        -------- -----                                 
 Total: 0.124040 100.0                                 


Rows:
        name   mass  qlt  inr    k=1 cor  ctr    k=2 cor  ctr  
1  | (*)1_B1 | <NA>  925 <NA> |  608 863 <NA> |  162  62 <NA> |
2  | (*)1_B2 | <NA>  960 <NA> | -952 824 <NA> |  387 136 <NA> |
3  | (*)1_B3 | <NA>  128 <NA> |  -60   6 <NA> | -271 122 <NA> |
4  | (*)2_B1 | <NA>  430 <NA> |  166 195 <NA> | -182 235 <NA> |
5  | (*)2_B2 | <NA>  825 <NA> | -267 787 <NA> |   59  39 <NA> |
6  | (*)2_B3 | <NA>  705 <NA> |  762 533 <NA> | -433 172 <NA> |
7  | (*)3_B1 | <NA>  811 <NA> | -284  87 <NA> |  820 725 <NA> |
8  | (*)3_B2 | <NA>  939 <NA> |  121  28 <NA> | -694 911 <NA> |
9  | (*)3_B3 | <NA>  252 <NA> |  845 250 <NA> |   84   2 <NA> |
10 |    S_B1 |  370 1000  227 |  164 352  112 |  222 648  518 |
11 |    S_B2 |  407 1000  408 | -348 974  555 |  -57  26   37 |
12 |    S_B3 |  222 1000  365 |  365 653  333 | -266 347  445 |

Columns:
    name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
1 | Idl1 |  222 1000  130 |   90 111  20 |  254 889 407 |
2 | Idl2 |  111 1000  350 | -595 906 443 |  192  94 116 |
3 | Idl3 |  259 1000  133 |  252 996 185 |   16   4   2 |
4 | Idl4 |  222 1000  130 |  202 562 102 | -179 438 201 |
5 | Idl5 |  185 1000  256 | -346 696 249 | -228 304 274 |

The rows and columns tables of results are given in a standard format, where quantities are either multiplied by 1000 or expressed in permills (thousandths):

the mass (mass) of each point (x1000),
the quality (qlt) of display in the solution subspace of nd dimensions,
the inertia (inr) of the point (in permills of the total inertia), and
then, for each dimension (k=1 or 2) of the solution, the principal coordinate (x1000),
the (relative) contribution (cor) of the principal axis to the point inertia (x1000), and
the (absolute) contribution (ctr) of the point to the inertia of the axis (in permills of the principal inertia).
For supplementary points, masses, inertias and absolute contributions (ctr) are not applicable, but the relative contributions (cor) are valid, as is their sum over the set of chosen nd dimensions (qlt).
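
As a rough check on these definitions, the sketch below recomputes mass, inr, ctr and cor for the three active rows, and cor for one supplementary row, directly from the counts. It is a minimal plain-Python illustration, not the ca package's own code. The helper name `chi2_distance_sq` is made up for this example, and the k=1 coordinates and the first principal inertia are simply copied from the printout above, so the results agree with the table only up to rounding.

```python
# Active contingency table from the question (rows B1, B2, B3).
table = {"B1": [3, 1, 3, 2, 1], "B2": [2, 2, 2, 2, 3], "B3": [1, 0, 2, 2, 1]}

# Copied from the ca printout (assumed inputs, printed there x1000):
lambda1 = 0.088763                                  # first principal inertia
coord1 = {"B1": 0.164, "B2": -0.348, "B3": 0.365}   # k=1 row principal coordinates

n = sum(sum(row) for row in table.values())
# Average row profile = column masses.
col_mass = [sum(col) / n for col in zip(*table.values())]


def chi2_distance_sq(counts):
    """Squared chi-square distance from a row's profile to the average profile."""
    total = sum(counts)
    return sum((c / total - m) ** 2 / m for c, m in zip(counts, col_mass))


mass = {name: sum(row) / n for name, row in table.items()}
inertia = {name: mass[name] * chi2_distance_sq(row) for name, row in table.items()}
total_inertia = sum(inertia.values())

inr = {name: inertia[name] / total_inertia for name in table}
ctr = {name: mass[name] * coord1[name] ** 2 / lambda1 for name in table}
cor = {name: coord1[name] ** 2 / chi2_distance_sq(table[name]) for name in table}

for name in table:
    print(
        f"{name}: mass={1000 * mass[name]:.0f} inr={1000 * inr[name]:.0f} "
        f"ctr={1000 * ctr[name]:.0f} cor={1000 * cor[name]:.0f}"
    )

# Supplementary row 1_B1: only cor is defined; 0.608 is its printed k=1 coordinate.
cor_sup = 0.608 ** 2 / chi2_distance_sq([1, 0, 1, 1, 0])
print(f"1_B1: cor={1000 * cor_sup:.0f}")
```

The same `chi2_distance_sq` helper works for a supplementary row because it only needs that row's profile and the average profile. Mass, inr and ctr are skipped for such rows since they take no part in fitting the axes.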

Is there a simple linear combination of these metrics and the row profiles that gives the row principal coordinates?
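
For what it is worth, the standard CA transition formula has exactly this form. In notation introduced here for illustration ($n_{ij}$ the count in row $i$ and column $j$, $n_{i+}$ the total of row $i$, $g_{jk}$ the principal coordinate of column $j$ on axis $k$, and $\lambda_k$ the $k$-th principal inertia), the principal coordinate of row $i$ is the profile-weighted average of the column standard coordinates:

$$f_{ik} = \sum_{j} \frac{n_{ij}}{n_{i+}} \cdot \frac{g_{jk}}{\sqrt{\lambda_k}}$$

As an approximate check with the rounded values printed above, row B1 on axis 1 gives $0.30(0.302) + 0.10(-1.997) + 0.30(0.846) + 0.20(0.678) + 0.10(-1.161) \approx 0.164$, in line with the reported 164. A supplementary row would use the same sum with its own profile.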

Just in case:

Row profile table:
     Image1  Image2  Image3  Image4  Image5  Total
B1   0.30    0.10    0.30    0.20    0.10    1
B2   0.18    0.18    0.18    0.18    0.27    1
B3   0.17    0.00    0.33    0.33    0.17    1
Ave  0.22    0.11    0.26    0.22    0.19    1
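
One way to see the linear combination numerically: under the usual CA transition formula, a row's principal coordinate on axis k is the average of the column standard coordinates (each column principal coordinate divided by the square root of the k-th principal inertia), weighted by that row's profile. The sketch below is an illustration in plain Python rather than the ca package's internals. `profile` and `row_principal` are hypothetical helper names, and the column coordinates and eigenvalues are copied from the printout, so the results match the reported row coordinates only up to rounding. Supplementary rows go through the same function with their own profiles.

```python
from math import sqrt

# Active rows and two of the supplementary rows from the question.
active = {"B1": [3, 1, 3, 2, 1], "B2": [2, 2, 2, 2, 3], "B3": [1, 0, 2, 2, 1]}
supplementary = {"1_B1": [1, 0, 1, 1, 0], "1_B2": [1, 1, 0, 0, 1]}

# Copied from the ca printout (assumed inputs; coordinates are printed x1000).
eigenvalues = [0.088763, 0.035277]
col_principal = [
    [0.090, -0.595, 0.252, 0.202, -0.346],  # k=1
    [0.254, 0.192, 0.016, -0.179, -0.228],  # k=2
]


def profile(counts):
    """Row profile: counts divided by the row total."""
    total = sum(counts)
    return [c / total for c in counts]


def row_principal(counts):
    """Profile-weighted average of the column standard coordinates, per axis."""
    p = profile(counts)
    coords = []
    for lam, g in zip(eigenvalues, col_principal):
        standard = [x / sqrt(lam) for x in g]  # column standard coordinates
        coords.append(sum(pj * sj for pj, sj in zip(p, standard)))
    return coords


for name, counts in {**active, **supplementary}.items():
    k1, k2 = row_principal(counts)
    print(f"{name:>5}  k=1: {1000 * k1:6.0f}  k=2: {1000 * k2:6.0f}")
```

Nothing in `row_principal` depends on whether the row was part of the fitted table, which is why a supplementary point can be placed on the map after the axes have been found.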

`Everything that I have read has been in matrix algebra, which I do not understand` Why not dedicate one short evening to reading about matrix algebra? Just two things: 1) matrix multiplication, 2) basic info about SVD. Please believe that multivariate analysis is easier to remember in matrix form; and if you want to program your own CA, you will surely use matrix operations, because it's easier. — ttnphns, Mar 09 '16 at 18:21
In [this](http://stats.stackexchange.com/q/141754/3277) I explain with formulas and words how to do CA up to the computation of the coordinates. (If you also want contributions, I could add that.) I followed the algorithm of SPSS, but I believe that what R offers is something very similar. — ttnphns, Mar 09 '16 at 18:24
`Is there a simple linear combination of these metrics and the row profiles that gives the row principal coordinates?` In PCA, CA and other methods the "linear combination" coefficients come from a matrix algebra "optimization function" which is called SVD (singular value decomposition). It is the core part of such analyses. — ttnphns, Mar 09 '16 at 18:30
@ttnphns thanks for your comments. In your linked answer, I am confused about the indirect way with regard to the "Correspondence Analysis (Chi-square model)": would it be **$X[W_j]VS^{p1-1/2}$**? Is **$S^{p1-1/2}$** a row matrix? I did these matrix operations in R and didn't get the row principal coordinates that the ca package gives me. — RTrain3K, Mar 10 '16 at 17:12
@ttnphns apparently I need a reputation of 50 to comment on it. — RTrain3K, Mar 10 '16 at 21:58
If you want to compare your findings to my formulas/results, please take the example data table in the Illustrations section of my answer and perform the chi-square model CA with row principal spreading of inertia. Publish your results (coordinates) here in your question and I will compare them with my results. — ttnphns, Mar 11 '16 at 12:32
$S^{p1-1/2}$ (or $S^{p1-1}$, in my formula) is a diagonal matrix because $S$ is diagonal. The formula $S^{p1-1}$ means simply this: raise all diagonal elements of $S$ to the power p1-1. — ttnphns, Mar 11 '16 at 13:00
