
I am looking to implement a biplot for principal component analysis (PCA) in JavaScript. My question is: how do I determine the coordinates of the arrows from the $U, V, D$ output of the singular value decomposition (SVD) of the data matrix?

Here is an example biplot produced by R:

`biplot(prcomp(iris[,1:4]))`

Biplot of the Iris dataset

I tried looking it up in the Wikipedia article on biplot but it's not very useful. Or correct. Not sure which.

Comments:
  • Biplot is an overlay scatterplot showing both U values and V values. Or UD and V. Or U and VD'. Or UD and VD'. In terms of PCA, UD are called raw principal component scores and VD' are called variable-component loadings. – ttnphns Mar 10 '15 at 08:09
  • Note also that the scale of the coordinates depends on how you initially normalize the data. In PCA, for example, one normally divides the data by sqrt(r) or sqrt(r-1) [r is the number of rows]. But in a true "biplot" in the narrow sense of the word one normally divides the data by sqrt(rc) [c is the number of columns] and then de-normalizes the obtained U and V. – ttnphns Mar 10 '15 at 08:17
  • Why does the data have to be scaled by $\frac{1}{\sqrt{n-1}}$? – ktdrv Mar 10 '15 at 18:39
  • @ttnphns: Following your comments above, I wrote an answer to this question, aiming to provide something like an overview of PCA biplot normalizations. However, my knowledge of this topic is purely theoretical and I believe that you have much more hands-on experience with biplots than me. So I would be grateful for any comments. – amoeba Mar 13 '15 at 21:57
  • I'm curious why you would want to _implement_ this yourself. Unless I had educational goals or specific requirements, I would try to re-use as much existing functionality as possible and not re-invent the wheel. For example, you could use the corresponding `R` code for PCA and then use Shiny, Plotly or some variant of `d3.js` and `R` integration (http://blog.ae.be/combining-the-power-of-r-and-d3-js). – Aleksandr Blekh Apr 06 '15 at 01:51
  • One reason to implement things, @Aleksandr, is to know exactly what is being done. As you can see, it is not that easy to figure out what exactly happens when one runs `biplot()`. Also, why bother with R-JS integration for something that requires just a couple of lines of code. – amoeba Apr 06 '15 at 08:52
  • @amoeba: I understand. However, I think it is more productive to figure out what existing code is doing than to write it from scratch (assuming the code is decent enough, of course). In regard to the R-JS integration, I thought this effort was part of a larger system, which would require such integration anyway. Excellent answer, BTW (+1). – Aleksandr Blekh Apr 06 '15 at 09:37

1 Answer


There are many different ways to produce a PCA biplot and so there is no unique answer to your question. Here is a short overview.

We assume that the data matrix $\mathbf X$ has $n$ data points in rows and is centered (i.e. column means are all zero). For now we do not assume that it was standardized, i.e. we consider PCA on the covariance matrix (not on the correlation matrix). PCA amounts to a singular value decomposition $$\mathbf X=\mathbf{USV}^\top;$$ see my answer here for details: Relationship between SVD and PCA. How to use SVD to perform PCA?
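As a minimal sketch of this factorization in numpy (random placeholder data, not the Iris set; the question asks about JavaScript, but each line maps directly onto any SVD library):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))   # stand-in data matrix: rows = observations
X = X - X.mean(axis=0)          # center each column, as PCA requires

# Thin SVD: X = U S V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)
V = Vt.T

# Sanity check: the factorization reconstructs X exactly
assert np.allclose(U @ S @ Vt, X)
```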

In a PCA biplot, the first two principal components are plotted as a scatter plot, i.e. the first column of $\mathbf U$ is plotted against its second column. But the normalization can differ; e.g. one can use:

  1. Columns of $\mathbf U$: these are principal components scaled to unit sum of squares;
  2. Columns of $\sqrt{n-1}\mathbf U$: these are standardized principal components (unit variance);
  3. Columns of $\mathbf{US}$: these are "raw" principal components (projections on principal directions).

Further, the original variables are plotted as arrows; i.e. the $(x,y)$ coordinates of the $i$-th arrow's endpoint are given by the $i$-th value in the first and second columns of $\mathbf V$. But again, one can choose different normalizations, e.g.:

  1. Columns of $\mathbf {VS}$: I don't know what an interpretation here could be;
  2. Columns of $\mathbf {VS}/\sqrt{n-1}$: these are loadings;
  3. Columns of $\mathbf V$: these are principal axes (aka principal directions, aka eigenvectors).
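All six candidate coordinate sets above are a few lines of code once the SVD is in hand. A hedged numpy sketch (variable names are mine, data are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)          # centered data matrix
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)
V = Vt.T

# Candidate coordinates for the data points (plot the first two columns):
scores_unit_ss = U                    # unit sum of squares per column
scores_std     = np.sqrt(n - 1) * U   # standardized PCs (unit variance)
scores_raw     = U @ S                # raw principal component scores

# Candidate coordinates for the variable arrows:
arrows_VS       = V @ S
arrows_loadings = V @ S / np.sqrt(n - 1)   # loadings
arrows_axes     = V                        # principal axes (eigenvectors)

# e.g. data point i is drawn at (scores_std[i, 0], scores_std[i, 1]),
# and arrow j ends at (arrows_loadings[j, 0], arrows_loadings[j, 1]).
```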

Here is how all of that looks for the Fisher Iris dataset:

Fisher Iris biplots, PCA on covariance

Combining any subplot from above with any subplot from below makes $9$ possible normalizations. But according to the original definition of a biplot, introduced in Gabriel, 1971, The biplot graphic display of matrices with application to principal component analysis (this paper has 2k citations, by the way), the matrices used for a biplot should, when multiplied together, approximate $\mathbf X$ (that's the whole point). So a "proper biplot" can use e.g. $\beta\,\mathbf{US}^\alpha$ and $\mathbf{VS}^{(1-\alpha)}/\beta$. Therefore only three of the $9$ combinations are "proper biplots": namely a combination of any subplot from above with the one directly below it.
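Gabriel's condition is easy to verify numerically: any $\alpha$ and $\beta$ split the singular values between the two factors without changing their product. A sketch (random placeholder data; with all components kept the product is exact, keeping only two gives the rank-2 approximation):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Point coordinates G and arrow coordinates H must satisfy G H^T = X.
# Any split G = beta * U S^alpha, H = V S^(1-alpha) / beta works:
for alpha, beta in [(0.0, 1.0), (0.5, 2.0), (1.0, np.sqrt(n - 1))]:
    G = U @ np.diag(s ** alpha) * beta
    H = V @ np.diag(s ** (1 - alpha)) / beta
    assert np.allclose(G @ H.T, X)
```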

[Whatever combination one uses, it might be necessary to scale arrows by some arbitrary constant factor so that both arrows and data points appear roughly on the same scale.]

Using loadings, i.e. $\mathbf{VS}/\sqrt{n-1}$, for the arrows has a large benefit in that they have useful interpretations (see also here about loadings). The length of a loading arrow approximates the standard deviation of the original variable (its squared length approximates the variance), the scalar product between any two arrows approximates the covariance between the corresponding variables, and the cosines of the angles between arrows approximate the correlations between the original variables. To make a "proper biplot", one should choose $\mathbf U\sqrt{n-1}$, i.e. standardized PCs, for the data points. Gabriel (1971) calls this a "PCA biplot" and writes that

This [particular choice] is likely to provide a most useful graphical aid in interpreting multivariate matrices of observations, provided, of course, that these can be adequately approximated at rank two.
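These properties of loadings can be checked directly; when all components are kept the relations hold exactly, and restricting to the first two components turns them into the approximations described above. A numpy sketch on random correlated placeholder data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated variables
X = X - X.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = Vt.T @ np.diag(s) / np.sqrt(n - 1)   # loadings, all components kept

# L L^T equals the sample covariance matrix exactly:
cov = X.T @ X / (n - 1)
assert np.allclose(L @ L.T, cov)

# Squared arrow length = variance of the variable:
assert np.allclose((L ** 2).sum(axis=1), np.diag(cov))

# Cosine of the angle between two arrows = correlation between variables:
norms = np.sqrt((L ** 2).sum(axis=1))
assert np.allclose(cov / np.outer(norms, norms),
                   np.corrcoef(X, rowvar=False))
```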

Using $\mathbf{US}$ and $\mathbf{V}$ allows a nice interpretation: arrows are projections of the original basis vectors onto the PC plane, see this illustration by @hxd1011.

One can even opt to plot raw PCs $\mathbf {US}$ together with loadings. This is an "improper biplot", but it was used e.g. by @vqv in the most elegant biplot I have ever seen: Visualizing a million, PCA edition – it shows a PCA of the wine dataset.

The figure you posted (the default output of R's biplot function) is a "proper biplot" with $\mathbf U$ and $\mathbf{VS}$. The function scales the two subplots such that they span the same area. Unfortunately, the biplot function makes a weird choice of scaling all arrows down by a factor of $0.8$ and displaying the text labels where the arrow endpoints should have been. (Also, biplot does not get the scaling exactly right and in fact ends up plotting scores with sum of squares $n/(n-1)$ instead of $1$. See this detailed investigation by @AntoniParellada: Arrows of underlying variables in PCA biplot in R.)

PCA on correlation matrix

If we further assume that the data matrix $\mathbf X$ has been standardized so that column standard deviations are all equal to $1$, then we are performing PCA on the correlation matrix. Here is how the same figure looks:

Fisher Iris biplots, PCA on correlations

Here the loadings are even more attractive, because (in addition to the properties mentioned above) they give exactly (and not approximately) the correlation coefficients between the original variables and the PCs. Correlations are all at most $1$ in absolute value, so the loading arrows have to be inside a "correlation circle" of radius $R=1$, which is sometimes drawn on a biplot as well (I plotted it on the corresponding subplot above). Note that the biplot by @vqv (linked above) was done for a PCA on the correlation matrix, and it also sports a correlation circle.


Further reading:

Comments:
    +6, this deserves more than 3 upvotes. – gung - Reinstate Monica Apr 06 '15 at 16:09
  • Thanks a lot, @gung! This is the first time somebody awards a bounty to my answer post hoc :) I will take another look at it to see if anything can be improved. – amoeba Apr 06 '15 at 21:07
  • Another "further reading" might be added to this deserving answer. http://stats.stackexchange.com/q/119746/3277 - for a reader to understand what is "loading plot" and that it is an example of "variables in (reduced-rank) subject space". Therefore biplot is, in a sense, "variable space"+"subject space" in one representation. – ttnphns Apr 08 '15 at 11:04
  • Thanks for this detailed explanation - quick additional question: how do these scaling considerations apply to linear discriminant analysis biplots, as in http://stackoverflow.com/questions/17232251/how-can-i-plot-a-biplot-for-lda-in-r, or correspondence analysis biplots? Does this sqrt(n-1) scaling work the same there? – Tom Wenseleers Aug 10 '15 at 00:15
  • I'm not sure, @Tom, I have never worked with DA biplots (and never worked with CA at all). I would need to look at a concrete example and think about it. – amoeba Aug 10 '15 at 09:21
    Just noticed that ?ca::plot.ca has a nice overview of different possible normalisations: they distinguish row principal (form biplot=rows in principal coords, cols in standard coords), col principal (covariance biplot=cols in principal coords, rows in standard coords), symmetric biplot (rows and columns scaled to have variances equal to the singular values (square roots of eigenvalues)), rowgab and colgab (rows in principal coords and cols in standard coords multiplied by the mass of the corresponding point or vice versa) and rowgreen and colgreen (as rowgab and colgab but with sqrt(masses)) – Tom Wenseleers Aug 14 '15 at 07:34
    These last ones are also called "contribution biplots"; the book by M. Greenacre, "Biplots in Practice", also gives a nice overview of all this; these ways of scaling apply to all methods based on the SVD (i.e. CA biplots, PCA biplots, LDA biplots etc.); for an example of how it works, see the source code of ca:::plot.ca and its "map" argument – Tom Wenseleers Aug 14 '15 at 07:38
  • Ha yes and in vegan normalisation is controlled with the "scaling" argument, whereas in biplot.prcomp it is controlled with the "scale" argument... – Tom Wenseleers Aug 14 '15 at 07:46
  • @amoeba, great answer. Two comments. 1) "length of the loading arrows approximates the variance of original variables" is not accurate, it's actually "standard deviation" of the original variables. 2) The ordering of list items is somewhat confusing. It would be much nicer to have both lists (U and V sides) sorted as in the columns of the plot. That is, have the two item lists match by "properness". – VitoshKa Aug 24 '16 at 22:58
  • Looking again at your post, I see that the $\sqrt{n-1}$ is, as you mentioned also on my comment this morning, to attain unit variance, and corresponds to the middle subplots in the first and second rows of the first figure - does R biplot(), then, combine both the unit sum of squares AND unit variance? Also, it would be great if we could click on your figures to make them zoom out. I wouldn't dare edit your post :-) and I keep on blowing (Ctrl +) my entire browser to see the details. – Antoni Parellada May 02 '17 at 12:38
    @Antoni You mean you want each figure to be a hyperlink to itself, e.g. first figure should be a hyperlink to https://i.stack.imgur.com/6ddZg.png ? Please feel free to edit, I can always fix whatever I don't like later :) Regarding the $n-1$, there is some confusion: how can unit SS be combined with unit variance? It's either one or another... I am not sure what you mean. – amoeba May 02 '17 at 12:44
  • Yes, that is what I mean - these plots are very useful, and making them pop up would allow easy access to details. – Antoni Parellada May 02 '17 at 12:47
  • @AntoniParellada Please edit. I think I will have to edit this post later anyway to insert the things that we learnt (and will have learnt) during your investigation :) – amoeba May 02 '17 at 13:06
    @AntoniParellada I edited, and inserted a couple of links. – amoeba May 02 '17 at 15:05